I attended a workshop at DevWeek 2012 led by Neal Ford (@neal4d) on Continuous Delivery (CD). The day was excellent – Neal is a really engaging presenter – and I took copious notes, even though I’d already read most of the CD book. Fifteen months later, I thought it would be interesting to see how my notes from Neal’s workshop compared with my experience of Continuous Delivery, both within my job at thetrainline.com, and also in conversations with other people, particularly the good folks in the London Continuous Delivery meetup group.

The tl;dr version: go attend one of Neal’s excellent CD workshops, but be prepared for the challenges with Continuous Delivery to be much more social/organisational than technical.

I have arranged this post into sections corresponding with the notes I took, which follow the order of the workshop. In each section, I list Neal’s recommendations and advice first, then comment on specific aspects. The themes were:

Infrastructure-as-Code
The Deployment Pipeline
Testing Strategy
Source Code Management
Test-Driven Development (TDD)
Teams
Components
Releases
Databases
Environments and Infrastructure
Common Objections to Continuous Delivery

Some of the notes might be a bit out of context – I suggest attending one of Neal’s workshops to get the full benefit 🙂

Introduction to Continuous Delivery

Do not use arbitrary release dates; instead, give the business the power to release when features are ready.
See software systems as the whole stack, including databases and infrastructure, not just the software.
Use Puppet or Chef for building machine images – this frees up Ops people for embedding in software teams, where they can deliver the most value.
More rapid releases allow you to adjust your product strategy more rapidly; users can guide product development. Example of Flickr – started as an online game, but product team noticed that users were making use of the advanced photo upload feature for sharing photos. Because the Flickr release cycle was short, they could respond rapidly to user feedback (explicit and implicit (via monitoring)) and innovate accordingly.
Users dislike big changes, so keep the changes small!
How long does it take you to get a one-line code change into Production? – Mary Poppendieck
The principles behind the ‘Agile Manifesto’ actually contain the phrase Continuous Delivery: “…early and continuous delivery of valuable software.”
Definitions: Lead Time – from idea to working in Production. Cycle Time: from starting work on an idea to working in Production.
Ideals of CD: Software is always production-ready; deployments are reliable and repeatable; anyone/everyone can self-service deployments; releases happen according to business needs.

The ‘tracer bullet’ through organisational inertia and legacy procedures here is the question from Poppendieck, because not only should you be able to release a bug fix through your CD pipelines within a day (or hour, or minute), but you should be measuring this too.
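
To make the measuring part concrete, here is a minimal sketch of the two definitions from the notes above. The timestamps are invented for illustration, and the function names are my own rather than anything from the workshop; a real team would pull these dates from its issue tracker and deployment pipeline.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time(idea_raised: datetime, live_in_production: datetime) -> timedelta:
    """Lead time: from the idea being raised to it working in Production."""
    return live_in_production - idea_raised


def cycle_time(work_started: datetime, live_in_production: datetime) -> timedelta:
    """Cycle time: from starting work on the idea to it working in Production."""
    return live_in_production - work_started


# Hypothetical records: (idea raised, work started, live in Production).
changes = [
    (datetime(2013, 7, 1, 9), datetime(2013, 7, 3, 9), datetime(2013, 7, 4, 9)),
    (datetime(2013, 7, 2, 9), datetime(2013, 7, 2, 13), datetime(2013, 7, 2, 17)),
    (datetime(2013, 7, 5, 9), datetime(2013, 7, 8, 9), datetime(2013, 7, 10, 9)),
]

# The median is less distorted by one unusually slow change than the mean.
median_lead = median(lead_time(idea, live) for idea, _, live in changes)
median_cycle = median(cycle_time(start, live) for _, start, live in changes)

print(f"Median lead time:  {median_lead}")
print(f"Median cycle time: {median_cycle}")
```

Tracking both numbers is useful because a large gap between them suggests that ideas spend most of their time waiting in a queue rather than being worked on.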

Giving the business the power and decision to release is not an easy place to reach, but aiming for that will help other goals be achieved.

In July 2013 and beyond, we might look at multi-node alternatives to Chef and Puppet like Salt and Ansible, but the principles remain the same.

Infrastructure-as-Code

Any change in {Code, Infrastructure, Configuration} should trigger the creation of a Production-ready deployment package.
Treat computer images as ‘compiled output’ (just as we do software).
The definition of ‘Done’ == ‘Released to Production and working’

The ‘lightbulb moment’ for me when Neal was talking was when he talked about servers being ‘compiled’ – you would never seriously consider patching a DLL or Jar file using a Hex editor (you’d rebuild from source code), so why should you make manual changes to a server, when you can… rebuild from source code?

Deployment Pipeline

Value stream mapping – visualise the release process.
The deployment pipeline – communicating a huge amount of information

Since the workshop I have really come to appreciate the importance of visualising effectively the release process. Everyone should be able to see the progress of a new code commit on the way to the Production servers. At thetrainline.com we use ThoughtWorks GO 13.2, which has some nifty pipeline visualisation features which I am looking forward to exploiting over the coming weeks. I have also been talking to several people about how best to visualise deployment progress, and we have a #londoncd meetup group session on pipeline visualisation coming in November 2013 in London.

Testing Strategy

Build servers from bare metal each time, using Chef or Puppet.
Puppet for Windows is now available [March 2012]. No excuse now for not building from source, whether running on Linux/Mac or Windows.
Know the purpose of your automated tests and keep them separate. cf. Brian Marick’s work
Look for opportunities to pair up Devs with Ops and QA people.
Aim to build vertical slices of software rather than horizontal layers
Acceptance criteria need to be defined as a computer-parsable script e.g. Cucumber and other DSLs for Behaviour-Driven Development (BDD) like RSpec, MSpec, NSpec?
The QA person/role becomes a key arbitrator of quality and aspects of design right the way through the project.
Books: Specification by Example; Growing Object-Oriented Software, Guided by Tests (‘GOOS’);

The key points for me here are that a fresh approach to testing/QA is needed. In fact, ditch the idea of “quality assurance” as an activity that happens after development, and change to building quality into the software from the very start. This means that the testing people should have a huge say in how the software gets built and what tests are run. Also, tests are expensive: if you don’t know why a test is running, either find out, or remove the test. Oh, and buy a copy of GOOS!

Source Code Management

“Maintain a single source repository”
Branches: feature branches BAD. Release branches OKAY (if they are short-lived). Experimental branches (GOOD).
Commit to trunk/master every day: how to avoid extra stuff getting out too early? RELEASE WHEN FEATURES ARE READY!!! Can also use feature toggles – remove the toggles when the feature is released. Also use branch by abstraction – programming to interfaces.

There should be only a single, logical source of truth for code; that is, you might have many Git repos, but only one ‘origin’ or definitive repo per project anywhere in your organisation. Avoid code duplication across sibling repos like the plague.
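
The feature toggles mentioned in the notes above are easier to picture with a small example. This is only a sketch under my own assumptions: the `FeatureToggles` class, the toggle name and the two summary functions are hypothetical, and a real system would more likely load toggle state from configuration than hard-code it.

```python
class FeatureToggles:
    """A tiny in-memory toggle store (hypothetical, for illustration only)."""

    def __init__(self, enabled=()):
        self._enabled = frozenset(enabled)

    def is_on(self, name: str) -> bool:
        return name in self._enabled


def legacy_price_summary(total_pence: int) -> str:
    # Existing behaviour, already live in Production.
    return f"Total: {total_pence}p"


def new_price_summary(total_pence: int) -> str:
    # Unfinished feature: committed to trunk, but hidden behind the toggle.
    return f"Total: £{total_pence / 100:.2f}"


def price_summary(toggles: FeatureToggles, total_pence: int) -> str:
    """Callers use this one entry point; the toggle picks the implementation."""
    if toggles.is_on("new-price-summary"):
        return new_price_summary(total_pence)
    return legacy_price_summary(total_pence)


print(price_summary(FeatureToggles(), 1250))
print(price_summary(FeatureToggles({"new-price-summary"}), 1250))
```

Once the new behaviour has been released and has settled, the toggle check and the legacy function are deleted, which is the ‘remove the toggles’ step in the notes.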

TDD

The cyclomatic complexity for TDD code is usually much lower than for non-TDD code. TDD code is better quality and more maintainable. TDD can be Test-Driven Design i.e. it is more than just a way to write code, but also a way to design parts of the system more effectively.
Neal Ford recommends testing private methods as well as public methods.
TDD should not be done in order to establish business requirements, but instead as a technical engineering practice. If you’re using TDD to help establish business requirements, then something else is missing.
Value of TDD backed up by studies ‘in the wild’ of teams at Microsoft by Laurie Williams at NCSU – writing more code allows you to go faster and maintain a higher velocity than with non-TDD code.

It’s bewildering how many people still do not understand the purpose of TDD, and seem to be challenged and even scared by it. TDD is the only effective way to produce supportable, evolvable software which is not immediately legacy. The GOOS view is useful: that tests act as a frame around which to ‘grow’ your code. The research is there to prove it – TDD works. No excuses!

Teams

Conway’s Law is very powerful.
2-pizza team – how many people can you feed with 2 pizzas? 8 or 10. Teams should be no bigger than this.
Each team can be responsible for several components, but should be working on only one at a time. Limit WIP.

The ‘raw materials’ for a successful software system are not heaps of code, but cohesive and effective teams. Ignore Conway’s Law if you like, but it WILL prevail; you can use it to your advantage or your detriment. The value of a well-performing team is massive, but this is not a trivial thing to build and maintain; teams need real ownership over code, and that means the team takes decisions about that code and exclusively works on that code.

Components

Directed Acyclic Graph – if your subsystem dependencies do not fall into this pattern, then you need to revisit the relationships.
Use tools such as NDepend or CDA to analyse dependencies.
CI for components: each component has its own build and pipeline.
Static vs Fluid vs Dynamic dependencies – guarded dependencies for when newer versions do not work.
“Manual steps in software systems is like ‘building a snowflake’”

In retrospect, the one thing missing from my notes from this section (possibly also the workshop) is some detail on package management, and how a decent package manager (like Gems for Ruby, NuGet for .NET, RPM for Linux packages), which resolves dependencies recursively, can really help to solve component problems. In fact, since the workshop I have become a strong advocate of using package management tooling not just for build-time dependencies, but in other scenarios too: collecting a ‘platform’ of applications together, and also for capturing runtime dependencies such as remote web service calls (that’s a future blog post…).
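
As a rough illustration of the Directed Acyclic Graph point in the notes above, the sketch below checks a component dependency map for cycles and, if there are none, produces a build order. The component names are invented, and I am using Python’s standard `graphlib` module (Python 3.9+) rather than any of the tools named in the workshop.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(dependencies):
    """Return components ordered so that each one follows its dependencies.

    `dependencies` maps a component name to the set of components it needs.
    Raises ValueError if the dependencies contain a cycle, i.e. are not a DAG.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # graphlib reports the offending cycle as the second exception argument.
        cycle = " -> ".join(err.args[1])
        raise ValueError(f"dependency cycle found: {cycle}") from err


# Hypothetical components for illustration.
components = {
    "web": {"booking", "search"},
    "booking": {"core"},
    "search": {"core"},
    "core": set(),
}

print(build_order(components))
```

A check like this could run early in each component’s pipeline, so that a newly introduced circular dependency fails the build instead of being discovered at release time.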

Releases

“Just another deployment”
Blue-Green: testing in the production environment
Canary releases: direct a small number of “canary” users to a different version. If the ‘canary’ keeps ‘singing’ then you can roll more users onto the new feature.
Dark Launch features ahead of when they will be visible. Very suitable for infra and DB changes.
Emergency Fixes are no different from normal releases.
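
One simple way to picture the canary routing described above is to hash each user into a fixed bucket and send only the lowest buckets to the new version. This is my own sketch, not something shown in the workshop: the function name is hypothetical, and real canary releases are often handled at the load balancer instead of in application code.

```python
import hashlib


def in_canary(user_id: str, canary_percent: int) -> bool:
    """Decide whether a user should see the canary version.

    The user id is hashed into one of 100 buckets, so the same user always
    gets the same answer, and raising the percentage only ever adds users.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < canary_percent


users = [f"user-{i}" for i in range(1000)]
canaries = [u for u in users if in_canary(u, 5)]
print(f"{len(canaries)} of {len(users)} users routed to the canary")
```

If the canary keeps singing, the percentage is raised step by step until it reaches 100, at which point the old version can be retired.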

If your emergency fixes are not going through a calibrated and well-exercised deployment pipeline, then those ‘fixes’ are likely to be riskier than necessary. A point not mentioned by Neal iirc (to be fair, he covered a huge amount of material) was some of the lower-level distinctions between delivery and distribution which Alex P covered at QCon London 2013, which I have found very helpful when thinking about deployments: a deployment is simply flipping a symlink!
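
The symlink idea can be sketched in a few lines. This assumes a POSIX filesystem and a layout I have made up for illustration (a `releases` directory with one sub-directory per version, plus a `current` link); it is not taken from the QCon talk.

```python
import os
from pathlib import Path


def activate_release(releases_dir: Path, version: str, current_link: Path) -> None:
    """Point `current_link` at an already-distributed release directory."""
    target = releases_dir / version
    if not target.is_dir():
        raise FileNotFoundError(f"release not distributed yet: {target}")

    # Build the new link beside the live one, then rename it into place.
    tmp_link = current_link.with_name(current_link.name + ".tmp")
    if tmp_link.is_symlink():
        tmp_link.unlink()  # clean up after an earlier interrupted run
    os.symlink(target, tmp_link)
    os.replace(tmp_link, current_link)  # rename is atomic on POSIX
```

Distribution (copying the new version onto the server) can then happen well ahead of time, and rolling back is just a matter of calling the same function with the previous version.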

Databases

Databases are often fossilised – too ‘precious’ to be interfered with.
The ‘expand/contract’ pattern decouples db migration from app migration (upgrade)
dbdeploy – really just a name for a good practice – db updates as code (small deltas). Crucially, metadata about the db version (e.g. which scripts have been run) is in the db itself. Tools such as Liquibase allow vendor-independence.
To achieve CI for db changes, use a baseline snapshot, and apply deltas serially. If a delta fails, stop the build (FAIL FAST). db delta scripts are stored in source code control alongside the related other code, so they are available to the CI system.
The key thing here is that db changes (as scripts) are verified as part of the CI build.
A good delta is 1 or 2 lines of SQL
The expand/contract pattern allows decoupling of db changes from application changes. Version 34 of the database might support versions 5, 6 and 7 of the application. Extra columns/tables/functions are accumulated (expansion) and then after a certain application release, support for 5 & 6 is removed (contraction).
Often, changes to the database can be ‘dark launched’ before any application code uses the features in production.
See Refactoring Databases by Scott W Ambler
Make use of DEPRECATION_COMMENT data type (or equivalent) when planning db changes so that warnings show up in db logs.

The expand/contract pattern fits with what Stefan Tilkov said at QCon 2012 about the pace of changes in different parts of the system. Dark launching soon-to-be-used DB features certainly needs a bit more up-front coordination and maturity, but it’s worth it for the ability to release the features when they are ready, rather than having to wait for a risky live DB upgrade.

I cannot find any online references to the DEPRECATION_COMMENT data type or similar, however.
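
The delta-script approach from the notes above (small numbered deltas, applied serially, with the applied versions recorded in the database itself) can be sketched roughly as follows. This uses Python’s built-in `sqlite3` module purely for convenience; the table names and deltas are hypothetical, and this is not how dbdeploy or Liquibase are actually implemented.

```python
import sqlite3

# Hypothetical deltas; in practice each would be a numbered script in source control.
DELTAS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
    (3, "CREATE INDEX idx_customer_email ON customer (email)"),
]


def apply_deltas(conn: sqlite3.Connection, deltas) -> list:
    """Apply any deltas not yet recorded in the changelog, in numeric order.

    Returns the delta numbers applied on this run. Any SQL error propagates
    straight away, so a CI build calling this stops at the first bad delta.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changelog ("
        "delta INTEGER PRIMARY KEY, "
        "applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    already_applied = {row[0] for row in conn.execute("SELECT delta FROM changelog")}

    newly_applied = []
    for number, sql in sorted(deltas):
        if number in already_applied:
            continue
        conn.execute(sql)
        with conn:  # commits the changelog row recording this delta
            conn.execute("INSERT INTO changelog (delta) VALUES (?)", (number,))
        newly_applied.append(number)
    return newly_applied


conn = sqlite3.connect(":memory:")
print(apply_deltas(conn, DELTAS))  # first run applies everything
print(apply_deltas(conn, DELTAS))  # second run finds nothing new to apply
```

Because the changelog lives in the database, the same script can be run safely against any environment, whatever version that environment happens to be on.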

Environments and Infrastructure

Ops folk are tired of the pain of fire-fighting crappy apps ‘thrown over the wall’
Adopt DevOps features: devs to carry pagers, ops present in Inceptions, Showcases, Retrospectives.
See Release It! by Michael Nygard from the Pragmatic Programmers series
“Destroy works of art” – throw away things which cannot be built again exactly (from source).
Monitoring scripts are part of the ‘system’ so should be part of the deployment pipeline.
ANY change to the system should go via the pipeline – technology which does not fit this requirement should be dropped. This is a big change!
Visibility of non-co-located team members: use cheap web-cams to produce a sense of co-location! Acts as a virtual window into the other office; can also be used within the same building, if this is large.
Cross-functional teams + ‘guerilla teams’ which can move from project to project, helping with specific technologies (pipelines, Puppet, …) i.e. ‘enabling’ work.
Key things for CD process: VISIBILITY + viral (JFDI)
What is the definition of ‘done’? (Should be “In Production and being used”).
Are you doing proper iterative development? Iterations should be 4 weeks max, ideally 2 weeks.
Are you doing proper Scrum? Project Managers should ONLY: Deflect distractions (and provide nice snacks)

The book Release It! is one of the most crucial books for software developers to read and understand, in my opinion; highly recommended. Devs being on-call and Ops people attending development meetings are both absolutely fundamental practices for modern, rapid software development; all sorts of painful issues have to be solved to get this to happen effectively, but having solved those problems, the organisation will be stronger, wiser, and ready to deliver continuously.

Objections to Continuous Delivery

ITIL/SOX etc. do not naturally sit well with CD. However, CD can actually make things MORE compliant and serviceable:
1. Visibility and control over locking down
2. Automation over documentation
3. Auditing
Make it easy to back out changes: by automating processes.

Since the workshop, there have been some really useful discussions online around the concept of ‘anti-fragile’ and how – when applied well – ITIL and DevOps (and by extension CD) can both work towards the same goal. Since formerly process-heavy companies like SAP and HP have transformed their software delivery, I suspect that other organisations will find ways to make the case that rapid, repeatable, reliable, and recurring software delivery makes for better compliance and transparency, not worse.

What’s Next?

Neal’s workshop was hugely useful in cramming in a great deal of information, experience and advice into a short space of time. What neither Neal nor the Humble & Farley CD book really mention(ed) is the extent to which the technical challenges listed above, although non-trivial, are actually much less of an obstacle to Continuous Delivery than departmental politics, legacy procedures and thinking, and Capex-Opex battles, all of which are really social-organisational problems.

Steve Smith (@AgileSteveSmith) has given an excellent talk on the social-organisational challenges of introducing Continuous Delivery (first at London Continuous Delivery meetup group, and then at QCon New York) – I really recommend this talk to anyone who wants a flavour of the social challenges to CD!

Other sessions we’ve had at London Continuous Delivery meetup group which touch on the team/organisational aspects of CD include Team Transformation for CD at 7digital with Chris O’Dell (@ChrisAnnODell) and James Betteley (@jamesbetteley) on Measuring Progress with CD (kudos to SkillsMatter for the meeting space).

The team at Nokia Entertainment has had some notable success with Continuous Delivery and DevOps. Their agile team lead John Clapham (@johnC_bristol) has helped to transform their software delivery from painful, infrequent releases to pain-free 30-minute deployments. Go and hear John speak about the team transformation at Nokia if you can.

Finally, buy the Continuous Delivery book, attend a ThoughtWorks CD workshop (Paul Stack @stack72 also runs an excellent CI and CD workshop), and (if you’re close to London) come to the #londoncd meetup group 🙂

5 thoughts on “Continuous Delivery Workshop with Neal Ford (@neal4d) – a Retrospective”

A Software Engineer says:

July 23, 2013 at 18:15

Devs have lives too that need to be respected and taken into consideration when determining who is “on-call” when responding to P1 Incidents. Being expected to work long nights and weekends on a fairly regular basis is not sustainable and will result in rapid burnout.

1. Matthew Skelton (@matthewpskelton) says:
  
  July 23, 2013 at 20:52
  
  I agree. The goal is to give Devs enough first-hand experience of incidents to provide them with the context for building more operable software. Poorly-implemented on-call rotas are a disaster for morale and defeat the purpose of having Devs on call, which is to improve the code.
  
Niek Bartholomeus says:

July 26, 2013 at 14:44

Hi Matthew,

Nice write-up with lots of resources to dig in to. I’m already looking forward to your blog post on capturing runtime dependencies!

Niek.

1. Matthew Skelton (@matthewpskelton) says:
  
  August 8, 2013 at 15:44
  
  Thanks Niek. I was talking to some more of the #londoncd folks last night about using ‘ghost packages’ for capturing runtime dependencies – I’ll try to get my thoughts in blog form some time in September! M
  