Continuous Delivery Workshop with Neal Ford (@neal4d) – a Retrospective

I attended a workshop at DevWeek 2012 led by Neal Ford (@neal4d) on Continuous Delivery (CD). The day was excellent – Neal is a really engaging presenter – and I took copious notes, even though I’d already read most of the CD book. Fifteen months later, I thought it would be interesting to see how my notes from Neal’s workshop compared with my experience of Continuous Delivery, both within my job at thetrainline.com, and also in conversations with other people, particularly the good folks in the London Continuous Delivery meetup group.

The tl;dr version: go attend one of Neal’s excellent CD workshops, but be prepared for the challenges with Continuous Delivery to be much more social/organisational than technical.

I have arranged this post into sections corresponding with the notes I took, which follow the order of the workshop. In each section, I list Neal’s recommendations and advice first, then comment on specific aspects. The themes were:
  • Infrastructure-as-Code
  • The Deployment Pipeline
  • Testing Strategy
  • Source Code Management
  • Test-Driven Development (TDD)
  • Teams
  • Components
  • Releases
  • Databases
  • Environments and Infrastructure
  • Common Objections to Continuous Delivery
Some of the notes might be a bit out of context – I suggest attending one of Neal’s workshops to get the full benefit :)


Introduction to Continuous Delivery

  1. Do not use arbitrary release dates; instead, give the business the power to release when features are ready.
  2. See software systems as the whole stack, including databases and infrastructure, not just the software.
  3. Use Puppet or Chef for building machine images – this frees up Ops people for embedding in software teams, where they can deliver the most value.
  4. More rapid releases allow you to adjust your product strategy more rapidly; users can guide product development. Example of Flickr – started as an online game, but the product team noticed that users were making use of the advanced photo upload feature for sharing photos. Because the Flickr release cycle was short, they could respond rapidly to user feedback (explicit, and implicit via monitoring) and innovate accordingly.
  5. Users dislike big changes, so keep the changes small!
  6. How long does it take you to get a one-line code change into Production? – Mary Poppendieck
  7. The ‘Agile Manifesto’ actually contains the phrase Continuous Delivery: “…early and continuous delivery of valuable software.”
  8. Definitions: Lead Time – from idea to working in Production. Cycle Time – from starting work on an idea to working in Production.
  9. Ideals of CD: Software is always production-ready; deployments are reliable and repeatable; anyone/everyone can self-service deployments; releases happen according to business needs.
The ‘tracer bullet’ through organisational inertia and legacy procedures here is the question from Poppendieck: not only should you be able to release a bug fix through your CD pipelines within a day (or hour, or minute), but you should be measuring this too.
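To make the measurement concrete, here is a minimal sketch of computing the two definitions above from timestamps. The `WorkItem` record and its field names are hypothetical, and the dates are invented for illustration; real data would come from your issue tracker and deployment logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class WorkItem:
    """Hypothetical record of one change; field names are illustrative."""
    idea_raised: datetime         # when the idea was first recorded
    work_started: datetime        # when someone began working on it
    live_in_production: datetime  # when it was working in Production


def lead_time(item: WorkItem) -> timedelta:
    # Lead time: from idea to working in Production.
    return item.live_in_production - item.idea_raised


def cycle_time(item: WorkItem) -> timedelta:
    # Cycle time: from starting work on the idea to working in Production.
    return item.live_in_production - item.work_started


# Invented example: a one-line fix raised one morning and live the next.
one_line_fix = WorkItem(
    idea_raised=datetime(2013, 7, 1, 9, 0),
    work_started=datetime(2013, 7, 1, 14, 0),
    live_in_production=datetime(2013, 7, 2, 11, 0),
)
print("lead time: ", lead_time(one_line_fix))
print("cycle time:", cycle_time(one_line_fix))
```

Tracking both numbers over time shows whether the delay sits in the queue before work starts or in the pipeline itself.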
Giving the business the power and decision to release is not an easy place to reach, but aiming for that will help other goals be achieved.
In July 2013 and beyond, we might look at multi-node alternatives to Chef and Puppet like Salt and Ansible, but the principles remain the same.

Infrastructure-as-Code

  1. Any change in {Code, Infrastructure, Configuration} should trigger the creation of a Production-ready deployment package.
  2. Treat computer images as ‘compiled output’ (just as we do software).
  3. The definition of ‘Done’ == ‘Released to Production and working’
The ‘lightbulb moment’ for me when Neal was talking was when he talked about servers being ‘compiled’ – you would never seriously consider patching a DLL or Jar file using a Hex editor (you’d rebuild from source code), so why should you make manual changes to a server, when you can… rebuild from source code?

Deployment Pipeline

  1. Value stream mapping – visualise the release process.
  2. The deployment pipeline – communicating a huge amount of information
Since the workshop I have really come to appreciate the importance of visualising the release process effectively. Everyone should be able to see the progress of a new code commit on the way to the Production servers. At thetrainline.com we use ThoughtWorks GO 13.2, which has some nifty pipeline visualisation features which I am looking forward to exploiting over the coming weeks. I have also been talking to several people about how best to visualise deployment progress, and we have a #londoncd meetup group session on pipeline visualisation coming in November 2013 in London.

Testing Strategy

  1. Build servers from bare metal each time, using Chef or Puppet.
  2. Puppet for Windows is now available [March 2012]. No excuse now for not building from source, whether running on Linux/Mac or Windows.
  3. Know the purpose of your automated tests and keep them separate. cf. Brian Marick’s work
  4. Look for opportunities to pair up Devs with Ops and QA people.
  5. Aim to build vertical slices of software rather than horizontal layers
  6. Acceptance criteria need to be defined as a computer-parsable script e.g. Cucumber and other DSLs for Behaviour-Driven Development (BDD) like RSpec, MSpec, NSpec?
  7. The QA person/role becomes a key arbiter of quality and aspects of design right the way through the project.
  8. Books: Specification by Example; Growing Object-Oriented Software, Guided by Tests (‘GOOS’);
The key points for me here are that a fresh approach to testing/QA is needed. In fact, ditch the idea of “quality assurance” as an activity that happens after development, and change to building quality into the software from the very start. This means that the testing people should have a huge say in how the software gets built and what tests are run. Also, tests are expensive: if you don’t know why a test is running, either find out, or remove the test. Oh, and buy a copy of GOOS!

Source Code Management

  1. “Maintain a single source repository”
  2. Branches: feature branches BAD. Release branches OKAY (if they are short-lived). Experimental branches GOOD.
  3. Commit to trunk/master every day: how to avoid extra stuff getting out too early? RELEASE WHEN FEATURES ARE READY!!! Can also use feature toggles – remove the toggles when the feature is released. Also use branch by abstraction – programming to interfaces.
There should be only a single, logical source of truth for code; that is, you might have many Git repos, but only one ‘origin’ or definitive repo per project anywhere in your organisation. Avoid code duplication across sibling repos like the plague.
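The feature toggles and branch by abstraction mentioned in the notes above can be sketched together. This is only an illustration under stated assumptions: the `FEATURE_TOGGLES` dictionary, the `FareSearch` interface and both implementations are hypothetical names, not anything shown in the workshop.

```python
from abc import ABC, abstractmethod

# Hypothetical toggle store; in practice this might live in configuration.
FEATURE_TOGGLES = {"new_fare_search": False}


class FareSearch(ABC):
    """The abstraction that calling code programs against."""

    @abstractmethod
    def search(self, origin: str, destination: str) -> str:
        ...


class LegacyFareSearch(FareSearch):
    def search(self, origin: str, destination: str) -> str:
        return f"legacy results for {origin} to {destination}"


class NewFareSearch(FareSearch):
    # Committed to trunk every day, but not reachable until the toggle is on.
    def search(self, origin: str, destination: str) -> str:
        return f"new results for {origin} to {destination}"


def fare_search(toggles=FEATURE_TOGGLES) -> FareSearch:
    """Pick the implementation based on the toggle; callers only see FareSearch."""
    if toggles.get("new_fare_search", False):
        return NewFareSearch()
    return LegacyFareSearch()


print(fare_search().search("London", "Bristol"))
print(fare_search({"new_fare_search": True}).search("London", "Bristol"))
```

Once the new implementation is released, the toggle and `LegacyFareSearch` are deleted, leaving only the interface and the new code.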

TDD

  1. The cyclomatic complexity for TDD code is usually much lower than for non-TDD code. TDD code is better quality and more maintainable. TDD can be Test-Driven Design i.e. it is more than just a way to write code, but also a way to design parts of the system more effectively.
  2. Neal Ford recommends testing private methods as well as public methods
  3. TDD should not be done in order to establish business requirements, but instead as a technical engineering practice. If you’re using TDD to help establish business requirements, then something else is missing.
  4. Value of TDD backed up by studies ‘in the wild’ of teams at Microsoft by Laurie Williams at NCSU – writing more code allows you to go faster and maintain a higher velocity than with non-TDD code.
It’s bewildering how many people still do not understand the purpose of TDD, and seem to be challenged and even scared by it. TDD is the only effective way to produce supportable, evolvable software which is not immediately legacy. The GOOS view is useful: that tests act as a frame around which to ‘grow’ your code. The research is there to prove it – TDD works. No excuses!

Teams

  1. Conway’s Law is very powerful.
  2. 2-pizza team – how many people can you feed with 2 pizzas? 8 or 10. Teams should be no bigger than this.
  3. Each team can be responsible for several components, but should be working on only one at a time. Limit WIP.
The ‘raw materials’ for a successful software system are not heaps of code, but cohesive and effective teams. Ignore Conway’s Law if you like, but it WILL prevail; you can use it to your advantage or your detriment. The value of a well-performing team is massive, but this is not a trivial thing to build and maintain; teams need real ownership over code, and that means the team takes decisions about that code and exclusively works on that code.

Components

  1. Directed Acyclic Graph – if your subsystem dependencies do not fall into this pattern, then you need to revisit the relationships.
  2. Use tools such as NDepend or CDA to analyse dependencies.
  3. CI for components: each component has its own build and pipeline.
  4. Static vs Fluid vs Dynamic dependencies; guarded dependencies for when newer versions do not work.
  5. “Manual steps in software systems is like ‘building a snowflake’”
In retrospect, the one thing missing from my notes from this section (possibly also the workshop) is some detail on package management, and how a decent package manager (like Gems for Ruby, NuGet for .NET, RPM for Linux packages), which resolves dependencies recursively, can really help to solve component problems. In fact, since the workshop I have become a strong advocate of using package management tooling not just for build-time dependencies, but in other scenarios too: collecting a ‘platform’ of applications together, and also for capturing runtime dependencies such as remote web service calls (that’s a future blog post…).
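The Directed Acyclic Graph check from the notes above is easy to automate. A minimal sketch, assuming Python 3.9+ (for the standard-library `graphlib` module) and a hypothetical set of component names; tools such as NDepend would extract the real graph from your code.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(dependencies):
    """Return a valid build order, or raise CycleError if the graph is not a DAG.

    `dependencies` maps each component to the set of components it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical components: dependencies always come before their dependents.
components = {
    "web": {"booking", "search"},
    "booking": {"payments"},
    "search": set(),
    "payments": set(),
}
order = build_order(components)
print("build order:", order)

# A circular dependency is rejected rather than silently tolerated.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    build_order(cyclic)
except CycleError as err:
    print("cycle found:", err.args[1])
```

The same ordering is what a recursive package manager effectively computes when it resolves dependencies, which is one reason cycles cause it so much trouble.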

Releases

  1. “Just another deployment”
  2. Blue-Green: testing in the production environment
  3. Canary releases: direct a small number of “canary” users to a different version. If the ‘canary’ keeps ‘singing’ then you can roll more users onto the new feature.
  4. Dark Launch features ahead of when they will be visible. Very suitable for infra and DB changes.
  5. Emergency Fixes are no different from normal releases.
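As an illustration of the canary idea above, here is one possible way to pick a stable subset of users. This is a sketch, not how any particular site does it: the hashing scheme and the function names `in_canary` and `version_for` are assumptions for the example.

```python
import hashlib


def in_canary(user_id: str, percent: int) -> bool:
    """Deterministically place roughly `percent`% of users in the canary group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < percent


def version_for(user_id: str, percent: int) -> str:
    """Route canary users to the new version, everyone else to the current one."""
    return "new" if in_canary(user_id, percent) else "current"


users = [f"user-{n}" for n in range(1000)]
for percent in (0, 5, 50, 100):
    on_new = sum(version_for(u, percent) == "new" for u in users)
    print(f"{percent:3d}% canary -> {on_new} of {len(users)} users on the new version")
```

Because each user's bucket is fixed, raising the percentage only ever adds users to the new version; nobody flips back and forth while the canary keeps ‘singing’.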
If your emergency fixes are not going through a calibrated and well-exercised deployment pipeline, then those ‘fixes’ are likely to be riskier than necessary. A point not mentioned by Neal iirc (to be fair, he covered a huge amount of material) was some of the lower-level distinctions between delivery and distribution which Alex P covered at QCon London 2013, which I have found very helpful when thinking about deployments: a deployment is simply flipping a symlink!
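A minimal sketch of the ‘flipping a symlink’ idea, assuming a POSIX filesystem and a hypothetical layout of `releases/v1`, `releases/v2` and a `current` link. The release has already been distributed to the server; deployment is only the switch.

```python
import os
import tempfile


def activate(release_dir: str, current_link: str) -> None:
    """Point `current_link` at `release_dir` by swapping the symlink in one step."""
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    # Renaming over the old link is atomic on POSIX, so readers see
    # either the old release or the new one, never a missing link.
    os.replace(tmp_link, current_link)


# Hypothetical directory layout created in a temp folder for the demo.
root = tempfile.mkdtemp()
v1 = os.path.join(root, "releases", "v1")
v2 = os.path.join(root, "releases", "v2")
os.makedirs(v1)
os.makedirs(v2)
current = os.path.join(root, "current")

activate(v1, current)  # initial release
activate(v2, current)  # the 'deployment'
print("current ->", os.readlink(current))
```

Backing out is the same operation pointed at the previous directory, which is part of why an emergency fix need not look any different from a normal release.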

Databases

  1. Databases are often fossilised – too ‘precious’ to be interfered with.
  2. The ‘expand/contract’ pattern decouples db migration from app migration (upgrade)
  3. dbdeploy – really just a name for a good practice – db updates as code (small deltas). Crucially, metadata about the db version (e.g. which scripts have been run) is in the db itself. Tools such as Liquibase allow vendor-independence.
  4. To achieve CI for db changes, use a baseline snapshot, and apply deltas serially. If a delta fails, stop the build (FAIL FAST). db delta scripts are stored in source code control alongside the related other code, so they are available to the CI system.
  5. The key thing here is that db changes (as scripts) are verified as part of the CI build.
  6. A good delta is 1 or 2 lines of SQL
  7. The expand/contract pattern allows decoupling of db changes from application changes. Version 34 of the database might support versions 5, 6 and 7 of the application. Extra columns/tables/functions are accumulated (expansion) and then after a certain application release, support for 5 & 6 is removed (contraction).
  8. Often, changes to the database can be ‘dark launched’ before any application code uses the features in production.
  9. See Refactoring Databases by Scott W Ambler and Pramod Sadalage
  10. Make use of DEPRECATION_COMMENT data type (or equivalent) when planning db changes so that warnings show up in db logs.
The expand/contract pattern fits with what Stefan Tilkov said at QCon 2012 about the pace of changes in different parts of the system. Dark launching soon-to-be-used DB features certainly needs a bit more up-front coordination and maturity, but it’s worth it for the ability to release the features when they are ready, rather than having to wait for a risky live DB upgrade.
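To illustrate expand/contract, here is a small sketch using an in-memory SQLite database. The `customer` table, its columns, and the two reader functions standing in for old and new application versions are all hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
db.execute("INSERT INTO customer (email) VALUES ('a@example.com')")


def old_app_read(conn):
    # Stands in for an older application version that uses the old column.
    return conn.execute("SELECT email FROM customer").fetchall()


def new_app_read(conn):
    # Stands in for a newer application version that uses the new column.
    return conn.execute("SELECT email_address FROM customer").fetchall()


# Expand: add the new column and backfill it; the old column stays in place.
db.execute("ALTER TABLE customer ADD COLUMN email_address TEXT")
db.execute("UPDATE customer SET email_address = email")
both_work = old_app_read(db) == new_app_read(db)
print("old and new versions agree after expansion:", both_work)

# Contract: once no old application version remains, drop the old column.
# The table is rebuilt here so the sketch also runs on older SQLite versions.
db.executescript("""
    CREATE TABLE customer_new (id INTEGER PRIMARY KEY, email_address TEXT);
    INSERT INTO customer_new SELECT id, email_address FROM customer;
    DROP TABLE customer;
    ALTER TABLE customer_new RENAME TO customer;
""")
print("after contraction:", new_app_read(db))
```

The sketch leaves out keeping the two columns in sync while old versions are still writing (for example with a trigger), which a real migration would need during the expanded period.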
I cannot find any online references to the DEPRECATION_COMMENT data type or similar, however.
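The dbdeploy-style practice in the notes above (small deltas applied serially, with the record of what has run kept in the database itself) can be sketched as follows. This is an illustration against SQLite, not dbdeploy's actual implementation; the `changelog` table name and the deltas are assumptions for the example.

```python
import sqlite3

# Hypothetical deltas; in practice these would be numbered SQL scripts
# stored in source control alongside the application code.
DELTAS = [
    (1, "CREATE TABLE booking (id INTEGER PRIMARY KEY, ref TEXT)"),
    (2, "ALTER TABLE booking ADD COLUMN status TEXT"),
]


def apply_deltas(conn, deltas):
    """Apply not-yet-applied deltas in order; stop at the first failure."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changelog (delta_id INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT delta_id FROM changelog")}
    newly_applied = []
    for delta_id, sql in sorted(deltas):
        if delta_id in applied:
            continue
        conn.execute(sql)  # a bad delta raises sqlite3.Error: FAIL FAST
        conn.execute("INSERT INTO changelog (delta_id) VALUES (?)", (delta_id,))
        conn.commit()
        newly_applied.append(delta_id)
    return newly_applied


conn = sqlite3.connect(":memory:")  # stands in for the baseline snapshot
print("first run applied: ", apply_deltas(conn, DELTAS))
print("second run applied:", apply_deltas(conn, DELTAS))
```

Run as a CI step, an exception from a bad delta fails the build before the change gets anywhere near Production.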

Environments and Infrastructure

  1. Ops folk are tired of the pain of fire-fighting crappy apps ‘thrown over the wall’
  2. Adopt DevOps features: devs to carry pagers, ops present in Inceptions, Showcases, Retrospectives.
  3. See Release It! by Michael Nygard from Pragmatic Programmer series
  4. “Destroy works of art” – throw away things which cannot be built again exactly (from source).
  5. Monitoring scripts are part of the ‘system’ so should be part of the deployment pipeline.
  6. ANY change to the system should go via the pipeline – technology which does not fit this requirement should be dropped. This is a big change!
  7. Visibility of non-co-located team members: use cheap web-cams to produce a sense of co-location! Acts as a virtual window into the other office; can also be used within the same building, if this is large.
  8. Cross-functional teams + ‘guerilla teams’ which can move from project to project, helping with specific technologies (pipelines, Puppet, …) i.e. ‘enabling’ work.
  9. Key things for CD process: VISIBILITY + viral (JFDI)
  10. What is the definition of ‘done’? (Should be “In Production and being used”).
  11. Are you doing proper iterative development? Iterations should be 4 weeks max, ideally 2 weeks.
  12. Are you doing proper Scrum? Project Managers should ONLY: Deflect distractions (and provide nice snacks)
The book Release It! is one of the most crucial books for software developers to read and understand, in my opinion; highly recommended. Devs being on-call and Ops people attending development meetings are both absolutely fundamental practices for modern, rapid software development; all sorts of painful issues have to be solved to get this to happen effectively, but having solved those problems, the organisation will be stronger, wiser, and ready to deliver continuously.

Objections to Continuous Delivery

  1. ITIL/SOX etc. do not naturally sit well with CD. However, CD can actually make things MORE compliant and serviceable:
    1. Visibility and control over locking down
    2. Automation over documentation
    3. Auditing
  2. Make it easy to back out changes: by automating processes.
Since the workshop, there have been some really useful discussions online around the concept of ‘anti-fragile’ and how – when applied well – ITIL and DevOps (and by extension CD) can both work towards the same goal. Since formerly process-heavy companies like SAP and HP have transformed their software delivery, I suspect that other organisations will find ways to make the case that rapid, repeatable, reliable, and recurring software delivery makes for better compliance and transparency, not worse.

What’s Next?

Neal’s workshop was hugely useful in cramming in a great deal of information, experience and advice into a short space of time. What neither Neal nor the Humble & Farley CD book really mention(ed) is the extent to which the technical challenges listed above, although non-trivial, are actually much less of an obstacle to Continuous Delivery than departmental politics, legacy procedures and thinking, and Capex-Opex battles, all of which are really social-organisational problems.

Steve Smith (@AgileSteveSmith) has given an excellent talk on the social-organisational challenges of introducing Continuous Delivery (first at London Continuous Delivery meetup group, and then at QCon New York) – I really recommend this talk to anyone who wants a flavour of the social challenges to CD!

Other sessions we’ve had at London Continuous Delivery meetup group which touch on the team/organisational aspects of CD include Team Transformation for CD at 7digital with Chris O’Dell (@ChrisAnnODell) and James Betteley (@jamesbetteley) on Measuring Progress with CD (kudos to SkillsMatter for the meeting space).

The team at Nokia Entertainment has had some notable success with Continuous Delivery and DevOps. Their agile team lead John Clapham (@johnC_bristol) has helped to transform their software delivery from painful, infrequent releases to pain-free 30-minute deployments. Go and hear John speak about the team transformation at Nokia if you can.

Finally, buy the Continuous Delivery book, attend a ThoughtWorks CD workshop (Paul Stack @stack72 also runs an excellent CI and CD workshop), and (if you’re close to London) come to the #londoncd meetup group :)

5 responses to “Continuous Delivery Workshop with Neal Ford (@neal4d) – a Retrospective”

  1. Devs have lives too that need to be respected and taken into consideration when determining who is “on-call” when responding to P1 Incidents. Being expected to work long nights and weekends on a fairly regular basis is not sustainable and will result in rapid burnout.

    • I agree. The goal is to give Devs enough context and first-hand experience of incidents to provide them the context for more operable software. Poorly-implemented on-call rotas are a disaster for morale and defeat the purpose of having Devs on call, which is to improve the code.
