What Team Structure is Right for DevOps to Flourish?

The primary goal of any DevOps setup within an organisation is to improve the delivery of value for customers and the business, not in itself to reduce costs, increase automation, or drive everything from configuration management. This means that different organisations might need different team structures for effective Dev and Ops collaboration to take place.

So what team structure is right for DevOps to flourish? Clearly, there is no magic conformation or team topology which will suit every organisation. However, it is useful to characterise a small number of different models for team structures, some of which suit certain organisations better than others. By exploring the strengths and weaknesses of these team structures (or ‘topologies’), we can identify the team structure which might work best for DevOps practices in our own organisations, taking into account Conway’s Law.

Most of these DevOps topologies have been described elsewhere; in particular, Lawrence Sweeney of CollabNet goes into useful detail in a comment on Ben Kepes’s blog about what I characterise here as Anti-Type B (DevOps Silo), Type 3 (IaaS), and Type 1 (Smooth Collaboration). The DevOpsGuys have a list of Twelve DevOps Anti-Patterns, and Jez Humble, Gene Kim, Damon Edwards (and many others) have said similar things. I have added three further ‘topologies’ which I’ve not seen or heard discussed much (Fully Embedded, DevOps-as-a-Service, and Temporary DevOps Team).

Shameless plug: I co-facilitate a hands-on workshop called Experience DevOps which expands on some of the themes in this post. We have forthcoming sessions in London (29 October 2013) and Amsterdam (15 November 2013).

DevOps Anti-Types

First of all, it’s useful to look at some bad practices, what we might call ‘anti-types’ (after the ubiquitous ‘anti-pattern’).

Anti-Type A: Separate Silos

This is the classic ‘throw it over the wall’ split between Dev and Ops. It means that story points can be claimed early (DONE means ‘feature-complete’, but not working in Production), and software operability suffers because Devs do not have enough context for operational features and Ops folk do not have time or inclination to engage Devs in order to fix the problems before the software goes live.

DevOps Anti-Type A - Separate Silos

We likely all know this topology is bad, but I think there are actually worse topologies; at least with Anti-Type A (Separate Silos), we know there is a problem.

Anti-Type B: Separate DevOps Silo

The DevOps Silo (Anti-Type B) typically results from a manager or exec deciding that they “need a bit of this DevOps thing” and starting a ‘DevOps team’ (probably full of people known as ‘a DevOp’). The members of the DevOps team quickly form another silo, keeping Dev and Ops further apart than ever as they defend their corner, skills, and toolset from the ‘clueless Devs’ and ‘dinosaur Ops’ people.

DevOps Anti-Type B - The DevOps Silo

The only situation where a separate DevOps silo really makes sense is when the team is temporary, lasting less than (say) 12 or 18 months, with the express purpose of bringing Dev and Ops closer together, and with a clear mandate to make the DevOps team superfluous after that time; this becomes what I have called a Type 5 DevOps Topology (below).

Anti-Type C: “We do not need Ops”

This topology is born of a combination of naivety and arrogance from developers and development managers, particularly when starting on new projects or systems. Assuming that Ops is now a thing of the past (“we have the Cloud now, right?”), the developers wildly underestimate the complexity and importance of operational skills and activities, and believe that they can do without them, or just cover them in spare hours.

DevOps Anti-Type C - "We Don't Need Ops"

Such an Anti-Type C DevOps topology will probably end up needing either a Type 3 (IaaS) or a Type 4 (DevOps-as-a-Service) topology when the software becomes more involved and operational activities start to swamp ‘development’ (aka coding) time. If only such teams recognised Operations as a discipline as important and valuable as software development, they would avoid much pain and many unnecessary (and quite basic) operational mistakes.

DevOps Team Topologies

Having seen what makes the anti-types bad, we can look at some topologies in which DevOps can be made to work.

Type 1: Smooth Collaboration

This is the ‘promised land’ of DevOps: smooth collaboration between Dev teams and Ops teams, each specialising where needed, but also sharing where needed. There are likely many separate Dev teams, each working on a separate or semi-separate product stack.

Type 1 DevOps - Smooth Collaboration

My sense is that the Type 1 Smooth Collaboration model needs quite substantial organisational change to establish it, and a good degree of competence higher up in the technical management team. Dev and Ops must have a clearly expressed and demonstrably effective shared goal (‘Delivering Reliable, Frequent Changes’, or whatever). Ops folk must be comfortable pairing with Devs and getting to grips with test-driven coding and Git, and Devs must take operational features seriously and seek out Ops people for input into logging implementations, and so on, all of which needs quite a culture change from the recent past.

Type 1 suitability: an organisation with strong technical leadership.
Potential effectiveness: HIGH

Type 2: Fully Embedded

Where operations people have been fully embedded within product development teams, we see a Type 2 topology. There is so little separation between Dev and Ops that all people are highly focused on a shared purpose; this is arguably a form of Type 1, but it has some special features.

Type 2 DevOps - Fully Embedded

Organisations such as Netflix and Facebook, with effectively a single web-based product, have achieved this Type 2 Fully Embedded topology. I think it’s probably not hugely applicable outside a narrow product focus, because the budgetary constraints and context-switching typically present in an organisation with multiple product streams will probably force Dev and Ops further apart (say, back to a Type 1 model). The Fully Embedded topology might also be called ‘NoOps’, as there is no distinct or visible Operations team (although the Netflix NoOps might also be Type 3, IaaS).

Type 2 suitability: organisations with a single main web-based product or service.
Potential effectiveness: HIGH

Type 3: Infrastructure-as-a-Service

For organisations with a fairly traditional IT Operations department which cannot or will not change rapidly enough, and for organisations that run all their applications in the public cloud (Amazon EC2, Rackspace, Azure, etc.), it probably helps to treat Operations as a team who simply provide the elastic infrastructure on which applications are deployed and run; the internal Ops team is thus directly equivalent to Amazon EC2, or Infrastructure-as-a-Service.

Type 3 DevOps - Infrastructure-as-a-Service

A team (perhaps a virtual team) within Dev then acts as a source of expertise about operational features, metrics, monitoring, server provisioning, etc., and probably does most of the communication with the IaaS team. This team is still a Dev team, however, following standard practices like TDD, CI, iterative development, coaching, etc.

The IaaS topology trades some potential effectiveness (losing direct collaboration with Ops people) for easier implementation, possibly deriving value more quickly than by trying for Type 1 (Smooth Collaboration), which could still be attempted at a later date.

Type 3 suitability: organisations with several different products and services, with a traditional Ops department, or whose applications run entirely in the public cloud.
Potential effectiveness: MEDIUM

Type 4: DevOps-as-a-Service

Some organisations, particularly smaller ones, might not have the finances, experience, or staff to take a lead on the operational aspects of the software they produce. The Dev team might then reach out to a service provider like Rackspace to help them build test environments and automate their infrastructure and monitoring, and advise them on the kinds of operational features to implement during the software development cycles.

Type 4 DevOps - DevOps-as-a-Service

What might be called DevOps-as-a-Service could be a useful and pragmatic way for a small organisation or team to learn about automation, monitoring, and configuration management, and then perhaps move towards a Type 3 (IaaS) or even a Type 1 (Smooth Collaboration) model as they grow and take on more staff with an operational focus.

Type 4 suitability: smaller teams or organisations with limited experience of operational issues.
Potential effectiveness: MEDIUM

Type 5: Temporary DevOps Team

The Temporary DevOps Team (Type 5) looks substantially like Anti-Type B (DevOps Silo), but its intent and longevity are quite different. The temporary team has a mission to bring Dev and Ops closer together, ideally towards a Type 1 or Type 2 model, and eventually make itself obsolete.

Type 5 DevOps - Temporary DevOps Team

The members of the temporary team will ‘translate’ between Dev-speak and Ops-speak, introducing crazy ideas like stand-ups and Kanban for Ops teams, and thinking about dirty details like load-balancers, management NICs, and SSL offloading for Dev teams. If enough people start to see the value of bringing Dev and Ops together, then the temporary team has a real chance of achieving its aim; crucially, long-term responsibility for deployments and production diagnostics should not be given to the temporary team, otherwise it is likely to become a DevOps Silo (Anti-Type B).

Type 5 suitability: as a precursor to Type 1 topology, but beware the danger of Anti-Type B.
Potential effectiveness: LOW to HIGH

Summary

Exactly which DevOps team structure or topology will suit an organisation depends on several things:

  1. The product set of the organisation: fewer products make for easier collaboration, as there will be fewer natural silos, as predicted by Conway’s Law.
  2. The extent, strength, and effectiveness of technical leadership; whether Dev and Ops have a shared goal.
  3. Whether an organisation has the capability or appetite to change its IT Operations department from ‘racking hardware’ and ‘configuring servers’ to real alignment with the value stream, and for operational features to be taken seriously by software teams.
  4. Whether the organisation has the capacity or skills to take the lead on operational concerns.

Of course, there are variations on the themes outlined here; the topologies and types are meant as a reference guide or heuristic for assessing which patterns might be appropriate. In reality, a combination of more than one pattern, or one pattern transforming into another, will often be the best approach.
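
As a rough illustration only, the suitability notes and effectiveness ratings given for each type above can be collected into a small lookup table, with the four summary questions used as a first-pass filter. The ratings and suitability text are taken from this post; the `Topology` class, the `candidate_types` function, its parameter names, and the order in which the questions are applied are all hypothetical simplifications, not a definitive decision procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    name: str
    suitability: str
    effectiveness: str  # rating as given in the post


# Names, suitability notes, and ratings copied from the sections above.
TOPOLOGIES = {
    1: Topology("Smooth Collaboration",
                "an organisation with strong technical leadership", "HIGH"),
    2: Topology("Fully Embedded",
                "a single main web-based product or service", "HIGH"),
    3: Topology("Infrastructure-as-a-Service",
                "several products, a traditional Ops department, "
                "or everything in the public cloud", "MEDIUM"),
    4: Topology("DevOps-as-a-Service",
                "smaller teams with limited operational experience", "MEDIUM"),
    5: Topology("Temporary DevOps Team",
                "a precursor to Type 1; beware Anti-Type B", "LOW to HIGH"),
}


def candidate_types(single_product, strong_leadership,
                    ops_can_change, has_ops_capacity):
    """Return candidate type numbers for a set of yes/no answers.

    ASSUMPTION: the rule order below is an illustrative reading of the
    four summary questions, not something the post prescribes.
    """
    if not has_ops_capacity:
        return [4]  # no capacity or skills to lead on operational concerns
    if not ops_can_change:
        return [3]  # Ops cannot or will not change rapidly enough
    result = []
    if strong_leadership:
        result.append(1)
    if single_product:
        result.append(2)
    if not result:
        result.append(5)  # temporary team as a stepping stone to Type 1
    return result


if __name__ == "__main__":
    for number in candidate_types(single_product=False, strong_leadership=True,
                                  ops_can_change=True, has_ops_capacity=True):
        topology = TOPOLOGIES[number]
        print(f"Type {number}: {topology.name} ({topology.effectiveness})")
```

A real assessment would weigh these factors rather than apply them as strict yes/no gates, and would allow for the hybrids and transitions mentioned above.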

Acknowledgements

Thanks to @owainperry, @kief, @agilestevesmith, @TomAkehurst, @jamesbetteley, @johnC_bristol, and #londoncd members for helpful discussions about these ideas.

11 thoughts on “What Team Structure is Right for DevOps to Flourish?”

  1. I think that Type 3 is dangerously close to an anti-type as well. Where your suitability says “traditional ops team”, this really is a description of “gnarly old unix neckbeards, who refuse to do anything other than an old version of perl”.

    If the ops folk have even the slightest desire to make things better, and embrace the devops culture, rather than being the previously mentioned neckbeard stereotypes, then having a team within the dev team, essentially “doing all the cool shit”, seriously runs the risk of alienating the ops team even more, and driving a larger wedge between the two silos.

    • Hi Mat, thanks for the comments.

      I agree with you that Type 3 is not an ideal end-state for organisations with an internal Ops capability, but for some organisations it might be a reasonable short-term or medium-term approach, particularly if the ‘Ops’ people are *really* not interested in learning Git or pairing with Devs, etc. These places do exist, sadly.

      For organisations with all their stuff in ‘the cloud’, I think Type 3 works reasonably well (and there isn’t really an alternative for them). What do you think?

      (See you at #infracoders :) )

      Matthew

  2. This is a really useful post. Team structure is a really hot topic for us at the moment, and I think we’ve been lacking a framework on which to hang the discussion, so this will definitely help. One of my ops colleagues independently discovered this post and mentioned that I should read it, which I take as a good sign.

    I’m wondering if there’s some overlap between 1 and 3. Both seem to somewhat resemble where we’re heading at the moment.

    • Hi Tom. Thanks for the comments; I’m glad the post was helpful.

      It would not surprise me to see a hybrid of Type 1 and Type 3 if the organisation has enough maturity and drive to have some Ops folk working directly alongside Devs, while other Ops folk remain further in the background, focused on ‘internal cloud’ or IaaS.

      Matthew

  3. I’m confused by your examples. You provide Netflix as an example of an org that is fully-integrated. However, I would argue that Netflix only appears fully-integrated because they are actually the best example of IaaS – being almost fully reliant on AWS for their infrastructure. These types of inconsistencies make me question the “moderate” potential effectiveness you’ve assigned to the IaaS pattern. I would argue that IaaS has the highest potential effectiveness of all the options.

    • Hi Cliff, thanks for commenting.

      Having had a few months to reflect on the diagrams, I can see where you’re coming from, and I think I will write a follow-up post to clarify some ideas. To some extent, Type 2 could be a zoomed-in version of Type 3 (IaaS) just without the ‘Infrastructure’ operations part.

      The reason I think that IaaS is only *moderate* in effectiveness is for non-EC2 situations, particularly where an organisation runs its own hardware (even as an IaaS setup). When I considered effectiveness, I was thinking about opportunities for product teams to learn about operability and operations by talking to ops people: if infrastructure is provided ‘as a service’, then I think the potential learning opportunities are reduced for many organisations (Netflix is not a ‘normal’ org in my view, so different rules apply).

      Etsy might have been a better example for Type 2, as they run on physical hardware (not EC2) – in their case, they can align Dev and Ops very closely, largely because they have a single ‘product’ and presumably are in control of all their delivery deadlines (rather than having deadlines set by paying clients).

      Matthew

  4. As others have said, no organizational structure works for everyone and the organization structure is not the destination. In that spirit, it would be valuable to express these patterns more in terms of organization transitions. Type 5 – Temporary DevOps Team is an expression of an expected transition. (When I mentor individuals about organization changes, I remind them that all re-organizations are temporary). When you think this way, Anti-Pattern B can be implemented successfully with the right leadership. I have seen this pattern work when a leader was trusted by both sides and when the team was composed of thought leaders from the two sides who had a leadership style that helped others to succeed. When these conditions exist, Anti-Pattern B can lead to faster change. When the leader is “new” or “unknown” and the team is composed of outsiders, the ANTI- part of the pattern is a certain outcome.
