Digging Deeper into DevOps

Summary:

The DevOps movement was started to address the communication challenges between development and operations teams, but instead of engaging in a continuous cycle of self-improvement, management often wants to mimic techniques used by other successful companies. W. Edwards Deming showed decades ago that copying others is not effective. This article suggests better approaches to good communication.

When you search Google for the phrase "What is DevOps," you get hundreds of thousands of results.

Below are a few of the definitions collected from the search.

Rackspace’s promotional video “What is DevOps? - In Simple English” says:

DevOps integrates developers and operations teams in order to improve collaboration and productivity by automating infrastructure, automating workflows, and continuously measuring application performance.

From Wikipedia:

DevOps (a portmanteau of development and operations) is a software development method that stresses communication, collaboration and integration between software developers and information technology (IT) operations professionals. DevOps is a response to the interdependence of software development and IT operations. It aims to help an organization rapidly produce software products and services.

The Agile Admin explains:

Effectively, you can define DevOps at the top level as system administrators participating in the product development process alongside developers and using a many [sic] of the same techniques for their systems work.

And from IBM:

DevOps is an enterprise capability for continuous software delivery that enables organizations to seize market opportunities and reduce time to customer feedback. By applying lean and agile principles across the software delivery lifecycle, DevOps helps organizations deliver a differentiated and engaging customer experience, achieve quicker time to value, and gain increased capacity to innovate.

In my own words, I can summarize DevOps as a mix of collaboration, automation tools, continuous delivery, and agile.

More interesting questions are, What problems is DevOps trying to solve? And is DevOps successful in addressing those problems?

How Problems Originated

This diagram taken from a presentation for Docker, an open source container-management engine, exposes the real problem.

[Figure: Market View: Evolution of IT]

During the last two decades, we moved away from developing monolithic software applications that ran on just a few servers. Now, we have complex systems that are assembled during development from a wide variety of the best available tools and that run on large hardware clusters and clouds built from hundreds or even thousands of physical servers and virtual machines.

In such an environment, maintaining the infrastructure using traditional methods of manual maintenance becomes increasingly difficult.

This fact led to the creation of automation tools for operations, such as CFEngine and, later, Puppet. But it also meant that part of the application logic related to the underlying infrastructure moved to another department and slowly created the tensions described in one of the most important articles about DevOps, “What is DevOps?”

The article suggests that tensions were caused by differences in the mindsets of developers and system administrators (at that time they were not yet being called DevOps engineers), changes in organizational structure, and variations in tooling.

But the biggest challenge that inspired the creation of the DevOps movement was the need to keep up with the growing amount of communication between the development and operations teams.

The First Step toward a DevOps Solution

Admittedly, DevOps is not a specific methodology, like Scrum, but more of an umbrella term, similar to agile. It means that there is no prescription on how to “do” DevOps.

And yet, by looking at the many definitions of the term, we can recognize certain practices intended to enable successful DevOps adoption.

Such practices typically include cross-functional DevOps teams with shared business goals, education for creation of a similar mindset between Dev and Ops teams, and the elimination of differences in tooling.

These actions and other DevOps practices are based on the experiences of companies that have already figured out how to manage large infrastructures, such as Etsy, Facebook, and Flickr.

I believe this approach has certain causality problems. As suggested by my colleague Mark Coleman at DevOpsDays in Amsterdam, emulating the daily routines of successful companies will not necessarily make your company successful.

This truth also was observed by W. Edwards Deming, who popularized the lean approach following the years he spent in Japan. Many US companies tried to implement the ideas described by Deming, only to find out that the practices didn't work for them. In Deming's own words:

I think that people here expect miracles. American management thinks that they can just copy from Japan—but they don't know what to copy!

Forging Their Own Paths

What are the proponents of DevOps missing, then? In short, the real meaning behind Deming's research. From Wikipedia:

The Deming Cycle (or Shewhart Cycle): As a repetitive process to determine the next action, The Deming Cycle describes a simple method to test information before making a major decision. The 4 steps in the Deming Cycle are: Plan-Do-Check-Act (PDCA), also known as Plan-Do-Study-Act or PDSA. Dr. Deming called the cycle the Shewhart Cycle, after Walter A. Shewhart. The cycle can be used in various ways, such as running an experiment: PLAN (design) the experiment; DO the experiment by performing the steps; CHECK the results by testing information; and ACT on the decisions based on those results.
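To make the four steps concrete, here is a minimal Python sketch of the cycle used as a series of small experiments. Everything in it is an assumption made for illustration: the `pdca` function, the candidate changes, and the trial figures (hours from commit to release) are invented and do not come from Deming or from any real company.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Experiment:
    """One pass through the cycle for a single candidate change."""
    name: str
    measured: float   # result observed in the DO step
    adopted: bool     # decision taken in the ACT step


def pdca(baseline: float,
         candidates: List[str],
         run_trial: Callable[[str], float]) -> List[Experiment]:
    """Run one Plan-Do-Check-Act pass per candidate change.

    Lower measurements are better (here: hours from commit to release).
    """
    history = []
    for name in candidates:              # PLAN: choose the change to test
        measured = run_trial(name)       # DO: try it on a small scale
        improved = measured < baseline   # CHECK: compare with the current standard
        if improved:                     # ACT: adopt the change as the new standard
            baseline = measured
        history.append(Experiment(name, measured, improved))
    return history


# Hypothetical trial results, in hours from commit to release.
trial_results = {
    "copy another company's standup format": 48.0,
    "automate the deployment script": 30.0,
    "move deployment tasks into the product team": 12.0,
}

history = pdca(baseline=40.0,
               candidates=list(trial_results),
               run_trial=trial_results.__getitem__)

for step in history:
    print(f"{step.name}: {step.measured}h, adopted={step.adopted}")
```

In this sketch a change is kept only when the organization's own measurement improves, so a practice borrowed from elsewhere is dropped if it does not help locally.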

Successful companies get their core business right. But what is their core business? Many times, companies can be very innovative, get the incentives right, and have perfect customer relations. Unfortunately, technology is often perceived by the managers as a side activity—effectively, a waste.

Quoting Deming again:

The worker is not the problem. The problem is at the top! Management! Management's job. It is management's job to direct the efforts of all components toward the aim of the system. The first step is clarification: everyone in the organization must understand the aim of the system, and how to direct his efforts toward it. Everyone must understand the damage and loss to the whole organization from a team that seeks to become a selfish, independent, profit centre. [1]

Those successful companies mentioned earlier are all technology shops started during the last decade or two. They are all led by engineers who clearly understood that technology is not an expense on their balance sheet, but a competitive advantage that may allow them to beat their rivals.

Rare insight into one such organization, Google, makes it easy to see that managers there are not trying to copy anyone else’s methodologies or use the "right tools." They don't even have a name for their way of working.

Ben Treynor, vice president of site reliability engineering (SRE) at Google, said in an interview:

We’ve iterated to the current SRE definition over the last 15 years . . . . I’ve seen this definition work very well in practice here at Google, and I expect we’ll continue to evolve it to make the role even more attractive to developers while at the same time making it more effective at running efficient, high-availability, large scale systems.

When talking about the reasons behind the techniques being used at Google, Treynor plainly says, “It’s just a good way to run things.”

Google is not trying to imitate other successful companies. Google's way is based on Toyota's way and Deming's lean principles—continuous learning.

During my career at a variety of technology companies, I have often seen fake adoptions of lean methods. It is easy to do Scrum standups and draw a burndown chart, but few people understand that the Scrum rituals are only the beginning of the learning process. And when you start learning, you should never stop!

I see the same happening with DevOps. Despite the fact that leaders of the movement advocate against “silver bullet” solutions, DevOps has become a silver bullet of its own. Instead of addressing real challenges, managers routinely create new DevOps teams, mark the problems as solved, and move on to the more important, "real-life" challenges.

A Solution for the DevOps Communication Problem

Conway's law states that

organizations which design systems . . . are constrained to produce designs which are copies of the communication structures of these organizations.

From this observation, we can conclude that broken communication between Dev and Ops will produce broken software at the time of delivery and maintenance. The only way to resolve this issue is to change the organizational structure.

It definitely does not mean that creating a DevOps team is required. A new DevOps team will only result in the creation of yet another separate component in the system architecture.

The only reasonable way to reduce communication overhead is to remove the separation and move all product-related logic from the Ops department to the Dev department. The division should take into account the desired outcome for the product architecture. The most logical scenario is keeping hardware and infrastructure in Ops and moving all product deployment and maintenance tasks to Dev.

Conway's law suggests that this division will eliminate the organizational barriers between product and infrastructure development and create a clear interface between the product departments and the infrastructure department. The result is a contract or service-oriented approach such as platform as a service, software as a service, or something similar.
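One way to picture such a contract is as a small programming interface that the infrastructure department publishes and the product teams code against. The Python sketch below is only an illustration under stated assumptions: the `Platform` protocol, its `provision` and `deploy` methods, the in-memory stand-in, and the service and artifact names are all hypothetical and do not describe any real platform.

```python
from typing import Dict, Protocol


class Platform(Protocol):
    """Contract published by the infrastructure department (hypothetical)."""

    def provision(self, service: str, instances: int) -> str:
        """Reserve capacity and return an environment identifier."""
        ...

    def deploy(self, environment: str, artifact: str) -> None:
        """Run the given build artifact in an environment."""
        ...


class InMemoryPlatform:
    """Stand-in implementation so the sketch runs without real infrastructure."""

    def __init__(self) -> None:
        self.running: Dict[str, str] = {}
        self._count = 0

    def provision(self, service: str, instances: int) -> str:
        self._count += 1
        return f"{service}-env-{self._count}"

    def deploy(self, environment: str, artifact: str) -> None:
        self.running[environment] = artifact


def release(platform: Platform, service: str, artifact: str) -> str:
    """Owned by the product team: deployment logic that touches only the contract."""
    environment = platform.provision(service, instances=2)
    platform.deploy(environment, artifact)
    return environment


platform = InMemoryPlatform()
env = release(platform, "checkout", "checkout-1.4.2.tar.gz")
print(env, platform.running[env])
```

The product team owns `release` and everything specific to its product, while the infrastructure department is free to change how `provision` and `deploy` work as long as the contract holds.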

I believe that all the challenges related to DevOps will resolve themselves, given the right organizational structure. Engineers will find a common language, will have a common goal, and will have no choice but to use the same tooling when they are working on the same product team.

This approach is not unique to the DevOps challenge; it is the normal lifecycle of each new activity in the development process.

The same scenario happened with software testing. It did not exist as a separate discipline before the late 1970s, but when the complexity of software development grew and quality declined, the function of a tester was introduced. Small teams of testers grew into separate departments from the 1980s to the 2000s. Once testing became too complex, we saw the first test automation efforts, and when the communication between Dev and Test became too expensive, we saw the transition from separate testing teams to cross-functional teams of developers and testers, each running tests and writing code.

In the majority of agile organizations today, this is a common practice. The same will happen with the infrastructure development in coming years.

A Transition to Continuous Improvement

DevOps is an important first step that started the transition of infrastructure development from being a side activity toward becoming a part of mainstream software development.

Once the transition is done, we will see an infrastructure definition, including deployment and maintenance procedures, being released as an integral part of each software product.
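As a rough illustration of what releasing an infrastructure definition with the product could look like, the Python sketch below attaches the definition to the release itself and derives the deployment steps from it. The schema, the field names, and the `deployment_plan` helper are assumptions invented for this example, not an existing format or tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class InfrastructureDefinition:
    """Infrastructure needs versioned together with the product (hypothetical schema)."""
    instances: int
    memory_mb: int
    health_check_path: str
    maintenance_tasks: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Release:
    """A product release that carries its own infrastructure definition."""
    product: str
    version: str
    infrastructure: InfrastructureDefinition


def deployment_plan(release: Release) -> List[str]:
    """Turn a release into ordered steps an infrastructure platform could run."""
    infra = release.infrastructure
    steps = [
        f"provision {infra.instances} x {infra.memory_mb}MB for {release.product}",
        f"deploy {release.product} {release.version}",
        f"verify GET {infra.health_check_path}",
    ]
    steps += [f"schedule {task}" for task in infra.maintenance_tasks]
    return steps


release = Release(
    product="checkout",
    version="1.4.2",
    infrastructure=InfrastructureDefinition(
        instances=2,
        memory_mb=512,
        health_check_path="/health",
        maintenance_tasks=["rotate-logs"],
    ),
)

plan = deployment_plan(release)
for step in plan:
    print(step)
```

Because the definition travels with the version, a change in what the product needs from its infrastructure is reviewed and released together with the code that needs it.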

In the future, we will see a clear separation of hardware and cloud management from the software itself, with infrastructure being consumed by software products through clearly defined APIs.

Achieving this will require only one step, but it is an incredibly difficult one: transition to continuous improvement.

 

1. Deming, W. Edwards. 1993. The New Economics for Industry, Government, Education, second edition.


StickyMinds is a TechWell community.
