Programmable Infrastructure and DevOps Teams

[article]
Summary:

A decade ago, continuous integration became a key practice to support the agile process. Now, the hot topic is continuous delivery, and Pini Reznik has noticed many similarities between the adoption of CD today and the implementation of CI. You can learn a lot from past experiences.

During the last decade, agile became one of the leading software development methodologies, and continuous integration (CI) became a key practice to support the agile process.

In the last few years, we have seen two challenges to extending agile practices:

  1. Shortening the development cycle from weeks to hours or even minutes (continuous delivery)
  2. Expanding the agile process outside development to operations (DevOps) and other departments

These challenges seem to be new, but in reality we saw very similar trials ten or fifteen years ago in the beginning of the agile revolution.

Agile and Continuous Integration

While working in software configuration management in many different organizations, I have observed a consistent path for the introduction of an effective CI process.

Once the team decides to implement CI, the first stop is always an automated build. It seems obvious today, but ten years ago, fully automated builds that produced the entire software system, ready for installation and testing, were a luxury available only in the best software companies. It was especially challenging to achieve Martin Fowler’s ten-minute mark for an effective CI build. Once the build was running on a central CI server, the team would start creating test automation to ensure that the software not only compiled, but also could be installed and shown to be functioning reasonably well.

Once the CI build with some test coverage was running smoothly, teams would start feeling the pain of a long fix cycle. This was because automated builds, installers, and tests were developed and maintained by separate specialized teams.

Finding out which team was responsible for fixing a bug could take longer than actually fixing it. This situation led to the introduction of cross-functional teams able to resolve 99 percent of their daily issues without involving any external party. Such teams took full responsibility for the entire development cycle, including build, installation, and automatic testing of the software, which led to the production of more stable and reliable code and faster resolution times for critical, build-breaking issues.

During the transition, tools would change, too. For example, test automation was not invented by agile teams, but the way it was done changed tremendously when it was moved to cross-functional development teams. Recording-based tools gave way to fully programmable alternatives such as xUnit. Later still, even user interface testing became an integral part of the normal development process. With the introduction of test-driven development (TDD), some of the teams recognized that adding tests in later stages is challenging for many development projects, so they moved the testing activities upstream to make sure that nothing is done without proper test coverage.

With TDD, tests are defined before even the first line of code is written.
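The xUnit style described above can be illustrated with a minimal sketch in Python's unittest (a member of the xUnit family). The `slugify` function and its behavior are hypothetical examples, not from the article; in TDD the test class would be written first and would fail until the implementation below is added.

```python
import unittest

def slugify(title):
    # Minimal implementation, written only after the test below
    # existed and was failing (the "red" step of TDD).
    return "-".join(title.lower().split())

class SlugifyTest(unittest.TestCase):
    # In TDD this test is written before slugify() itself.
    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Continuous Delivery"),
                         "continuous-delivery")
```

A test like this would typically run on every build via `python -m unittest`, which is exactly what makes it a natural fit for a CI server.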

Continuous Delivery and the Need for DevOps

In recent years we started seeing a strong shift toward continuous delivery (CD), which is nothing more than CI taken all the way to the client. When doing CD, the goal is to deploy fully functional code to clients within hours or even minutes.

On the way to this goal lie exactly the same challenges observed while introducing CI:

  1. Automation of all parts of the delivery pipeline
  2. Additional test coverage
  3. Consolidation of the teams
  4. A unified toolbox

The first step toward achieving CD must be the introduction of programmable infrastructure, which allows teams to define complex runtime environments and deploy software automatically without human intervention. This step is as essential for CD as the automated build was for CI. In the beginning it is natural to create such automation from within specialized ops teams that would use the tools available to them, such as Puppet or Chef.

Later, when some system tests are available and the teams are able to build environments and deploy functional software, they will hit the same pain point of interteam coordination. Just as the interteam problem caused many companies to merge development and testing during the CI transition ten years ago, the same problem now needs to be addressed between development and operations. The answer is also the same: a cross-functional team, this time called DevOps.

But today, we are only at the beginning of this change. Organizations have started creating DevOps teams by moving development and operations engineers into the same team, so the next step on the way to achieving a real CD workflow is clear.

Unified Tooling or a Common Language for Dev and Ops

Puppet, Chef, and similar tools currently used to implement programmable infrastructure are conceptually similar to the test automation tools based on action recording. They replace humans by imitating their actions.

Such tools are going to be replaced soon by a new type of tool that is conceptually similar to xUnit. Using those tools, developers will be able to define software deployment and runtime environments as part of regular development. This will allow unification of the tooling from the beginning of development through the entire lifecycle of the product.
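One consequence of treating deployment definitions as ordinary code, in the xUnit-like style suggested above, is that they can be unit tested before anything is deployed. The sketch below, with a hypothetical list of service definitions, shows the kind of checks a development team could run on every build; it is an illustration of the idea, not the API of any existing tool.

```python
import unittest

# Hypothetical deployment definition, written and versioned by the
# development team alongside the application code.
DEPLOYMENT = [
    {"name": "web",      "image": "example/web:1.4", "port": 8080},
    {"name": "api",      "image": "example/api:2.1", "port": 9000},
    {"name": "database", "image": "example/db:11",   "port": 5432},
]

class DeploymentDefinitionTest(unittest.TestCase):
    def test_ports_are_unique(self):
        ports = [svc["port"] for svc in DEPLOYMENT]
        self.assertEqual(len(ports), len(set(ports)),
                         "two services claim the same port")

    def test_every_image_is_pinned_to_a_version(self):
        for svc in DEPLOYMENT:
            self.assertIn(":", svc["image"],
                          f"{svc['name']} image is not pinned")
```

Because the environment definition and the product code live in the same repository and are verified by the same test runner, the toolbox really is unified from the first commit onward.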

The first big shift that we’ve seen in this direction is software containers and their ecosystem of easily programmable hardware, network, and storage emulation. The recent explosive popularity of Docker clearly shows that containers are now addressing the most pressing obstacle to achieving CD.

In the near future we will see more and more similarities between the transition to CD and our earlier experiences in transitioning to CI. In our company we have already started practicing our own “production first” concept, which is very similar to TDD. We set up production environments for ourselves and for our clients, including live URLs, before we write even a single line of code.

The Right Tool for the Right Job

While running a CD transition, it is also important to learn from the mistakes we made while introducing CI. One such mistake was to move too much responsibility to the DevOps teams.

Today, dev teams are in charge of writing the product’s code as well as building and testing that code. Ops teams are in charge of the physical infrastructure and clouds as well as deployment and configuration of the products on the production infrastructure.

DevOps teams only need to own the pieces of the infrastructure that are part of the product logic—provisioning functional blocks of the system such as web servers, databases, load balancers, etc. Maintenance of the hardware and the clouds can remain the responsibility of a separate subteam, as long as it can provide reliable and consistent APIs for consuming its services. A public cloud like Amazon EC2, with its well-defined APIs and high quality of service, can in some cases effectively replace such subteams.

Conclusion

I believe that the most important factor for the successful implementation of CI or CD in an organization is the ability to resolve 99 percent of the daily issues without the help of any external party. There is no feasible way to achieve this when different parts of the delivery pipeline required for normal work are owned by different teams. Cross-functional DevOps teams and programmable build, test, and infrastructure are essential for the constant flow of changes through the pipeline to the customers.

User Comments

1 comment
Clifford Berg

The author makes some great points, but before responding to them, I need to set the record straight about the history of things:

Fully automated builds were commonplace during the 1980s. All the projects I worked on during that period had fully automated builds, using make and bash/csh/sh. And we used fully automated regression testing as well. On one project, all the failed tests were reported from a database (our VAX's built-in relational database) each morning. Year: 1985.

What happened later is that people shifted languages, and the new crop of developers did not know bash or make, because they were using Java. And make doesn't work for Java, so ant was created. Also, the new crop of developers were using PCs instead of Unix systems. And these factors started us on a path of not using bash - putting everything in ant. The result was that all the practices pertaining to automated builds had to be reinvented.

We also used CI back then. I remember having my own Sun workstation and a full copy of all the code locally, under SCCS (a predecessor of CVS), and running tests locally before checking the code in. The only thing missing was Jenkins: we kicked off the nightly build and test run via a cron script.

Of course, the Web did not exist back then, but there were automated UI testing tools. I did not use them because I did not work on UI-based systems.

Our teams were highly cross-functional as well. I don't recall anyone being anything other than a developer. There was no DBA. There was no tester: people took turns writing test scripts in a text editor (just like writing Gherkin today), and then writing the test code (in C or whatever).

The work also tended to be feature based: we had lists of features to be implemented and worked down the list. One would implement the feature, test it, and then check the code in. But I think that there must have been other companies that did things differently - otherwise, we would not hear all these horror stories about waterfall. Between 1980 and 1995, of all the projects I was on, one of them - only one - was a waterfall project. And indeed, it failed. But all the others succeeded.

The long fix cycle that the author refers to is indeed a huge problem, but I don't think that it is due to having specialized team members. I believe it is due to relying on the CI process for testing. Before you check your code in, you should have run the acceptance tests locally. That's why they all need to be automated.

I also agree with the author that the current devops tools need to be replaced with declarative tools. Ideally, the tools should use static syntax - not be dynamic - because a statically verifiable configuration is far more robust: one can add rules for security, for environment integrity, etc. - all verified using compilation techniques. Things are far too flaky today - half the problems that happen during development are due to devops configuration issues. We need a statically verifiable configuration.

Ironically, containers are not new either. The devops community is merely rediscovering them. Containers were popular during the mid-'90s for ISPs, which enabled their many websites to operate in "sandboxes". The new crop of containers are based on some new Linux APIs, but the idea is not new. The DevOps crowd is merely late to the game of using this approach - reinventing things, as always happens, and thinking they are new ;-)

What is new is that developers are starting to be exposed to these tools. The tools today are mega-crappy: as I explained, basing them on dynamic languages is a huge handicap, and the tools are generally maintained by individuals rather than companies and so they tend to be incomplete and somewhat flaky. For infrastructure we need things that are rock solid.

The author's primary point, that DevOps should be the province of developers - not a DevOps team - is spot on. Creating a special team for that is a huge mistake. It is a good thing to have a team of DevOps coaches - but they should not be writing code themselves. And most of all, if they are creating their own home-grown frameworks, then you are in real trouble, because those frameworks become a barrier to entry for the developers, and a huge key person dependency for the authors of those frameworks. And if a developer tries to use the framework, and it doesn't work, and the framework creator is not around to ask for help, the developer cannot "Google" the issue and find the answer. Don't create your own home-grown DevOps frameworks.


January 19, 2015 - 1:22pm
