A continuous build may be a great idea, but it takes more than a great idea to be successful. In this article, Tony Sweets describes his personal experience with difficult build servers and his organization's move toward a continuous build.
You work on a software team, and someone decides that a continuous build would be a good idea. Someone should give that person a raise! But, who is going to put this thing together? Software developers typically just want to write code, not build information systems. The IT department typically doesn’t know enough about the software development and testing process to configure such a beast. This is something for your project’s toolsmith—a role I gladly take on in engineering teams.
The following is a real-world case study of the build system at my workplace. I hope it will ease your transition into the wonderful world of continuous builds with Hudson.
The Evolution of Our Build Server
When we began continuous builds about seven years ago, I built a homegrown server (best bang for the buck at the time) to handle the work. Our build job consisted of compiling Java code and creating a war file for our testers to deploy. This ran in a few minutes, and life was good. The testers no longer needed to build code manually on their workstations, saving them a headache and making them more efficient. The build server emailed the committers to tell them if the build passed or not. Since this was just a compile, it passed 99.9 percent of the time.
This was before we had a suite of JUnit tests to run. We had a JUnit here and there but nothing that consisted of a full suite of tests. Our boss at the time, Mike Cohn, set out to change that. We started in on a test-first methodology, created a top-level test suite, and added that to the build. Our build time was starting to increase, and the successful build percentage started to fall.
Our whiz-bang tester, Lisa Crispin, was using a web application testing tool from a company called Canoo. The framework is an open source project named WebTest. When Lisa wanted the WebTest test integrated into the build, I was able to add it and write a plug-in for our build system at the time, CruiseControl, to capture the results. This was the beginning of the end of that homegrown server. The build time was up to around ten minutes, even with a watered-down list of tests to run. We created a build to run all of the tests at night, as this now took almost an hour. Our build server also crashed. We were feeling some pain for the first time.
Build Servers Two and Three
Again, I built a homegrown server using 2 AMD 2600MP CPUs. We called this server “Build 1.” Our regular continuous build times crept up as we added JUnit tests, and we hit the ten-minute mark again. We made an internal goal to keep the build down to seven minutes, but to accomplish this we had to buy another server. This one we called “Build 2.” Build 2 got the continuous build down to seven minutes, and we used Build 1 for the “nightly” build, which, at this point, ran whenever it could after it saw a check-in.
Failure after Failure
We beat the crap out of these servers. During the workday, they ran nonstop. Builds were both IO and CPU intensive, so these servers got extremely hot and components (typically the drives) failed. Each time they would fail, we would feel it. Not only did we have to recover the hardware, but also our code started to degrade as regressions were not caught quickly. It took even more time to find the bugs, as they were “stacked” and it was not clear which check-in caused the failure. Time is critical when you are doing two-week iterations, so these failures usually caused our releases to slip. However, this is something we had to live with as money was always allocated to other functions in the company and not so much to build servers.