It’s remarkably common that when I talk to people about test tooling, we mean entirely different things. I think we are talking about unit tests, they think we are automating the user interface, and the person eavesdropping thinks we are talking about continuous integration.
We don’t even agree on what the words mean.
Figuring out at what level we are testing, however, is just the start. There are other assumptions about how our tests will run, where the test data will come from, and how we will know that there is a problem. Another way to say that is we need to have an oracle: a method of identifying problems. Teams that rush to tooling without thinking long and hard about these issues will pay a high price later.
Let me put a number on that. In my experience, more than half of teams that fail to think through these issues will have completely abandoned the tooling effort within twenty-four months. The net return on investment on those efforts will be a negative number.
Let’s talk about how to do it better, based on the work of Christopher Alexander.
Before Alexander was a famous architect, he wrote the books The Timeless Way of Building and A Pattern Language, which proposed that patterns already existed in building architecture. These patterns were emergent because they resolved a series of conflicting forces. Alexander combined elements in myriad ways, from patterns as small as the window seat to ones as large as a corridor, and these came into use among city planners and are part of our very language today.
He also got the attention of the computer science community, as his ideas became design patterns programmers use to create software. A design pattern consists of the pattern name, the problem it solves, how to implement the solution, and some consequences.
Let’s talk about some test tooling patterns I have seen, using a very loose version of the design pattern format.
Test Tooling Patterns
Click-click-click-type-click-inspect—this is the simplest and most obvious way to drive the user interface. This helps testers who are less technical solve problems. Generally the testers use a record-playback tool, or perhaps learn just enough programming to find user interface elements by CSS selector or XPath and to write very straightforward, linear code.
The code that is developed this way tends to take longer and longer to run, and it is often brittle; a single change in a user interface element can break many tests.
It’s useful to have some test tooling patterns to employ, both to make your life easier and to improve your code maintenance. These are some of the most useful ones I’ve seen.
Abstract user functions: In an attempt to make code less brittle, you can create functions that can be reused, such as “login,” “search,” or “tag.” That means when the business function does not change but a few clicks do, the change can be made in one place. Done well, this pattern can make test code more readable and writable by less technical users.
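A minimal sketch of this pattern, in Python. The driver interface (the `fill` and `click` methods) and all the locators are invented for illustration; in practice the driver would be Selenium, Playwright, or similar, and the fake driver here exists only so the sketch runs on its own.

```python
class FakeDriver:
    """Stands in for a real browser driver so this sketch is self-contained."""
    def __init__(self):
        self.actions = []

    def fill(self, locator, text):
        self.actions.append(("fill", locator, text))

    def click(self, locator):
        self.actions.append(("click", locator))


def login(driver, username, password):
    """One reusable business function. If the login flow gains an extra
    click, only this function changes -- not every test that logs in."""
    driver.fill("#username", username)
    driver.fill("#password", password)
    driver.click("#submit")


def search(driver, term):
    driver.fill("#search-box", term)
    driver.click("#search-go")


# A test now reads at the business level, not the click level:
driver = FakeDriver()
login(driver, "pat", "s3cret")
search(driver, "red shoes")
assert ("click", "#submit") in driver.actions
```

The point is that test scripts call `login(...)` instead of repeating three fill-and-click steps, so a changed login flow is a one-line fix.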
Abstract user interface elements: Instead of cutting and pasting the XPath around, store the locator to the user interface element in a lookup table. Again, when the user interface changes, the change only needs to be made in one place. Commercial vendors sometimes call this an object repository. Now, instead of rerecording a test, we can recapture the element and rerun the test. This work can subtly increase the complexity of the overall solution, especially for Windows software.
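The lookup table can be as simple as a dictionary keyed by logical element name. This is a sketch with invented locator names; a commercial object repository does the same thing with more tooling around it.

```python
# One central lookup table: logical name -> locator.
# When the UI changes, update this table, not every test.
LOCATORS = {
    "login.username": "//input[@id='user']",
    "login.password": "//input[@id='pass']",
    "login.submit":   "//button[@type='submit']",
}


def locator(name):
    """Look up a UI element by logical name; fail loudly on typos."""
    try:
        return LOCATORS[name]
    except KeyError:
        raise KeyError(f"No locator registered for {name!r}")


assert locator("login.submit") == "//button[@type='submit']"
```

Tests then say `driver.click(locator("login.submit"))`, so a renamed button is a one-line change to the table.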
Mob on test spec: Once the abstractions are in place, the whole team can come up with real, detailed test examples as a group. These examples are not as simple as “Passwords with fewer than six letters fail,” but instead a table of real examples that includes five valid logins and passwords, along with five invalid ones. Specification by example and acceptance test-driven development (ATDD) both implement this pattern. Five years ago the developers would go off to implement the code and know they were done when the tests run, but today’s more progressive teams are likely to work together through the entire process.
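A table of real examples can drive the tests directly. The validation rule below (at least six characters, must contain a digit) is invented for illustration; the shape of the pattern is the table, not the rule.

```python
def password_ok(password):
    """The rule under test (invented for this sketch)."""
    return len(password) >= 6 and any(c.isdigit() for c in password)


# The example table the team produced together: (password, expected).
EXAMPLES = [
    ("hunter2",    True),
    ("passw0rd",   True),
    ("abc1",       False),  # too short
    ("longenough", False),  # no digit
]

for password, expected in EXAMPLES:
    assert password_ok(password) is expected, password
```

Tools like Cucumber or pytest’s `parametrize` give this table-driven style first-class support, but a plain list of tuples captures the idea.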
Do it twice: This is especially valuable when the software gives an “answer,” such as a credit score, a yes/no decision, a medical diagnosis, or an insurance quote. Instead of a few examples, actually implement the heart of the algorithm twice, independently. Then you can create a test scenario for any input and get two answers. If they match, we can say two reasonable interpretations of the specification came to the same result; if they differ, either the specification is ambiguous or one implementation has a bug—something a single implementation can never reveal about itself.
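A sketch of the idea, with an invented scoring rule. The two functions are written to look deliberately different, the way two people working from the same spec would write them; the test sweeps many inputs and checks that the answers agree.

```python
def score_v1(purchases, returns):
    """First reading of the spec: net items, 10 points each, floor at 0."""
    return max(0, (purchases - returns) * 10)


def score_v2(purchases, returns):
    """Second, independently written implementation of the same spec."""
    net = purchases - returns
    if net <= 0:
        return 0
    return sum(10 for _ in range(net))


# Any input where the two implementations disagree points at an
# ambiguity in the spec or a bug in one of them.
for purchases in range(5):
    for returns in range(5):
        assert score_v1(purchases, returns) == score_v2(purchases, returns)
```

This is why running the algorithm twice differs from running it once: one implementation can only be compared against hand-picked expected values, while two can be compared against each other across any input you can generate.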
Take the best two out of three: If you are doing deep engineering work in mission- and life-critical projects, you can implement the heart of the algorithm three times, along with a supervisor program that compares the answers. If they differ, the program can take the best two out of three and log an exception. Some NASA programs use this pattern in production!
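A toy sketch of the supervisor. The three implementations here are trivially the same rule, with a fault seeded into one of them so the voting has something to do; in a real system they would be independently engineered.

```python
import logging


def impl_a(x):
    return x * x


def impl_b(x):
    return x ** 2


def impl_c(x):
    return x * x if x != 3 else -1  # seeded fault, for the demo only


def supervised(x):
    """Run all three implementations, return the majority answer,
    and log an exception whenever they disagree."""
    answers = [impl_a(x), impl_b(x), impl_c(x)]
    for candidate in answers:
        if answers.count(candidate) >= 2:  # majority of three
            if len(set(answers)) > 1:
                logging.warning("disagreement on input %r: %r", x, answers)
            return candidate
    raise RuntimeError(f"no majority for input {x!r}: {answers}")


assert supervised(3) == 9  # the faulty implementation is outvoted
```

The same voting structure works whether the “implementations” are functions, services, or whole redundant computers.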
Mock external boundaries (test in isolation): External boundaries, like a file system, a network, or a database, can be slow. Some tools allow you to mock or fake out the user interface, such as headless mode or PhantomJS; others just test the user interface entirely in isolation. Or you can stand up a fake server to, say, respond to API requests that are preset, instead of actually getting answers from a database. The downside of this is you do not get real end-to-end testing and can miss integration bugs.
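A sketch of testing in isolation. The code under test depends on a “client” with a `get` method, and the test substitutes a canned fake instead of making a real network or database call. All names are invented for illustration.

```python
def premium_user_count(client):
    """Code under test: counts premium users from an API response."""
    users = client.get("/users")
    return sum(1 for u in users if u.get("premium"))


class FakeClient:
    """Preset responses -- no network, no database, fast and repeatable."""
    def __init__(self, canned):
        self.canned = canned

    def get(self, path):
        return self.canned[path]


client = FakeClient({"/users": [
    {"name": "pat", "premium": True},
    {"name": "lee", "premium": False},
]})
assert premium_user_count(client) == 1
```

The trade-off mentioned above applies: this test is fast and deterministic, but it will never catch the real server returning a shape the fake doesn’t model.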
Isolate the back end: Large sets of GUI tests take a long time to run and can be hard to maintain. API tooling, which hits the back end, can be faster and less brittle, and it allows you to “poke” the GUI. So have a few tests that check the entire system end to end, layer human exploration on top of that, and test a great deal through the API for every build.
Compare test to production: Sometimes I work with organizations that have data-intensive apps that consume one form of data and produce another. These can have customer information, claims, and even financial information, and we might not have a clear oracle. In that case we can stand up two systems side by side, run the same data in, and compare the output. We are doing it twice without having to create an oracle—the production version is the oracle. The downside is this cannot find existing defects in production, and it will flag differences between test and production as “errors” when they are really things to investigate.
Kill the god with facades: Replacing legacy systems can be so intimidating that we prefer to just bandage it up instead. One common antipattern that is particularly hard to rewrite is the “god” system or class: the piece of legacy code that does far too much. One solution is to build a facade that interfaces with the existing system and pulls a small piece out, and then we rewrite that small piece. Eventually, we can remove the facades and have the subsystems work with each other. I’ll call this a testing pattern because we can test each subsystem well behind the facade. A programmer might call this a test-focused implementation of the strangler pattern.
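A sketch of the facade, with all class names invented. One operation (“tax,” here) has been carved out into a new, independently testable module; everything else still routes to the legacy god object.

```python
class LegacyGod:
    """Stands in for the untestable legacy system that does too much."""
    def handle(self, operation, payload):
        return f"legacy:{operation}:{payload}"


class TaxCalculator:
    """The small piece we pulled out and rewrote -- easy to test alone."""
    def handle(self, operation, payload):
        return round(payload * 0.07, 2)  # invented 7% rate for the demo


class Facade:
    """Routes carved-out operations to new code, the rest to legacy."""
    def __init__(self):
        self.legacy = LegacyGod()
        self.routes = {"tax": TaxCalculator()}

    def handle(self, operation, payload):
        target = self.routes.get(operation, self.legacy)
        return target.handle(operation, payload)


facade = Facade()
assert facade.handle("tax", 100.0) == 7.0                  # new code path
assert facade.handle("ship", "box") == "legacy:ship:box"   # unchanged
```

As more operations move into `routes`, the legacy object is slowly strangled; each extracted piece can be tested thoroughly behind the facade.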
Use synthetic tests of production to monitor production: Plenty of teams I work with have production statistics but can’t translate them back to user impact. Some of those teams have tests that do all the things users do. By adding timing and recording the results in a database, we can get a canary to warn us when the system is failing to be useful, or just running slow. This requires the creation of test accounts in production and generally will add test requirements that will push us toward isolation and good design.
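A sketch of one synthetic check. The journey (a login) and the slowness threshold are invented for illustration; the real version would drive a production account and write the record to whatever store feeds your dashboards.

```python
import time


def synthetic_login_check(do_login, slow_threshold_s=2.0):
    """Run one synthetic user journey, time it, and return a metrics
    record suitable for storing and alerting on."""
    start = time.monotonic()
    try:
        do_login()
        ok = True
    except Exception:
        ok = False
    elapsed = time.monotonic() - start
    return {
        "check": "login",
        "ok": ok,
        "seconds": round(elapsed, 3),
        "slow": elapsed > slow_threshold_s,
    }


# Stand-in journey: a real check would log in through the UI or API.
record = synthetic_login_check(lambda: None)
assert record["ok"] and not record["slow"]
```

Run on a schedule, a stream of these records becomes the canary: a burst of `ok: False` or `slow: True` means users are hurting before anyone files a ticket.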
Facade and test: Sometimes we want to do dozens, hundreds, or thousands of things through the user interface that would just take too long. Or the user interface might be something simple we have high confidence in that is hard to “drive” with a tool. We can rip the head off the user interface to create an API, either for single transactions, for batch runs, or both. Like synthetic transactions, this adds a little bit of work for the programmers, but it makes the design clean. In some cases, making the API reusable can enable entirely new classes of products, such as a mobile application.
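A sketch of what “ripping the head off” leaves behind: the business logic exposed as plain callables, one for single transactions and one for batch runs. The discount rule and names are invented for illustration.

```python
def apply_discount(subtotal, code):
    """Single-transaction entry point: the logic that used to live
    behind the UI, now reachable without clicking."""
    rates = {"SAVE10": 0.10, "SAVE25": 0.25}  # invented demo codes
    return round(subtotal * (1 - rates.get(code, 0.0)), 2)


def apply_discount_batch(orders):
    """Batch entry point: thousands of cases in one call, no UI driving."""
    return [apply_discount(subtotal, code) for subtotal, code in orders]


assert apply_discount(100.0, "SAVE10") == 90.0
assert apply_discount_batch([(100.0, "SAVE25"), (40.0, "NONE")]) == [75.0, 40.0]
```

Once this layer exists, a test tool can hammer it with thousands of cases in seconds, and the same API can later serve a mobile client.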
These are just a few of the patterns I’ve seen over and over again in testing, mostly on the customer-facing side—the xUnit patterns for unit testing are well explored. What did I miss?