A Simpler Way of Using Machine Learning to Shift Testing Left

Summary:
The advantages of shifting left and testing as early as possible are obvious. But as you automate more testing, the test suite grows larger and larger, and it takes longer and longer to run. Instead of running every test on every change, automate the process of finding the right set of tests to run. The key to that is machine learning. This isn't AI bots finding bugs autonomously without creating tests; it's a different, and far simpler, way to use machine learning.

The advantages of shifting left and testing as early as possible are obvious. What’s less clear is how to do it.

In my role as test engineering manager at a large financial services firm, that challenge fell on me. The solution that some of the biggest software companies—Google, Facebook, and Microsoft—have all developed takes advantage of machine learning to test every code change immediately, so I learned how to implement these techniques myself. And I think these techniques would benefit everyone else struggling to shift their testing left, too.

Shifting left means testing as early as possible, making testing an integral part of each stage of the development process. The earlier bugs are caught, the faster and easier they are to fix. Ideally, we want to test every commit as soon as it is applied so developers can fix any issues immediately while the code is still fresh and the rest of the team hasn’t started building on top of it.

This obviously requires automation. Only with automation can we run hundreds or thousands of tests in a few minutes. While it wasn't feasible or even optimal to automate everything, our team set out to automate as much as we could. That had its own challenges and took longer to implement than we had planned. And while it helped us dramatically increase the number of paths and permutations we could test quickly, it didn't help us shift left as much as we expected.

Not surprisingly, as we automated more testing, the test suite grew larger and larger, and consequently, it took longer and longer to run. When we started, the test automation suite only took a few minutes to run, so we were able to test every commit as it was applied. But as the number of tests grew, the test suite started taking hours to run. The UI tests were the slowest, but even the sheer number of integration tests made it impossible to check individual commits. So we had to batch the test process and run the suite periodically, pushing testing back to the right.

To reduce test times, we tried parallelization, but that was expensive and couldn’t even keep up with the growth in the test suite. Even worse, as the number of tests increased, so did the number of flaky failures that had to be reviewed manually.

Since UI tests took the longest to run and generated the most flaky failures, we tried reducing the number of UI tests by cutting down the number of separate phone models we tested. Unfortunately, that caused us to miss more bugs, including some that weren’t caught in manual testing either, since we expected the phones to behave similarly. We also tended to miss bugs where there were cross-dependencies and it wasn’t obvious to the manual testers that an area of the application had been affected by the code changes. That was a valuable lesson on the need for test automation. Consequently, we added back all the UI tests on all the different phones and browsers.

Eventually, it became necessary to run the automated test suite overnight. Each morning, the QA team would manually separate out the flaky failures and try to identify the changes that caused each defect. Developers would then have to fix the previous day’s bugs, and those fixes wouldn’t be fully tested until the following night. While it wasn’t exactly waterfall, we were not able to attain the level of speed and efficiency that we wanted from our agile process. So I looked for a better solution.

I began by researching how other companies solved the problem, particularly the largest ones. With thousands of developers making frequent commits and multiple releases per day, they couldn’t wait for overnight testing. But with millions of tests, it was impossible to run every test for every commit. Their solution was to run only the few tests relevant to the specific changes.

The concept is simple in principle—if a developer makes a change to a map functionality, for example, there is no reason to run all the tests that have nothing to do with mapping. It's easy enough to identify the right tests by hand, but doing that for every commit takes longer than just running everything. So it's necessary to automate the process of finding the right set of tests for the code changes. The key to that is machine learning.
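To make the idea concrete, here is a minimal, rule-based sketch of mapping changed files to the tests that cover them; the module paths and test names are hypothetical. Machine learning replaces this hand-maintained mapping with one learned from history.

```python
# A minimal sketch of rule-based test selection, before any machine learning.
# The source areas, test names, and mapping are hypothetical examples.

# Map source areas to the test groups that exercise them.
TEST_MAP = {
    "maps/": ["test_map_rendering", "test_route_search"],
    "payments/": ["test_checkout", "test_refunds"],
    "auth/": ["test_login", "test_session_expiry"],
}

def select_tests(changed_files):
    """Return the tests whose source area was touched by the commit."""
    selected = set()
    for path in changed_files:
        for area, tests in TEST_MAP.items():
            if path.startswith(area):
                selected.update(tests)
    return sorted(selected)

# A commit that only touches mapping code runs only the mapping tests.
print(select_tests(["maps/tile_cache.py", "maps/geocoder.py"]))
```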

Whenever I mention machine learning to people in the QA community, they assume I’m talking about AI bots that can find bugs autonomously without creating tests. This is a different way to use machine learning, and it’s far simpler.

By learning from the test history and which code changes cause different tests to fail, machine learning can automate the process of picking the right tests for any particular code change. And no manual adjustments are needed as the codebase and tests change over time; it all happens automatically. By running only the small number of tests that are actually relevant to any particular change, the tests can be run in just a few minutes. That makes it possible to check every commit as it's applied and provide an immediate pass/fail result to the developers before they move on to the next task. That's shifting left.
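One way to picture this (a rough sketch, not the actual Google or Facebook implementations) is as a supervised learning problem: every historical pairing of a code change and a test becomes a training example, labeled by whether the test failed, and a classifier learns which characteristics of a change predict failures for which tests. The features, data, and model choice below are illustrative assumptions.

```python
# A hedged sketch of learning test relevance from history.
# Feature definitions, data, and the model are assumptions for illustration.
from sklearn.ensemble import GradientBoostingClassifier

# Each row describes one historical (change, test) pair:
# [files changed in the test's area, total files changed,
#  past failure rate of this test for changes in this area,
#  distance between change and test in the dependency graph]
# label = 1 if the test failed on that change.
X = [
    [3, 4, 0.40, 1],
    [0, 2, 0.01, 5],
    [2, 2, 0.25, 2],
    [0, 6, 0.00, 6],
    [1, 1, 0.30, 1],
    [0, 3, 0.02, 4],
]
y = [1, 0, 1, 0, 1, 0]

model = GradientBoostingClassifier().fit(X, y)

# For a new commit, score every candidate test and keep the likely failures.
candidates = {
    "test_route_search": [2, 3, 0.35, 1],
    "test_refunds":      [0, 3, 0.01, 5],
}
for test, features in candidates.items():
    p_fail = model.predict_proba([features])[0][1]
    print(test, round(p_fail, 2))
```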

But machine learning isn’t perfect, of course. It generates probabilities instead of absolutes. So it’s still necessary to run the full set of tests occasionally just to be sure no failures were missed. But because almost all failures are caught immediately (Facebook reports catching 99.9% of regressions this way), the full test suite can be run infrequently, thereby reducing testing costs.
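That trade-off can be expressed as a simple selection policy: on each commit, run only the tests the model flags as likely to fail, and keep a scheduled full run as a safety net. The threshold and the nightly schedule below are assumptions for illustration, not fixed recommendations.

```python
# A sketch of combining per-commit selection with a periodic full run.
from datetime import datetime

FAILURE_THRESHOLD = 0.10   # run any test with a predicted failure chance of 10% or more
FULL_RUN_HOUR = 2          # still run everything once a day, e.g. at 02:00

def tests_to_run(scored_tests, all_tests, now=None):
    """scored_tests maps test name -> predicted failure probability."""
    now = now or datetime.now()
    if now.hour == FULL_RUN_HOUR:
        return sorted(all_tests)                       # periodic safety net
    return sorted(t for t, p in scored_tests.items()   # fast per-commit check
                  if p >= FAILURE_THRESHOLD)
```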

Google calls this technique regression test selection. Facebook calls their version predictive test selection. While both companies have been open about describing the technology, their implementations were built for their unique development environments. Microsoft also created something similar, which they call test impact analysis, and included it as an option within Azure DevOps, though it is limited to .NET and C# code.

Whatever the name, this machine learning technique seems so fundamental to making automated testing work properly that it's a sensible choice for any QA team trying to solve the challenge of shifting left.

The giants of software development have provided the outline for how to shift testing left by applying machine learning to select the right automated tests to run for each code change. This method can help you shift your QA testing left, too.
