Blending Machine Learning and Hands-on Testing

As your QA team grows, manual testing can lose the ability to focus on likely problem areas and instead turn into an inefficient checkbox process. Machine learning can bring back the insights of a small team of experienced testers: by weighing signals drawn from the code history, it can determine the probability that a change has a serious defect, so you can evaluate risk and know where to focus your efforts.

Despite a push toward greater test automation, at most companies, manual testing still plays an important role in the QA process. Manual testing, when done right, is effective at checking new functionality in ways that are difficult to automate, and it can stress the code in ways the developers never considered.

Based on my experience running a large test team for a financial services company, I know how important manual testing is. The problem, however, is that as our team grew in size, manual testing lost the ability to focus on likely problem areas and turned into an inefficient checkbox process.

In looking to return manual testing to a targeted process, I found a way to apply machine learning to give manual testers in a large organization the same insights as testers in a small team.

Our test team started as a small group located in the same office as the small development team. That made it easy to collaborate: We knew exactly what the developers were working on and why, and we knew each developer’s habits, strengths, and weaknesses. When I saw a code change submitted five minutes before the sprint deadline, I knew it was rushed and likely to contain at least one serious bug, so we tested it thoroughly.

I knew that when Bob made a big change, it was likely to have at least one bug, because Bob never checked anything carefully. Mary, though, was extraordinarily meticulous. Her changes were always perfect because she triple-checked everything before submitting it. Her only bugs were dependencies she missed, so for Mary’s changes, we spent little time on routine testing and checked instead for dependencies. Kumar was somewhere in the middle, but if he made a change at 5 a.m. or anytime on Sunday, I knew it needed extra review.

Developers never believed me, but we understood the overall operation of the code better than they did, since they usually focused on particular functional areas. We could guess dependencies and integration issues that they often missed. We also had a clear understanding of which functions were mission-critical and which were window dressing or only used by a few customers, and we could focus our testing accordingly.

A display issue with an old version of a browser on an uncommon mobile phone was a bug that we wanted to find, but a small mistake in the money transfers section would be a disaster, so we knew where to prioritize our testing. In other words, given a limited amount of time for testing, the experienced manual testers knew how to prioritize.

However, as the company expanded and both the development and test teams grew, we gradually lost that insight. Instead of working closely with developers, many of the new testers were in a separate office in a different time zone, limiting personal contact. Most of the manual testing was eventually moved offshore, and some of it was done by short-term contractors. Many people were new to the company, with some even new to testing, and turnover increased. The testers no longer had the same familiarity with the development team, and most had little experience with the code.

At the end of a sprint, the testers were given a list of changes, but they didn’t have the insight to know which changes were most likely to have bugs—and they had even less understanding of which changes were critical and which were less important. So manual testing became rote, checkbox work, simply running down a list of tests to check as many as possible within the given time constraints.

At the same time, the company was trying to shorten the release schedule, and we were under pressure to reduce manual testing time further. Meanwhile, the code base continued to grow, multiplying the complexity. I needed a solution.

The idea of automating everything was tempting, but it had its own challenges that made it impractical. And I knew that making the developers responsible for testing the code, though it sounded great in theory, would never uncover all the ways the code could break that the developer hadn't thought of in the first place.

After a lot of contemplation, a bit of experimentation, and way too much tea, I realized that machine learning could be helpful. Instead of a squadron of bots to go off and find the bugs automatically, what I came up with was a way to take advantage of machine learning to help manual testers know where to focus their testing.

Essentially, what I wanted to accomplish was to bring the same insights of our small, integrated, experienced test team to a large, dispersed team. I built a machine learning–based system that looked at the code history and test results in the same way as our team’s most experienced testers to determine the risks in software changes.

I included several parameters:

  • Size of the change
  • Number of files changed
  • Number of developers changing the same file at the same time
  • Whether the files have been touched recently or the code is stale
  • Whether the particular code area is defect-prone
  • The developer’s experience with the particular code area
  • The time of day the commit is made
  • Remarks and profanity in the commit descriptions

In total, the model I built uses around 30 factors to determine the probability that a change has a serious defect. At the end of the sprint, I construct a heat map of risks to show which areas of the code have been changed and which changes have the highest risk. This risk map makes it easy to prioritize testing for those changes most likely to contain bugs, guiding the test team to work efficiently and find the most bugs in the least amount of time.
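To make the idea concrete, here is a minimal sketch: a logistic score with hand-set illustrative weights standing in for the roughly 30 learned factors (the real model and its weights are not reproduced here), plus a ranking that serves as a text-only stand-in for the visual heat map:

```python
import math

# Hand-set illustrative weights; the real model learned its factors
# from historical change and defect data.
WEIGHTS = {
    "lines_changed": 0.004,
    "files_changed": 0.15,
    "concurrent_authors": 0.4,
    "area_defect_rate": 2.0,
    "is_weekend": 0.8,
}
BIAS = -3.0

def defect_probability(features):
    """Logistic model: probability that a change has a serious defect."""
    z = BIAS + sum(w * float(features.get(k, 0)) for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def risk_map(changes):
    """Rank the sprint's changes riskiest-first, a text-only stand-in
    for the visual heat map."""
    scored = [(c["id"], round(defect_probability(c["features"]), 2))
              for c in changes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

changes = [
    {"id": "abc123", "features": {"lines_changed": 500, "files_changed": 12,
                                  "concurrent_authors": 2,
                                  "area_defect_rate": 0.5, "is_weekend": 1}},
    {"id": "def456", "features": {"lines_changed": 20, "files_changed": 1,
                                  "concurrent_authors": 1,
                                  "area_defect_rate": 0.05, "is_weekend": 0}},
]
ranked = risk_map(changes)  # riskiest change first
```

A sprint's worth of these scores, grouped by code area, gives the heat map that tells testers where to start.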

Once the machine learning risk assessment was implemented, I added an alert system. Whenever a high-risk change was made, my machine learning module sent an alert to the test team. I also flagged the areas of the code that involved money transfers so that we were immediately alerted to any changes to those functions.

The alerts were mostly to give my team early notice so they could start testing the riskiest changes right away instead of waiting until the end of the sprint. But the alerts could also be used to get the developer to double-check the changes, or to add steps to the review process to make sure those changes received a closer look.
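The alerting logic itself can be very simple once the risk score exists. A sketch, assuming the defect probability has already been computed upstream (the threshold and the `transfers/` path prefix are illustrative):

```python
CRITICAL_PREFIXES = ("transfers/",)  # money-transfer code: always alert
ALERT_THRESHOLD = 0.7                # illustrative cutoff, tune to taste

def alerts_for(change, probability):
    """Return alert messages for one change (empty list means no alert)."""
    messages = []
    if probability >= ALERT_THRESHOLD:
        messages.append(
            f"High risk ({probability:.0%}): start testing {change['id']} now")
    if any(f.startswith(CRITICAL_PREFIXES) for f in change["files"]):
        messages.append(
            f"Critical area touched by {change['id']}: money transfers")
    return messages

change = {"id": "abc123", "files": ["transfers/ledger.py", "ui/header.py"]}
messages = alerts_for(change, probability=0.92)
```

Keeping the critical-area check separate from the model means a change to money transfers is flagged even when the model scores it as low risk.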

Together, the risk map to prioritize testing and the alert system to give the team more time to check the highest-risk changes helped us turn manual testing from a slow and tedious checkbox review back into a targeted and efficient bug-hunting effort.

User Comments

Alexei Tcherkassov

It's a pretty cool story. It would be interesting to learn more about the technical details of the mentioned "machine learning–based system" for analyzing code check-ins.

February 10, 2020 - 10:02am
Mark Bentsen

"...profanity in the commit descriptions"; now that's a red flag.

February 11, 2020 - 5:16pm
James Farrier

Sadly this is something I've come across multiple times.

February 13, 2020 - 1:07pm
Sandeep Chadha

This would go a long way... Could you please share details about the "machine learning–based system"

February 13, 2020 - 6:45am
James Farrier

Hey Sandeep, I'm happy to give you more details, what do you want to know?  Alternatively, you can contact me via email jamesfarrier at

February 13, 2020 - 1:10pm

About the author

StickyMinds is a TechWell community.
