Testing in Production: A Double-Edged Sword


In the earlier days, when most companies followed a waterfall style of development, the industry joked that Google kept its products in beta forever. In fact, Google has been a pioneer in building the case for testing in production. Traditionally, a tester was responsible for testing all scenarios, both scripted and exploratory, in a test or staging environment before the build could go live. But today, this premise is changing on several fronts.

For instance, the tester is no longer alone in testing. Developers, designers, build engineers, other stakeholders, and end-users, both within and outside the product team, are testing the application to provide feedback.

The test environment, the product under test, the underlying technologies, and the test combinations (devices, platforms, browsers, etc.) have all become highly complex. The services mindset now dominates; gone are the days of a purely local setup for testing the product. Cloud environments offer the scale and ease needed to test complex interfaces, and the production environment offers unique testing opportunities that cannot be fully replicated before release.

And the constraints the test team works within, including time, cost, and availability of niche testers, are more prominent than ever before.

With all these factors at play, testing in production is inevitable. However, instead of viewing testing in production as an option thrust upon teams, if one looks closely at its inherent benefits, the exercise can do a great deal to strengthen the quality of the product.

So, what does it really mean to test in production, what are some of the ways in which it can be done, and how does it help?

What It Means to Test in Production

Testing in production is an exercise where the quality function of validating and verifying an application is taken up in the live environment, after release, either by testers themselves or by end-users. Others, such as businesspeople, marketing teams, or analysts, may also share feedback with the product team. Testing in production gives more realistic opportunities to test, increases application transparency between the core product team and users, and supports the idea of continuous development through continuous testing. In this mobile-first world, testing in production is a core technique to embrace in your testing process.

Of course, there is a negative connotation to issues reported from the field: Traditionally, they meant the tester had not tested the product effectively and comprehensively. An issue reported from the field is handled with very high priority and fixed as soon as possible. While issues that show up in production can still leave a black mark on the tester's effort, a testing-in-production initiative nowadays has an expanded scope. It includes not just reactive actions in response to user-reported issues, but also proactive, planned test efforts taken up before the product is officially rolled out.

Techniques for Testing in Production

The core techniques for testing in production encompass active and passive monitoring (involving real data and synthetic transactions), experimentation with real users (both controlled and uncontrolled), and stress tests to monitor system response. While these may sound simple, one has to be extremely sensitive in all of these tasks, as live users' data is often involved and has to be protected (the sensitivity concerns not just data loss, but also data privacy and security). Also, the volume of transactions in live environments is so high that any test effort here has to be adequately monitored and followed up on.

Techniques such as telemetry, or diagnostics around software usage, bring visibility into the outcomes of testing in production. While active monitoring focuses more on user-generated outcomes, passive monitoring relates to test effort from the test teams using synthetic data. An example of active monitoring would be engaging a beta crowd of users to provide feedback, whereas passive monitoring is when the test team initiates monitoring, either by itself or through the rest of the product teams. This method can also include the operations and support team executing a set of automated sanity tests on an ongoing basis to keep tabs on the health of the application in the live environment.
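
An ongoing synthetic sanity suite of the kind described above can be quite small. Here is a minimal sketch in Python; the endpoint paths and the `fetch` callable are hypothetical placeholders, not part of any real application, and a production version would plug in a real HTTP client and feed results into alerting rather than printing them:

```python
import time
from urllib.parse import urljoin

# Hypothetical synthetic checks -- replace with your application's real health URLs.
SANITY_CHECKS = [
    ("login page", "/login"),
    ("search API", "/api/search?q=test"),
    ("checkout health", "/api/checkout/health"),
]

def run_sanity_suite(base_url, fetch):
    """Run each synthetic check via the injected `fetch` callable.

    `fetch(url)` should return an HTTP status code; injecting it keeps the
    suite testable without touching a live system.
    """
    results = []
    for name, path in SANITY_CHECKS:
        url = urljoin(base_url, path)
        started = time.monotonic()
        try:
            status = fetch(url)
            ok = 200 <= status < 400
        except Exception:
            ok = False  # network errors count as a failed check
        elapsed = time.monotonic() - started
        results.append({"check": name, "ok": ok, "seconds": round(elapsed, 3)})
    return results

# Simulated fetch for illustration: pretend the checkout endpoint is failing.
def fake_fetch(url):
    return 503 if "checkout" in url else 200

report = run_sanity_suite("https://example.com", fake_fetch)
failed = [r["check"] for r in report if not r["ok"]]
print(failed)  # ['checkout health']
```

Because the suite runs against the live environment with synthetic transactions, it should use test accounts and non-destructive operations only, echoing the data-sensitivity caution above.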

Experimentation in a controlled form often involves techniques such as A/B testing to gauge user feedback on specific scenarios, while uncontrolled experimentation relates to beta testing and crowdsourced testing. Stress tests in a live environment need to be closely monitored, especially those performed during peak load seasons for the application. For example, a shopping application's peak periods are the holidays or specific sale offers it introduces. Instead of passively waiting to hear about issues from the field during such peak periods, testers should monitor loads proactively and have a team ready to address issues.
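
The mechanics of a controlled A/B experiment can be as simple as deterministic bucketing. This is a sketch under assumed conventions (the experiment name `new-checkout` and user IDs are made up); real A/B platforms add exposure logging and statistical analysis on top:

```python
import hashlib

def ab_bucket(user_id, experiment, variants=("A", "B")):
    """Deterministically assign a user to a variant.

    Hashing (experiment name + user ID) gives a stable, roughly uniform
    split without storing per-user state, and lets each experiment
    bucket users independently of the others.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    index = int(digest, 16) % len(variants)
    return variants[index]

# The same user always lands in the same bucket for a given experiment...
assert ab_bucket("user-42", "new-checkout") == ab_bucket("user-42", "new-checkout")

# ...and a large population splits roughly evenly between variants.
assignments = [ab_bucket(f"user-{i}", "new-checkout") for i in range(10_000)]
share_a = assignments.count("A") / len(assignments)
print(round(share_a, 2))  # close to 0.50
```

Keeping assignment stateless and stable matters in production: a user who flips between variants mid-session both degrades their experience and contaminates the experiment's results.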

Relating to the above, products today also have a persistent social media presence. Retail stores and even desktop applications have their own dedicated pages on Facebook, Twitter, and LinkedIn. The tester has to proactively watch for discussions on these forums to see what users have to say about the product—usability, performance, overall functionality, and UI are some of the top areas to seek feedback. User field studies, visits to enterprise deployments (in the case of enterprise applications), and booths at events and conferences are all great places to get live feedback, which ultimately flows into the bucket of testing in production.

Words of Warning

While testing in production has ample scope and potential, it is not an invitation for testers to delay their testing responsibilities until after release. The collaboration and collective ownership that quality has evolved toward requires others on the team, such as developers, designers, architects, and build engineers, to take up testing in all environments, including live ones. The truly passionate tester will take advantage of this to free up cycles and take on bigger and better things.

It may also be tempting to adopt testing in production at an organizational level to promote a faster time to market at a lower cost, but such a strategy should absolutely not be encouraged. The product quality, user loyalty, brand acceptance in the marketplace, and the overall positioning of the test team can suffer.

Testing in production should be seen as a double-edged sword: very effective if used correctly, but harmful if entered into unprepared. With the bounds well defined and the value proposition outlined, testing in production has a lot to offer in the coming years, especially as the lines between the product team and end-users become increasingly blurred.

User Comments

Ed Weller's picture

When I asked the passenger next to me on my last flight why he was pounding on his keyboard furiously, he said, "Stop interrupting me, I am 'testing in production.'"

There are products that must work without critical failures in production.

I also have a problem with the statement "can still leave a black mark on the tester's effort". There is an equal black mark on the development side, isn't there? Testing can never prove the absence of defects; however, for mission-critical (loss-of-life) software, IBM showed how to deliver failure-free software on the Space Shuttle. They did not test in production. (Note the difference between a defect and a failure.)

I think this term needs to be discarded, as what the author described is full life cycle testing (regardless of the chosen life cycle model: Waterfall, Scrum, or some combination thereof).

November 1, 2016 - 5:00pm
Tim Thompson's picture

I agree, and there is a difference between free (as in advertisement- and data-mining-funded) apps from Google and enterprise or consumer applications that companies or individuals pay money for. I think it is an ethical conflict to ask for compensation and then have the user act as a test subject.

As far as leaving black marks on testers, they will mark themselves enough when there is a problem in production. The blame - if there is any to be laid - goes to the entire team. Often enough issues arise that should have been considered during analysis and design. There are plenty of eyes on any product and feature before release, singling out testers as the culprit is unjust.

Unless it is about Space Shuttle software or the like, the compromise is beta phases or pilot projects. In pilot projects, select customers get the application and service for free for a period of time in exchange for accepting the occasional inconvenience of the product not working as expected. It also gives pilot customers the unique opportunity to directly influence the design.

November 4, 2016 - 6:08am
Rajini  Padmanaban's picture

Thanks for your comment, Ed. The reason I say it can leave a black mark on the tester, despite quality being collectively owned today, is that quality continues to be, and will remain, the core charter of a tester. While others also own quality within their own spheres of operation, when an issue is caught in production, the tester is the one who is questioned first, and rightfully so. If such specific responsibility is not brought in, accountability will be missed, which will adversely impact quality.

November 8, 2016 - 5:06am
Robin Goldsmith's picture

There’s a reason testing in production is also called the “big bang approach.”  I agree the black mark is not the tester’s.  Rather, too often it’s the residue of the company.

November 7, 2016 - 2:01pm
Rajini  Padmanaban's picture

And here I am talking only about misses from an engineered-implementation standpoint, not from a design-miss standpoint. I agree with you all that it may end up being a miss on the whole team's part, not just the test team's, but rightfully so: starting the analysis as a miss in the quality function brings clear accountability that can then be analyzed further.

November 8, 2016 - 5:12am
Brody Brodock's picture

Thanks for the article. I do have a couple of comments, concerns, and observations.

Testing in production has always occurred; it is called releasing the product. The difference between having a QA organization and not having one is that fewer defects and variances should be released with a performant QA team than without one. Maybe the defects do not matter, as would be the case for Google, since their context is simply the ability to sell ads. Google's ability to innovate has brought us some wonderful tools, and I believe we are generally more informed and able because of those innovations, but nobody was relying on those innovations when they were released.


The context of the solution is incredibly important and I would argue that testing in production is not suitable for many industries – even if it actually occurs in those industries. We certainly wouldn’t want automobile software to be tested in production, (although it seems that some of the console apps are indeed not professionally tested) nor would we want healthcare applications tested in production, (or aviation, space, rail, power plants, communication, etc, etc) so the context matters. 

One thing that you didn't bring up is the risk of testing in production: how testing in production can actually bleed over into the real production world. In medicine, I wouldn't want to see a 'test' lab result show up on a real patient; nor, for regulatory reasons, would it even be appropriate for a tester to be testing in production. You touch on these concerns tangentially from a test-responsibility perspective, but there are times when it would be better to use or mine production data in a controlled test environment rather than testing in production. We certainly want 'good' data, and our customers will deliver the best data the application will allow (and sometimes more), which is almost certainly better than trying to craft fake data to cover issues. And if you have a good defect taxonomy, both crafted and copied production methods will improve your capture rate.


Again, an interesting article, but a reminder that it isn't just about agile vs. waterfall, or social media, or maps/docs/traffic applications. There is still a need for crafted testing prior to releasing in the wild for user testing in production.

November 10, 2016 - 12:17pm
