Test Data Privacy: Start Now to Comply with New Regulations

The key to test data privacy is fulfilling testers’ needs for efficiency, speed, and the most accurate representations of data and application behavior in the production environment, while ensuring privacy and protecting testers from unintentional hazards. Here are some tips for getting started on a test data privacy project to comply with the EU’s coming General Data Protection Regulation.

The EU’s General Data Protection Regulation (GDPR) refers to new laws requiring companies to delete all instances of EU customers’ personally identifiable information upon customer request. The GDPR also requires customers’ conscious and explicit consent to use their data for various purposes, including application testing. This means that if organizations use live, undisguised customer data in their testing processes, they are violating the GDPR.

Failure to comply with all aspects of the GDPR by May 25, 2018, will result in heavy fines, and it’s not just EU companies that need to care. Any company that serves and possesses data on EU-based customers needs to comply—including many US companies.

The GDPR deadline may seem far away, but considering the magnitude of change companies will need to make in the way they handle sensitive customer data, it’s not too early to start preparing. These changes will trickle all the way down to software testing and QA teams, which often handle personal data in the course of their testing work.

Ensuring privacy of customer data used in testing has always been a major issue. According to a recent survey of CIOs of large companies, 83 percent of US organizations use live customer data in test systems when testing applications because they believe the use of live data ensures reliable testing and accurately represents their production environment. Another 83 percent also routinely provide customer data to outsourcers for testing purposes. This is a recipe for potential disaster if this data gets into the wrong hands. The GDPR makes the consequences even more dire, as noncompliant companies will face fines of up to 4 percent of their worldwide annual turnover.

While all organizations should be focused on ensuring test data privacy in order to reduce the risk of breaches, US companies needing to comply with the GDPR simply don’t have a choice; they must adjust. Starting a test data privacy project may seem like a daunting task, but it doesn’t have to be. Here are some tips for getting started.

Take Inventory of Sensitive Data—and Know What Can Be Kept

The first step is to take inventory of all sensitive data by identifying the columns of information that will need to be disguised. Contrary to popular belief, this does not include names; in fact, eliminating names can make it unduly hard to identify customer records as data moves across transaction paths and platforms in the testing process. For example, in the basic encryption model, an input of “Marcin Grabinski” may deliver an output of “LR/TVdWcXniHAdoN0zhLEw.” At a glance, this output means nothing to testers unless the process is automated, and most testing continues to be manual.
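As a rough illustration of what such an inventory could look like, the sketch below scans a list of column names and flags the ones that probably hold sensitive data. The column names and the matching patterns are hypothetical, and a real inventory would also need to inspect the data itself; names are deliberately left unflagged.

```python
import re

# Hypothetical name patterns suggesting that a column holds sensitive data.
# Names are intentionally absent: they are kept to help testers track records.
SENSITIVE_PATTERNS = {
    "address": re.compile(r"addr|street|city|postcode|zip", re.I),
    "date_of_birth": re.compile(r"dob|birth", re.I),
    "id_number": re.compile(r"passport|licen[cs]e|ssn|national_id", re.I),
    "phone": re.compile(r"phone|mobile", re.I),
}


def inventory(columns):
    """Map each column whose name matches a pattern to its category."""
    found = {}
    for col in columns:
        for category, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(col):
                found[col] = category
                break  # first matching category wins
    return found


# Hypothetical columns of a customer table.
columns = [
    "customer_id",
    "full_name",
    "home_address",
    "date_of_birth",
    "passport_no",
    "phone_number",
]
report = inventory(columns)
for col, category in report.items():
    print(f"{col}: needs a disguise rule ({category})")
```

Matching on column names alone is only a starting point; free-text fields and poorly named columns still need a manual review.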

The goal of test data privacy is not to disguise data itself, but to make it reasonably difficult to identify individuals, a concept known as pseudonymization. It’s OK to use real customer names from the production database, as long as these names are not linked to home addresses, dates of birth, passport numbers, license numbers, or any other identifying information. Keeping real, easily recognizable names makes testing processes more rapid, efficient, and accurate for manual testers as they track application execution in the testing environment.
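The idea of keeping names while breaking their link to other identifiers can be sketched as a table of per-column rules. Everything here is an assumption made for illustration: the column names, the key, and the two masking functions. The keyed hash simply stands in for whichever disguise technique a team chooses, and it is one-way, so it is not encryption.

```python
import hashlib
import hmac

SECRET_KEY = b"test-data-masking-key"  # hypothetical key; keep real keys out of code


def keep(value):
    """Leave the value untouched (used here for customer names)."""
    return value


def mask_token(value):
    """Replace the value with a short keyed hash that cannot be read back."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def mask_year_only(value):
    """Keep the year of an ISO date (YYYY-MM-DD) and reset month and day."""
    return value[:4] + "-01-01"


# One rule per column; columns without a rule pass through unchanged.
RULES = {
    "full_name": keep,
    "date_of_birth": mask_year_only,
    "passport_no": mask_token,
}


def pseudonymize(record):
    """Apply the rule for each column of a single record."""
    return {col: RULES.get(col, keep)(val) for col, val in record.items()}


# Made-up record for illustration only.
record = {
    "full_name": "Marcin Grabinski",
    "date_of_birth": "1980-06-15",
    "passport_no": "X1234567",
}
masked = pseudonymize(record)
print(masked["full_name"], masked["date_of_birth"])
```

Letting unknown columns pass through keeps the sketch short; a stricter version would reject any column that has no explicit rule.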

Decide on a Disguise Rule

Once companies determine what information needs to be masked, the next step is to create a disguise rule for each of the types of sensitive data. There are various techniques, the best known being encryption, or the process of encoding messages or information so only authorized parties can read them.

The challenge with standard encryption, once again, is that it can make it very difficult for testers to identify what type of information they are viewing. Take the earlier example, “LR/TVdWcXniHAdoN0zhLEw.” Not only is this very difficult to view and recall as a name, but one can’t even tell whether it is a name, an address, or a phone number.

In the context of test data privacy, another disguise rule, known as format-preserving encryption, tends to work better. Format-preserving encryption keeps the original format of input data while masking it, thus making it more useful for testing purposes. For example, a UK phone number may still begin with +44 (so the tester knows he or she is looking at a phone number), while the rest of the number is encrypted.
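To make the phone number case concrete, here is a toy sketch that keeps the +44 prefix and the spacing and replaces only the digits. It is not a standardized format-preserving encryption scheme and should not be treated as secure: it is a simple keyed digit shift, chosen because it is short, reversible, and keeps the format. The key, the function names, and the sample number are all assumptions.

```python
import hashlib
import hmac

SECRET_KEY = b"test-data-masking-key"  # hypothetical key
DIGITS = "0123456789"


def _keystream(n, tweak):
    """Derive n key-dependent digits from the key and a tweak string."""
    digits = []
    counter = 0
    while len(digits) < n:
        msg = f"{tweak}:{counter}".encode("utf-8")
        block = hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()
        digits.extend(b % 10 for b in block)
        counter += 1
    return digits[:n]


def _shift_digits(text, prefix_len, sign):
    """Shift every digit after the prefix; leave all other characters alone."""
    prefix, rest = text[:prefix_len], text[prefix_len:]
    count = sum(ch in DIGITS for ch in rest)
    stream = iter(_keystream(count, prefix))
    out = []
    for ch in rest:
        if ch in DIGITS:
            out.append(str((int(ch) + sign * next(stream)) % 10))
        else:
            out.append(ch)
    return prefix + "".join(out)


def mask_phone(phone, prefix_len=3):
    """Disguise the digits after the country prefix, keeping the layout."""
    return _shift_digits(phone, prefix_len, +1)


def unmask_phone(masked, prefix_len=3):
    """Reverse mask_phone using the same key."""
    return _shift_digits(masked, prefix_len, -1)


phone = "+44 20 7946 0958"  # made-up number for illustration
masked = mask_phone(phone)
print(masked)
```

Because every number with the same prefix gets the same digit shifts, this only illustrates the format-preserving idea; a production setup would rely on a vetted scheme instead.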

While format-preserving encryption can work great for data like phone numbers, it doesn’t always work well for data that doesn’t follow a standard format, like addresses. This is especially true for organizations dealing with the GDPR, considering addresses are formatted differently in different countries across the EU. Data translation, which uses existing values stored within files as replacements for sensitive data values, is a good option for organizations looking to mask address information that follows a uniform format.
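Data translation could look something like the sketch below, which swaps each real address for one drawn from a list of replacement values. The replacement addresses are invented, and picking the replacement by hashing the original is an assumption made here so that the same input always maps to the same output across test runs and platforms.

```python
import hashlib

# Hypothetical replacement values; in practice these would be loaded from a file.
REPLACEMENT_ADDRESSES = [
    "1 Example Street, London, EX1 1AA",
    "22 Sample Road, Leeds, EX2 2BB",
    "333 Placeholder Avenue, Bristol, EX3 3CC",
    "4 Demo Lane, Cardiff, EX4 4DD",
]


def translate_address(original):
    """Return a replacement address chosen deterministically from the list."""
    digest = hashlib.sha256(original.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(REPLACEMENT_ADDRESSES)
    return REPLACEMENT_ADDRESSES[index]


original = "10 Real Customer Close, Manchester"  # made-up input
replacement = translate_address(original)
print(replacement)
```

Since each replacement is a complete address, the street, city, and postcode stay consistent with one another regardless of how the original was formatted.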

Create Lookup Tables

If realistic names or addresses must be used, there is no magic to ease the process; data lookup tables must be set up. However, this need not be an overly time-consuming or laborious process.

First, there doesn’t need to be a huge volume of data records. Some organizations think that in order for testing to be comprehensive, they need to test as many rows as a production contact table contains. This is not true. It is perfectly acceptable to test only 1 percent to 5 percent of data records.
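Drawing such a subset might be as simple as the following sketch, which takes a reproducible random sample of the records. The function name, the default 5 percent fraction, and the fixed seed are illustrative choices, not a recommendation from any particular tool.

```python
import random


def sample_records(records, fraction=0.05, seed=42):
    """Return a reproducible random sample covering the given fraction."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    if not records:
        return []
    size = max(1, round(len(records) * fraction))
    rng = random.Random(seed)  # fixed seed so every run picks the same rows
    return rng.sample(records, size)


# Stand-in for the row IDs of a production contact table.
records = list(range(1000))
subset = sample_records(records, fraction=0.05)
print(len(subset), "of", len(records), "records selected")
```

A purely random sample can miss rare edge cases, so teams may still want to add a handful of hand-picked records on top of it.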

The same ratio is good for addresses; a huge volume is not required. As mentioned above, masking addresses can be challenging due to the wide variety in formats, and inconsistencies between items like zip codes, cities, and streets can render records invalid for testing. But there are ways to get around this. Some organizations opt to keep true address information as long as other personally identifiable information attached to it is properly masked, and some even create “substitute” addresses for customers. For example, a UK bank once used addresses from its four hundred UK branches to avoid having any private addresses in the testing environment.

Start Protecting Customer Data Now

The key to test data privacy is striking a critical balance: fulfilling testers’ needs for efficiency, speed, and the most accurate representations of data and application behavior in the production environment, while ensuring privacy and protecting testers from unintentional hazards.

The need for test data privacy isn’t anything new; rather, recent mandates like the GDPR have simply elevated it once again to the forefront. While there are additional steps, those outlined above are a great guideline for organizations looking to get their test data privacy projects off the ground and protect sensitive customer data across all platforms.

User Comments

sharath chandra

A very well-written article. Many companies use real customer data for testing, and improper handling of the data could lead to catastrophic consequences. As an L&D professional, I believe that proper training goes a long way in preventing sensitive data breaches. Managers responsible for providing customer data to testing teams must be educated about the consequences of the data falling into the wrong hands. The managers must also be trained to determine what information can be provided and what cannot. Needless to say, people who understand the importance of data security and know how to handle confidential information make few mistakes.

January 20, 2017 - 2:28am
Marcin Grabinski

Thank you for your insight. Good point - testers usually want to do their job and they don't care so much about the origins of test data, as long as it enables them to build their scenarios. It's Test Management's responsibility to think twice before they authorize that "quick data copy from production, just for one day..."


January 20, 2017 - 3:11am

StickyMinds is a TechWell community.