StickyMinds.com: brain food for building better software




A StickyMinds.com Original
Synthesize Your Test Data

By Danny R. Faught


Summary: In a society growing ever conscious of the benefits of organic materials, "synthetic" is a dirty word. But in this week's column, Danny R. Faught argues that when you're designing tests, synthetic data is the way to go. Read on to learn why it's important to be the master of your data.


I'm going to let you in on a secret. When I interview someone for a software testing job, I have one weed-out question that lets me know quickly whether someone understands the basics of testing:

"Tell me how you would test the operating system's feature that lists files on the hard drive. Choose the utility you're most familiar with to test, such as Finder, Explorer, ls, or dir."

Unfortunately, what separates a good answer from one that sends a résumé to the shredder is not something you're likely to see taught explicitly in a book or a course. Here's the beginning of an all-too-frequently-heard bad answer: "Well, I would look around on the disk to see what files I can use to test with." Can you see what's wrong with that answer?

Good testers know that they need to control their test data; they don't limit themselves to what happens to be lying around on the disk. They use disciplined test design techniques, which usually means that they create the test data for their tests. They never would skip tests simply because they can't find the right data. Sometimes the only oracle that can tell if the tests pass is knowledge of how the data was created.
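As a minimal sketch of what a better answer to the interview question might look like in practice, the Python below builds its own small set of files instead of hunting for files on the disk. The file names and the `list_files` stand-in are assumptions made for illustration; in a real test, `list_files` would call whatever utility is under test. Because the test created the data, the expected listing is known exactly.

```python
import os
import tempfile


def build_listing_fixture(root):
    """Create a small, fully known set of entries under root and return their names."""
    # Hypothetical cases: an empty file, a one-letter name, a dot file,
    # a name containing spaces, and an upper-case name.
    names = ["empty.txt", "a", ".hidden", "name with spaces.txt", "UPPER.TXT"]
    for name in names:
        with open(os.path.join(root, name), "w") as handle:
            if name != "empty.txt":
                handle.write("x")
    os.mkdir(os.path.join(root, "subdir"))
    return sorted(names + ["subdir"])


def list_files(path):
    """Stand-in for the utility under test (an assumption); here it wraps os.listdir."""
    return sorted(os.listdir(path))


with tempfile.TemporaryDirectory() as root:
    expected = build_listing_fixture(root)
    actual = list_files(root)
    # The oracle is our knowledge of how the data was created.
    assert actual == expected
```

The temporary directory is removed when the `with` block ends, so the test leaves nothing behind.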

Check Your Timing
The kind of test data I'm talking about here is the input that's put in place before starting a test, as a part of the test setup. This may be the contents of a text field that a Web browser automatically populates with previously given information, a file that's loaded into a word processor, or the contents of a database. When you design tests that need to have test data set up in advance, you need to define the procedure for putting it there, using one of these approaches:
  • On the fly—You create the data right before running each test or suite of tests, every time you run them. You can investigate how long it takes to set up the data, how hard the setup procedure is to automate, and how feasible it is to set up and tear down the data repeatedly. For a word processing file this could be as simple as copying the file, but for a database the task may be complex.
  • Created in advance—You may set up the test data once and leave it in place for days or months at a time, especially if it's difficult to set up. In this case, consider how you regularly will check that the data is still valid. What do you do if a test modifies the data in a way that will cause other tests to fail?
In both scenarios, think about any problems that might arise if two people are using the same data source.
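The "on the fly" approach can be sketched with Python's standard `unittest` and `sqlite3` modules. The `accounts` table and its rows are hypothetical, and an in-memory database stands in for whatever data store a real application uses. Each test gets freshly created data, so a test that modifies the data cannot cause another test to fail.

```python
import sqlite3
import unittest


class AccountTests(unittest.TestCase):
    def setUp(self):
        # Build the test data from scratch before every test.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)"
        )
        self.db.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)]
        )
        self.db.commit()

    def tearDown(self):
        # Tear the data down again; the in-memory database vanishes on close.
        self.db.close()

    def balance(self, account_id):
        row = self.db.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        return row[0]

    def test_withdrawal_changes_balance(self):
        self.db.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        self.assertEqual(self.balance(1), 70)

    def test_other_tests_see_original_balance(self):
        # Passes regardless of test order, because setUp rebuilt the data.
        self.assertEqual(self.balance(1), 100)


def run_suite():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AccountTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
```

Giving each test, or each tester, a private copy of the data is also one way to sidestep the problem of two people sharing the same data source.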

The "created in advance" option is the most common because it's easier, especially if you're not validating the test data. But "on the fly" tends to be much more robust if you can find a feasible way to do it.
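If the data is created in advance, one possible way to check regularly that it is still valid is to record a fingerprint when the data is built and compare against it before each test run. The sketch below assumes the test data lives in a single file; the file name and contents are invented for the example.

```python
import hashlib
import os
import tempfile


def fingerprint(path):
    """Return the SHA-256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def data_is_unchanged(path, recorded):
    """True if the file still matches the fingerprint recorded at setup time."""
    return fingerprint(path) == recorded


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "customers.csv")  # hypothetical test data file
    with open(path, "w", newline="") as handle:
        handle.write("id,name\n1,Ada\n")
    recorded = fingerprint(path)  # record this when the data is set up

    unchanged_before = data_is_unchanged(path, recorded)

    # Simulate a test that modifies the shared data.
    with open(path, "a", newline="") as handle:
        handle.write("2,Bob\n")
    unchanged_after = data_is_unchanged(path, recorded)
```

A check like this only tells you that the data changed, not how; when it fails, rebuilding the data from its setup procedure is usually the simplest recovery.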

Test Data Mechanics
You may have more than one way of building test data, either through the application’s user interface or by directly creating a file or database. If you work through the user interface—entering data as the user would—you have the advantage of getting additional test coverage while you're building the data. However, this process may be so time consuming that it's not practical for building a large volume of data. Also, some kinds of relationships among the data may be impractical to set up this way, such as a series of transaction dates spread out over a year.

Bypassing the application when you build your data gives you tremendous power for synthesizing a broad range of test scenarios. But you also might have tremendous difficulty figuring out how to piece together a valid data set or, in the case of a negative test, a data set with only the error that you intended to inject. There might not be any documentation on the internal format of the data that you're working with. Also, if you report a bug using synthetic data and you don't know how to reproduce it using data created through the application, you might have trouble getting anyone to pay attention to the bug, especially if robustness isn't an important attribute of the application.
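To illustrate bypassing the application, here is a minimal sketch that writes a series of transaction dates spread across a year directly into a database. The `transactions` table, its columns, and the amounts are assumptions for the example; a real application's schema would have to be discovered or documented first, which is exactly the difficulty described above.

```python
import sqlite3
from datetime import date, timedelta


def seed_transactions(conn, start, count, step_days):
    """Insert `count` rows whose dates are `step_days` apart, beginning at `start`."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        "id INTEGER PRIMARY KEY, "
        "posted_on TEXT NOT NULL, "
        "amount_cents INTEGER NOT NULL)"
    )
    rows = [
        ((start + timedelta(days=i * step_days)).isoformat(), 1000 + i)
        for i in range(count)
    ]
    conn.executemany(
        "INSERT INTO transactions (posted_on, amount_cents) VALUES (?, ?)", rows
    )
    conn.commit()
    return rows


conn = sqlite3.connect(":memory:")
rows = seed_transactions(conn, date(2007, 1, 1), count=12, step_days=30)
first, last = conn.execute(
    "SELECT MIN(posted_on), MAX(posted_on) FROM transactions"
).fetchone()
```

Entering the same twelve transactions through a user interface would typically mean changing the system clock or waiting; here the dates are just values in a list.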

Often a hybrid approach is best. When you're working with a database, you can start with a snapshot of production data, if available, and then layer your test data on top of it. Be prepared to insert your test data each time you get a new copy of the production data. If you have any specific requirements on the contents of the database, set this up in your test data; don't assume that the production data will satisfy your needs. If you're creating the data in a file with a specific format, you first can create a valid file from within the application and then edit the contents of the file directly to suit your needs.
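The file-based hybrid approach can be sketched as follows: start from a record known to be valid, then change exactly one field, so that a negative test contains only the error you intended to inject. The JSON record and field names are hypothetical; in practice the valid starting point would be a file saved from within the application.

```python
import copy
import json


def inject_single_fault(valid_record, field, bad_value):
    """Return a copy of valid_record that differs from it in one field only."""
    if field not in valid_record:
        raise KeyError(field)
    mutated = copy.deepcopy(valid_record)
    mutated[field] = bad_value
    return mutated


# Stand-in for a file created by the application (an assumption).
valid = json.loads('{"name": "Ada", "quantity": 3, "ship_date": "2007-07-23"}')

negative_case = inject_single_fault(valid, "quantity", -1)

# Confirm that the intended field is the only difference.
changed = [key for key in valid if valid[key] != negative_case[key]]
mutated_text = json.dumps(negative_case)
```

Checking the list of changed fields guards against accidentally injecting a second error, which would make a failing test harder to interpret.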

The Challenges Are Real
You are likely to encounter some frustrating challenges when you try to synthesize test data. For example, if you want to create or modify a test database, you may find that only database administrators (DBAs) are allowed to do this, and you may have difficulty convincing an overworked DBA to use his powers to help you. Even if you do have access to the database resources you need, you may not have the skills required to get the job done. Whatever the challenges, they are rarely the result of one bad management decision, but rather a complex web of limitations ranging from poor testability of a legacy application to a lack of human or machine resources.

The sad reality is that some test teams have given up on creating the data that would enable them to implement adequate test designs. Instead, the habit of testing only with production data has become thoroughly ingrained into the culture.

Tips
I'll leave you with a few ideas to help you with your quest in building solid test data.
  • Master several test design techniques so you know what kind of test data you need and can explain clearly why you need it.
  • Use a reliable setup procedure for the data so you reduce headaches caused by corrupted test data.
  • Inform your management of the tests that you want to run but can't run because of limited control over the test data. This will help managers determine how to allocate resources to manage the risk of inadequate testing.
  • Try to gain small victories toward removing roadblocks that make it difficult to synthesize test data. Start with a small step that's not difficult to achieve, and be patient as you continue to improve your organization's test design capabilities.
  • Learn the technical skills that will enable you to synthesize test data, such as: SQL and database administration, how to use test data generator tools, how to program so you can build your own data generators, and how to navigate your operating system so you can find the best way to access the test data.
Going from an abstract test design technique to a suite of tests that you actually can run against your application takes creativity and determination. Your software's well-being is at stake, so don't shy away from the challenge.


About the Author
Danny R. Faught prefers organics on the dinner table but synthetics in the test lab. He has been testing software since 1992, and his independent consulting practice is Tejas Software Consulting. Danny invites you to stay in touch by subscribing to the Tejas Software Consulting Newsletter using the form on www.tejasconsulting.com.

 

StickyMinds.com Weekly Column From 7/23/2007 

Member Comments
 
Comment:    
by Ken Taylor 5/21/2008

Thanks for the tips on how to advocate the use of synthesized data for testing.

In the article, you say, "if you report a bug using synthetic data ... you might have trouble getting anyone to pay attention." I have found this to be a common obstacle and to this point, I suggest that test data only be created from scratch when it does not exist in the production data.

In order to create synthesized data, you need to know all the attributes it should have to cover all test cases. So, once you have all the specifications for your data, instead of creating all the data from scratch, try to locate production data that meets...

Author's Response:
5/22/2008    
Hi, Ken. It's important to understand the dynamics of your organization, and it sounds like you're working in a culture where you've recognized that synthetic data is demonized. I think that's unfortunate, because starting with production data can limit the scope of our thinking about test coverage.

What I wrote about bug reporting was ambiguous. What I should have said is that reporting a bug found using synthetic data created by bypassing the user interface can raise legitimate concerns about whether that data could ever actually be created in real usage. Creating test data through the user interface, without using any back doors, though sometimes more difficult to do, proves that the user could actually encounter the failure condition you're describing.

The argument about synthetic vs. production data is a different issue, and a complex one at that.

 
 
Comment:    
by Audrey Hudson 5/21/2008

Hi Danny, a very interesting article. I am in the position of having had to create test data for an area where I have had steep learning curves and many difficulties but I would like to assure my fellow readers that it is well worth the effort.
I am in charge of testing a data warehouse project that produces financial reports. Our front end systems and databases are very complex and, as it is financial reports we are producing, we need data that covers a wide range of dates.
How we have resolved the issue is to have a special 'time-travel' environment produced for us and we use automation to put 'usage and date shifts' through our...

Author's Response:
5/22/2008    
Audrey, you can be an inspiration to people who are testing complex data warehouse systems. Often the testers will encounter significant roadblocks in setting up effective tests because managers don't understand how important it is to test these systems. The result of short-changing the testing is heartache, delays, and waste.

 
 
Comment:    
by Fionna O'Sullivan 5/21/2008

I was pointed here from the most recent StickyLetter. Great article! It continually surprises me that so little emphasis is put on test data and its generation. Evaluating what data will be needed and deciding how it will be generated is usually the very first thing I do when assigned to a new project.

One other advantage to using synthetic data instead of real data is that you build up from simple cases to complex production-like cases as the product matures - starting with production data immediately in my experience leads to so many failures that you spend the next couple of weeks isolating them, which will probably be done...

Author's Response:
5/22/2008    
Fionna, that's a great point about being able to control the complexity of the tests when we control the test data. We can get a much earlier understanding of how well the system reacts to simple cases when we can limit the early tests to simple inputs. Getting hit with numerous corner cases right off the bat can be overwhelming, and delay our understanding of the state of the software.

 
 
Comment:    
by jennifer orji 8/13/2007

Hi Danny,
Good article. It's an area where we need to get other members of the project team to understand that we testers do have to "fudge" the data and create the ugly scenarios that can happen in live, to ensure that the client doesn't face them. Knowing the scientific method (which, by the way, wasn't taught on my Software Engineering course; I learnt it while studying Chemical Engineering) is a great help in understanding and proving that what you know is true.

Author's Response:
5/22/2008    
Jennifer, thanks for the response. I think that test data design is critical, and if we call it "fudging" the data, I hope that doesn't minimize its importance.

I got no exposure to the scientific method in my Computer Science curriculum or any of the hard science courses I took. It seems to be getting less exposure, and perhaps teachers are assuming that it's already well-integrated into all of their materials. But I think that many software testers need a better understanding of it. I'll explore how I might do training in bug forensics.

 
 
Comment:    
by Linda Hayes 7/27/2007

I'm glad you are shedding more light on this subject, Danny. Many people think it only matters if you are automating your tests, but data is every bit as challenging for manual testers! Of course automation tools can help with the setup in either case.

Author's Response:
7/27/2007    
Very good point, Linda. If we're testing manually, we can do tool-assisted manual testing (and I do this often). But you're right that the test data design is important for all modes of testing.

 
 
Comment:    
by Derek Kozikowski 7/25/2007

Thanks for reminding us about one of the key tools in our testing arsenal. Another useful technique I have used is to create baseline databases for the product. These may have production data, as you've suggested, or predefined sets of test data. These baseline databases are then backed up and can be restored on demand. This is especially useful when doing release upgrade testing.

Author's Response:
7/27/2007    
That's great Derek - databases give us the power to easily switch between different data sets, if we'll only take advantage of that power. We have to be careful with production data, though, since there may be personal or financial information that has to be scrubbed. There have been cases where test teams used sensitive data carelessly.

 
 
Comment:    
by Steve Collins 7/24/2007

Excellent article

As I put it to my team when I took it over..

If you can't control the data that your tests execute against then how can you predict the results of your tests?

I have found that there is no tool more useful for test data generation than Excel (I'm sure other spreadsheets work just as well, I've just never used them) because of the ability to write macros. You can list the data in easily readable columns and then have a macro convert that data to whatever format you need.

Currently I have 2 kinds of test data spreadsheets that I am using. The first is to seed Historical data into...

Author's Response:
7/27/2007    
Thanks, Steve. I like to write Perl scripts to generate test data, though I've started wishing I had better spreadsheet skills, because you can easily handle some of the simpler data generation tasks in a spreadsheet without doing any programming. Once the data is complex enough to need some programming, you have a wide variety of choices, including the language built in to your spreadsheet. If you use a standalone scripting language, you can move data in and out of a spreadsheet for jobs that the spreadsheet is better for.

 
 
Comment:    
by vijay shinde 7/24/2007

Good analysis, Danny. I ran into the same problem some years ago, when I was unable to create robust data for my test scenario because of its complexity. But now I use my own techniques and experience for this and am able to create a robust data set. Usually I save all the test data for each module of the application while I test that part, which builds up comprehensive test data. I go on adding the latest layer to it for my current task, which reduces the time I need to prepare test data when I come to a big module or test the whole application in one shot.

Thanks again for the good tips.

Author's Response:
7/27/2007    
Hi, Vijay. Yes, certainly make sure to save your test data, or even better, a script that generates the data.

 
 
Comment:    
by Maura Shortridge 7/23/2007

Danny, I always love your articles. It amazes me how many "testers" out there do not have a firm grasp of the scientific method, and cannot construct a valid test for even a simple scenario. Additionally, I see way too many "testers" who test solely against requirements, and try to prove the system works, rather than looking for weak points and devising tests that will break the system. The next time I'm conducting interviews, I plan to work your question into the screening portion. They'll get bonus points if they recognize who I stole it from!

Author's Response:
7/27/2007    
Hi, Maura. It was amazing that I got a Computer Science degree without ever discussing the scientific method at school. Some things we have to learn on our own.

Even better bonus points for your job candidates would be to ask if they recognize any author at all who writes about testing. Many testers have made no effort to gather knowledge about their craft outside of what their company feeds them.

 



© 2013 StickyMinds.com. All rights reserved.