A StickyMinds.com Original
Automation or Not, It’s All About the Data

By Linda Hayes


Summary: While anyone who has automated her testing knows you can't create repeatable automated tests from unstable data, it did not dawn on this week's columnist--self-proclaimed automation lobbyist Linda Hayes--that this issue cripples manual testing as well. Read on to share her epiphany.



 
As an automation lobbyist, I constantly whine about test data--or the lack thereof. It's basically impossible to develop repeatable automated tests without a known, stable data state. For companies that are transitioning from manual to automated testing, realizing this is like stepping into an ice cold shower: it wakes you up in an unpleasant sort of way. 
 
Don't get me wrong, I know it's a huge problem. You can't just go around archiving and refreshing monster databases, and even if you could, there are related files and interfaces between applications that make it even harder. It didn't dawn on me until recently that the real problem has nothing to do with automation at all. 
 
Here's the scenario: I was working with a company that was evaluating test automation. As part of the assessment, one of their QA managers walked me through a test case manually. The test was to issue a loan against a 401(k) plan. First she had to find a plan that permitted loans, as well as a participant within that plan who had a sufficient cash balance for the loan, had not taken out a loan within the past year, and did not have an outstanding loan from a previous year. 
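To make those search conditions concrete, here is a minimal Python sketch of the manual hunt expressed as a filter. Everything in it is an assumption for illustration: the `Plan` and `Participant` records, their field names, and the reading of "sufficient cash balance" as "balance at least equal to the loan amount" are hypothetical, and a real environment would spread this data across many files and interfaces rather than a tidy in-memory list.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Tuple


@dataclass
class Participant:
    # Hypothetical record layout; field names are illustrative only.
    participant_id: str
    cash_balance: float
    loan_dates: List[date] = field(default_factory=list)  # dates of past loans
    has_outstanding_loan: bool = False


@dataclass
class Plan:
    plan_id: str
    permits_loans: bool
    participants: List[Participant] = field(default_factory=list)


def is_eligible(p: Participant, loan_amount: float, today: date) -> bool:
    """Apply the three participant-level conditions from the test case."""
    one_year_ago = today - timedelta(days=365)
    took_recent_loan = any(d > one_year_ago for d in p.loan_dates)
    return (
        p.cash_balance >= loan_amount      # assumed meaning of "sufficient"
        and not took_recent_loan           # no loan within the past year
        and not p.has_outstanding_loan     # no loan still outstanding
    )


def find_test_candidates(
    plans: List[Plan], loan_amount: float, today: date
) -> List[Tuple[str, str]]:
    """Return (plan_id, participant_id) pairs usable for the loan test."""
    return [
        (plan.plan_id, p.participant_id)
        for plan in plans
        if plan.permits_loans
        for p in plan.participants
        if is_eligible(p, loan_amount, today)
    ]


# Made-up sample data: only A-1 satisfies every condition.
today = date(2005, 5, 16)
plans = [
    Plan("PLAN-A", permits_loans=True, participants=[
        Participant("A-1", cash_balance=20_000.0),
        Participant("A-2", cash_balance=20_000.0, has_outstanding_loan=True),
        Participant("A-3", cash_balance=20_000.0, loan_dates=[date(2005, 1, 10)]),
        Participant("A-4", cash_balance=1_000.0),
    ]),
    Plan("PLAN-B", permits_loans=False, participants=[
        Participant("B-1", cash_balance=50_000.0),
    ]),
]
candidates = find_test_candidates(plans, loan_amount=5_000.0, today=today)
print(candidates)
```

Even in this toy form, four of the five participants are unusable, which hints at why the search by hand took so long.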
 
This took her about half an hour. Once she found the right plan and participant, it took about ten minutes to issue the loan and confirm it was accepted. Granted, she was explaining things to me as she went along, so without my involvement the whole process would have been faster--but the ratio between locating the data and executing the test case would have been the same. 
 
Next it was time to automate the test case, but as soon as we started she pointed out that we could not use the same participant, who now had a loan outstanding and no longer qualified. So, the whole process had to be repeated.  
 
At this point I concluded that automation was impossible, because we would have to essentially write an artificial intelligence system that knew everything she did in order to find valid accounts. Their environment was not stable enough to either reproduce the same data or even let us add our own data, since it was shared by others and updated constantly.  
 
After discussing the implications with her and her manager, we agreed that automation was not an option unless the data environment was brought under control. This would require a substantial investment in terms of hardware, software, and time. I encouraged management to make a business case by pointing out all the benefits that automation would bring. They agreed to run it up the flagpole but made no promises. 
 
It wasn't until some time later, when I was reflecting on this issue for another account, that it suddenly struck me: automation has nothing to do with it! 
 
Think about it: I watched her spend three-quarters of her time for a manual test just dealing with the data and one-quarter running the test. And she was lucky she could even find an account with all of the conditions necessary; no doubt in some cases a test could not be run simply because the data did not exist. This is especially true for test cases that are specific as to time, for example the posting of interest or dividends that occurs only on a particular time schedule. 
 
So whether she ever automated her testing or not, just providing data stability would improve her manual testing productivity by a factor of four! That’s huge.  
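The arithmetic behind that claim can be checked with a few lines of Python. This is a sketch using the rough timings from the walkthrough (about thirty minutes locating data, about ten minutes running the test), and it assumes the best case, in which a stable data environment reduces the search time to zero.

```python
def time_breakdown(data_minutes: float, run_minutes: float):
    """Return (share of time spent on data, best-case speedup)."""
    total = data_minutes + run_minutes
    data_share = data_minutes / total   # fraction of the test spent finding data
    speedup = total / run_minutes       # if the data search took no time at all
    return data_share, speedup


data_share, speedup = time_breakdown(data_minutes=30, run_minutes=10)
print(f"Share of time spent finding data: {data_share:.0%}")  # 75%
print(f"Best-case productivity gain: {speedup:.1f}x")         # 4.0x
```

In practice some setup time would remain, so the real gain would fall somewhere below this ceiling.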
 
What I should have been telling management is that they needed to get control of their test data whether they automated or not. Cutting the cost and cycle time of manual testing by as much as 75 percent would no doubt have made a compelling business case without introducing automation into the picture at all. 
 
This doesn't mean I am giving up my automation lobbying, because automated tools are an ideal way of loading test data and performing other high volume, mundane tasks needed to create and maintain a test environment. But it does mean that the business case for controlling test data is even more important than I used to think. 
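As one illustration of what "control of the test data" can look like, the sketch below loads a small known baseline and hands each test a fresh copy of it, so the same participant can be reused every run. It uses Python's standard `sqlite3` module purely as a stand-in; the table, the column names, and the `issue_loan` rule are hypothetical, and the environment described in this column involved far more than a single database.

```python
import sqlite3

# Hypothetical baseline: (participant id, cash balance, outstanding loan flag).
BASELINE_ROWS = [
    ("P-001", 20000.0, 0),
    ("P-002", 500.0, 0),
]


def build_baseline() -> sqlite3.Connection:
    """Create the known, stable data state once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE participant ("
        "id TEXT PRIMARY KEY, cash_balance REAL, outstanding_loan INTEGER)"
    )
    conn.executemany("INSERT INTO participant VALUES (?, ?, ?)", BASELINE_ROWS)
    conn.commit()
    return conn


def fresh_copy(baseline: sqlite3.Connection) -> sqlite3.Connection:
    """Give a test its own copy of the baseline, leaving the original untouched."""
    working = sqlite3.connect(":memory:")
    baseline.backup(working)
    return working


def issue_loan(conn: sqlite3.Connection, participant_id: str, amount: float) -> bool:
    """Toy stand-in for the system under test: approve or reject a loan."""
    row = conn.execute(
        "SELECT cash_balance, outstanding_loan FROM participant WHERE id = ?",
        (participant_id,),
    ).fetchone()
    if row is None or row[1] or row[0] < amount:
        return False
    conn.execute(
        "UPDATE participant SET outstanding_loan = 1 WHERE id = ?",
        (participant_id,),
    )
    conn.commit()
    return True


baseline = build_baseline()

first_run = fresh_copy(baseline)
print(issue_loan(first_run, "P-001", 5000.0))   # loan accepted
print(issue_loan(first_run, "P-001", 5000.0))   # rejected: loan now outstanding

second_run = fresh_copy(baseline)
print(issue_loan(second_run, "P-001", 5000.0))  # accepted again on a fresh copy
```

The second call shows the problem from the walkthrough, where a participant stops qualifying once the test has run; the fresh copy shows how a restorable baseline removes it for manual and automated testers alike.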
 
What about you? How does your test data environment affect your productivity?

About the Author
Linda G. Hayes is the CTO of Worksoft, Inc., developer of next-generation test automation solutions. She is the founder of three software companies including AutoTester, the first PC-based test automation tool. Linda holds degrees in accounting, tax and law and is a frequent industry speaker and award-winning author on software quality. She has been named as one of Fortune Magazine's People to Watch and one of the Top 40 Under 40 by Dallas Business Journal. She is a regular columnist and contributor to StickyMinds.com and Better Software magazine, as well as a columnist for Computerworld and Datamation, author of the Automated Testing Handbook and co-editor, with Alka Jarvis, of Dare to be Excellent, on best practices in the software industry. Her article "Quality is Everyone’s Business" won a Most Significant Contribution award from the Quality Assurance Institute and was published as part of the Auerbach Systems Development Handbook. You can contact Linda at linda@worksoft.com.

 
 


Member Comments
 
Comment:    
by Venkat Moncompu 7/21/2008

Hi Linda, Your article and others' comments are a treat to read and relate to. I have particularly experienced this problem while setting up a performance test where test data requirements that rely on business rules and integrity constraints can be a problem. And, as one of your readers comments, a combination of automated scripts that can use either a stable GUI, an exposed API, a database script, or a data loading mechanism has worked for us. We had to automate the "data creation for test setup" because of the huge volume of data required to run the long-running performance tests. And at times, we had to also be creative to...

Author's Response:
7/21/2008    
Thanks, Venkat. I agree that automation is likely necessary to create any reasonable volume. Interesting that you had to run jobs during the test to keep the data volume manageable.

 
 
Comment:    
by Jim Dougherty 7/7/2005

Linda - There are sometimes ways around situations like this. I am currently doing some automation work for a small company that processes financial documents. The manual test cases I was provided were great for evaluating the state of the various applications, but had a serious data problem. Nothing was ever really deleted or went away if a test case called for deletion of a user or a customer or a group of customers, etc. To work around this, in the instances of a test script where I had to add, delete, or archive an artifact, I used the =TODAY() method to generate names. This works very well in overcoming the duplication...

Author's Response:
7/7/2005    
Hi Jim - aha! I knew that whole retirement thing was just a ruse. Good to hear from you. Yes, there are many creative approaches to this problem, and frankly even a full mirror of production may not be enough since there may be conditions that occur only at certain times or in specific circumstances that may not exist when tests must be run. The key is to have enough control to be able to add data when needed and use it repeatedly.

 
 
Comment:    
by Vladimir Angelov 6/16/2005

This article was like a revelation for me. Currently I have to automate tests for large distributed software systems which consolidate data from many different sources. In addition I have only limited control over the data that is present in the systems. Automating tests in such an environment is a nightmare.

Author's Response:
6/16/2005    
Hi Vladimir - I feel your pain! How do you work around it, if I may ask? You can email me directly at linda@worksoft.com.

 
 
Comment:    
by Frits Bos 6/1/2005

Hi, Linda, Until I read 401(k) I thought you started to describe a business we are both acquainted with. I think that you have identified a common problem, one that I faced with a client where I had to manage the conversion of a large investment portfolio management system to a new platform. The process you described for finding potential test instances is one that I automated using an Excel VBA solution. For up to 50 specific combinations of conditions the macro searches the database and catalogs the references. Based on that list a second macro extracts the data for use in a test database, while another macro does the necessary data...

Author's Response:
6/2/2005    
Hi Frits - always good to hear from you. I agree, this situation sounds familiar because it happens everywhere. You are right, the data is not always there to be found and you need to not only generate it but make sure it is integrated with your test cases!

 
 
Comment:    
by Meredith Otto 5/23/2005

I work as a test analyst for a large company with lots of hierarchical databases still left over from the green-screen days, and I struggle with this "data gathering" problem on a daily basis. The real problem is that we're trying to create more streamlined web-based applications against the same aging data model. The old, clunky green-screen apps had completely different data needs than our new agile applications, but recreating our data model using a relational DB would take years! Since we don't have any hope of getting a relational DB any time soon, I have been trying to work with our developers to create...

Author's Response:
5/23/2005    
Hi Meredith - good to hear from you, hope all is well. I agree you are in a hurt locker when it comes to data, but it's good you are getting some development help. You might consider a combination of data injection, gathering and data input; automation can be handy for just driving data into the system. By gathering up valid policies, for example, or injecting certain conditions into the back end you could use automation to create various claims or other transactions through the online system. Good luck!

 
 
Comment:    
by Paul Fowler 5/18/2005

I think you just touched on the data issues. Does the data properly imitate what is going to be in production at the time of deployment (including data from use of previous builds which might have had bugs)? How many times have you had a bug creep by because what was already in the production database was different from test? Is there enough data in the correct places for a stress or load test to be valid? As an example, I have seen searches that blazed in test and crawled in production. How do you know if a bug is logic (code) or data related? This gets to reproducibility... How do you test the effect of the full domain of data...

Author's Response:
5/18/2005    
Hi Paul - you're exactly right, I only touched on the productivity aspect. Data is an enormous issue. The key is how you make the case to management to invest the time, effort and tools to make it an integral and workable aspect of the test process. That is the point I was trying to make: the cost in lost productivity can be used to justify the cost of getting control - and by control I mean it in every aspect.

 
 
Comment:    
by Brian Colcord 5/18/2005

Seems to me the simple solution to this would have been to have a DBA sitting in on the discussion as well. A simple SQL statement should have been able to identify valid test data that could have been used in the manual test and later the automated test. The valid data could be stored in a file and updated as used.

Author's Response:
5/18/2005    
Hi Brian - in the type of environment most enterprises are dealing with, the term "simple SQL statement" rarely applies. This presumes their data is stored in a relational database, and while that may be true in many cases it is not universal. Even more complex are all the related transaction and other interface files in any number of formats. Further, each individual test condition - and they test thousands - would need a specific query written. Without a dedicated DBA, this is not a practical alternative.

 
 
Comment:    
by Colin Robb 5/18/2005

Testing with existing data will always be a time-consuming task due to incomplete or invalid data. Creating known sets of test data prior to the test will always be the best option if it is feasible. Creating the data via the application UI ensures valid data which satisfies referential integrity rules, but is time-consuming, even when a functional test automation tool is used. As many people will have experienced, test data creation via the "back door", i.e. SQL loads or data imports is much quicker, but is fraught with danger. All too often the resulting datasets break some business rule somewhere and the data is therefore...

Author's Response:
5/18/2005    
Hi Colin - thanks for a practical suggestion. I agree that automation tools are ideal for loading data, but the pesky part is coming up with those data values in the first place. Some combination of data generation tools (and even these require capturing the business rules) and data loading is probably ideal.

 
 
Comment:    
by BALAJI NARAYANAN 5/18/2005

Hi, I have worked in complex test projects (like banking applications) and I have faced the same problem with data setup for test cases (involving some complex and infrequent scenarios). But are you suggesting that productivity (whether manual or automated) would increase by a factor of 4 by taking care of test data? There is no doubt that test productivity would increase, but it will do so by a factor of 4 only when all the test cases are complex and infrequently used scenarios. In my experience, around 10% of the test cases are like that. So, that would effectively increase the productivity factor only marginally but...

Author's Response:
5/18/2005    
Hi Balaji - I can't say I agree that only 10% of test cases require data to be identified and located; in my experience it is the majority of them. Maybe the difference is that I am thinking about functional test cases that exercise business rules, and you may be thinking of test cases that are more unit level, such as boundary conditions or equivalence classes, etc. The key is the degree of interrelationship between data elements. Just finding an account number that exists is obviously easier than finding a plan with certain characteristics, then an account within that plan with still more conditions, and so forth. But remember...it's just my opinion. I could be wrong, too! And often have been.

 
 
Comment:    
by Rose Swain 5/17/2005

I once worked for a large health insurer for many years. As you might imagine, many of our test cases were comprised of things happening to individuals and groups. We created our own sets of test data with all the necessary variables, set a copy aside and pulled a copy back whenever we needed to. In my current position as QA manager of an educational assessment company, I find this a bit harder to do to cover all the necessary test scenarios, but not impossible and still worth it. It just takes more tweaking when you want to reuse it. However you handle it, you hit one nail on the head - it's all about the data. Without good test data, your...

Author's Response:
5/17/2005    
Thanks, Rose. Experience is the best teacher, and I agree that if you create the data instead of try to find it you will always be in control.

 
 
Comment:    
by Chris Pousset 5/17/2005

Been there, done that. One automated tool I developed does the looking for suitable candidate records to start with. My first choice would have been to _create_ the data, but because of the complexity of the data required and the legacy systems involved, that approach was a non-starter. Moral of the story--if possible, hang onto or make the data creation tools first. Automating that part can be one of the high leverage items, whether used for manual OR automated testing.

Author's Response:
5/17/2005    
Right, Chris. I'm impressed that you were able to create a tool that looks for the data - that in and of itself can be an extremely complex task since it has to incorporate all of the business rules that experts know from experience.

 
 
Comment:    
by Daniel Kelly 5/16/2005

Not intending to be rude, but this particular epiphany occurred to me about 10 years ago while testing a bank system. The first step was to create a database with multiple records that matched the necessary profiles for the test cases we needed to run and then archive it so that we could restore to the initial state at will. That included records that would succeed and ones that would fail. Sure made life easier! The real question is, what was the tester doing using production data sets for testing? Wasn't privacy and protection of the data a concern?

Author's Response:
5/17/2005    
Hi Daniel - nothing rude about it! It's not a new problem, obviously. You're right, privacy is a huge issue now with production data, and even though you're not *supposed* to do it, it still happens sometimes. In a lot of cases (and maybe in this one) the production data is scrubbed to scramble sensitive data fields, which helps but still doesn't create the scenarios needed for testing.

 
 
Comment:    
by Mike Whittaker 5/16/2005

I must have missed something here - surely there are test data generators that can create/inject conforming cases and data sets required for a given type of test? On a database system you could enclose the test(s) in a transaction and roll it back to return to your original conditions...

Author's Response:
5/17/2005    
Hi Mike - you're right, there are all kinds of data tools, but the problem is not so simple. Aside from the referential integrity issues, there are many other interrelated files in many different formats that all must correlate. Both internal and external interfaces are involved. Defining and maintaining all the data models, file formats, and related fields across hundreds of applications and dozens (or more) files and databases is a career in and of itself.

 
© 2008 StickyMinds.com. All rights reserved.
StickyMinds.com is a division of Software Quality Engineering.