Data analytics can provide far more accurate prediction rates than traditional polling and sampling models. Furthermore, applying business intelligence methods can even influence outcomes, as learned from the UK’s “Brexit” vote and the 2016 US presidential election, through identifying voter groups and targeting them with particularly crafted ads on social media. In the background of all that, there’s the complex job of gathering, formatting, categorizing, and storing psychological profile information from many sources—tweets and retweets, Facebook likes, and online games and tests, to name a few.
But this is an upscale example of business intelligence. Let’s step back and look at more common examples.
Suppose your company allows its employees to claim expenses on coffee and snacks. Obvious bookkeeping information would include monthly or yearly total amounts. But is there anything else you could learn?
The first step would be to extract additional data from receipts, such as the times of transactions and names of the places visited. Information about names can be transformed into geolocation models, with all records loaded in a table or database. These steps—extract, transform, and load, commonly abbreviated as ETL—provide the source of data to analyze.
So, how do you then analyze that data? You could visualize the visits on a map and learn about spending per location. If you gathered information about purchased items, you may find out what coffee and snacks are the most popular. You also can learn about places that your employees visit in groups and during working hours.
The results of data analytics may prompt some informed decision-making, and that is the basis of business intelligence (BI). For example, you may decide to help your employees be more productive by reducing visits to a coffee shop by purchasing a coffee machine with their favorite kinds of coffee, which will greatly reduce person-hours spent away from the job. (By the way, that indeed was the reason for installing coffee machines in offices; alas, they often do not offer coffee that people like to get!)
Nowadays, there are open-source BI tools and enterprise BI platforms. As a tester, you’ll be less likely to get involved in the data analytics process, but there’s a good chance you can provide help with the ETL part. Remember, data may come from a variety of sources and in a variety formats. Those extraction and transformation scripts are typically created in-house based on an incomplete or narrow understanding of which data points are important and how they should be represented. Skilled testing of the scripts and evaluation of data ensures higher quality of the representation.
During the loading stage, data should be placed into a destination point (database or file) according to desired rules: form records (fields and indexes), replacing existing records or expanding sample size, and so on. This is implemented through customization and in-house scripts.
In terms of risks, we want to avoid the following, by category:
- Extraction: Erroneously coded criteria or erroneous implementation of queries, as well as risky outcomes of mishandling missing or corrupt data
- Transformation: Unrecognized data formats or erroneous implementation of transformation rules
- Loading: Misunderstanding or misimplementation of loading rules
Referring to our example with coffee receipts, items and transaction amounts may come in different places and record differently after scanning. Some items, such as tip amount, may be of no interest and should be excluded, although it can be worded as “tip,” “gratuity,” or “fee,” adding more to the challenge. Likewise, the amount can be stored as “$2.50,” “2-50,” or even “2 50 USD.”
In my real practice of testing a BI implementation, in a mutual funds and investments enterprise, we had to deal with transaction records consisting of hundreds of fields, and the amounts came in even crazier formats.
This should give the idea that seemingly straightforward business rules require very sophisticated programming in order to effectively mine data for business intelligence decisions.
My first challenge as a tester was the volume and variety of data in the database. Yes, I could run my own queries to try to replicate the developers’ implementation, but I didn’t feel that comparing the work of one program with another would be a good proof that “it works.” Testing in general is not very good at proving absence of errors, but skilled testing is good at proving their presence. So I let database analysts write sophisticated queries and focused on what I’m good at: identifying risks and exposing them.
My first technique was isolation. We agreed to have a logical breakdown of modules and create a mock environment where I could test, in isolation, operations like querying range of data, parsing data records, formatting them, and so on. That allowed to identify certain issues such as logical errors, accidental errors (such as inclusive versus noninclusive boundaries), and, more importantly, gaps in logic and unconsidered cases. For better traceability, the production version of scripts was supplied with a logging mechanism.
My second successful technique became injection of samples into a test database. Having access to implementation of business rules, I had a certain understanding of what kinds of transaction data they may and may not handle. Instead of running extraction-transformation scripts on production data sets in hopes of stumbling upon an error, I created control data groups, such as “common valid,” “uncommon valid,” “common invalid,” and “uncommon invalid,” to feed into the scripts. This helped in proofing the “abilities” of the scripts to handle common cases and report unrecognized or unhandled cases.
After a few cycles, that helped improve the robustness of extraction-transformation scripts, and the quality of data mining on production data sets became satisfactory for business intelligence analysts.
Business intelligence testing is a bright label. In practice, as I learned, be prepared for a lot of data and database analysis. Your testing power might be enhanced by the abilities to read extraction-transformation code (which may vary based on the programming language used by the team) and write some code of your own for mock testing and SQL queries. But even traditional black-box techniques, such as isolation and injection, have proven to be very effective with business intelligence testing.