Testing systems that don’t always return the same answer requires new definitions and approaches.
Software testing is, in theory, a fairly straightforward activity. For every input, there is a defined and known output. We enter values, make selections, or navigate an application, then compare the actual result with the expected one. If they match, we nod and move on. If they don't, we may have a bug.
Granted, sometimes an output is not well defined, or there is ambiguity or disagreement about whether a particular result represents a bug or something else. But in general, we already know what the output is supposed to be.
But there is one type of software for which a defined output is no longer a given: the machine learning system.
Most machine learning systems are based on neural networks: sets of layered algorithms whose variables can be adjusted through a learning process. The learning process feeds in known data inputs, produces outputs, and compares those outputs with known results. For example, you may have an application that tries to determine an expected commute time based on the weather. The inputs might be temperature, likelihood of precipitation, and date, while your output is commute time for a set distance.
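To make that concrete, here is a minimal sketch in Python. The feature names, scaling, and layer sizes are illustrative assumptions on my part, not a description of any particular system:

```python
import numpy as np

# Hypothetical commute-time inputs, normalized to comparable scales.
features = np.array([18.0 / 40.0,    # temperature
                     0.30,           # likelihood of precipitation
                     172 / 365.0])   # day of year

# Adjustable variables: the learning process tunes these until the
# output matches known commute times.
rng = np.random.default_rng(0)
w_hidden = rng.normal(size=(3, 4))   # 3 inputs -> 4 hidden nodes
w_output = rng.normal(size=(4, 1))   # 4 hidden nodes -> 1 output

hidden = np.tanh(features @ w_hidden)   # one layer of equations
commute_minutes = hidden @ w_output     # predicted commute time
```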
When the algorithms reproduce the known results with the desired degree of accuracy, the algebraic coefficients are frozen and production code is generated. Today, this constitutes much of what we understand as artificial intelligence.
This type of software is becoming increasingly common; it is used in areas such as e-commerce, public transportation, the automotive industry, finance, and computer networks. It has the potential to make decisions given sufficiently well-defined inputs and goals. To be precise, you need quantitative data: the inputs and expected output must be expressible as numbers that can be evaluated and manipulated in a series of equations. This could be as simple as network latency as an input, with likelihood of purchase as an output.
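As a rough illustration of that latency-to-purchase idea, here is a sketch with invented coefficients standing in for values a real model would learn from data:

```python
import numpy as np

# Quantitative input: response latencies in milliseconds.
latency_ms = np.array([50.0, 120.0, 400.0, 900.0])

# Made-up coefficients for a logistic curve; a trained model would
# learn these from recorded latency/purchase data.
w, b = -0.01, 2.0
purchase_likelihood = 1.0 / (1.0 + np.exp(-(w * latency_ms + b)))
```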
In some instances, these applications are characterized as artificial intelligence, in that they seemingly make decisions that were once the purview of a human user or operator.
These types of systems don't produce an exact result. In fact, sometimes they can produce an obviously incorrect result. But they are extremely useful in situations where data already exist on the relationship between recorded inputs and intended results.
For example, years ago I devised a neural network as part of an electronic wind sensor. It worked through the wind cooling the electronic sensor: the precise decrease in temperature corresponded to specific wind speeds and directions. I built a neural network that had three layers of algebraic equations, each with four or five separate equations in individual nodes, computing in parallel. They would start with initial variable values, then adjust those values based on a comparison between the algorithmic output and the actual answer.
I then trained it. I had more than five hundred data points relating known wind speed and direction to the extent to which the sensor cooled. The network passed each input through the multiple layers of equations and produced an answer. At first, the network's answers weren't especially close to the known correct ones, but the algorithm was able to adjust itself based on the actual answers. After multiple iterations over the training data, the values gradually homed in on accurate and consistent results.
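A training loop along those lines can be sketched in a few lines of Python. The wind data below are random stand-ins for the real measurements, and the layer sizes simply echo the description above:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins for the ~500 recorded data points: sensor cooling
# in, wind speed and direction out. The real values came from the instrument.
X = rng.uniform(size=(500, 2))   # e.g., temperature drop, ambient temperature
y = rng.uniform(size=(500, 2))   # e.g., wind speed, wind direction

# Three layers of equations, four or five nodes each.
sizes = [2, 5, 4, 2]
weights = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(sizes, sizes[1:])]

lr = 0.1
for epoch in range(1000):
    # Forward pass: each input flows through the layered equations.
    activations = [X]
    for w in weights:
        activations.append(np.tanh(activations[-1] @ w))

    # Compare the network's answer with the actual answer...
    grad = (activations[-1] - y) * (1 - activations[-1] ** 2)

    # ...then adjust the variables, layer by layer (backpropagation).
    for i in reversed(range(len(weights))):
        grad_w = activations[i].T @ grad / len(X)
        if i:  # propagate the error to the previous layer
            grad = (grad @ weights[i].T) * (1 - activations[i] ** 2)
        weights[i] -= lr * grad_w
```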
How do you test this? You already know what each answer is supposed to be, because you built the network from that known data, but you will rarely get an exactly correct answer every time.
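One way to see the difficulty is to contrast an exact-match assertion with a tolerance-based check. The sketch below uses arbitrary thresholds of my own choosing, not an established standard:

```python
# An exact-match assertion will fail constantly against approximate output.
# A tolerance band (here, an assumed 10% of the known value) is one
# alternative framing.
def within_tolerance(predicted, expected, rel_tol=0.10):
    return abs(predicted - expected) <= rel_tol * abs(expected)

# Judge the suite as a whole: some fraction of cases must be close enough,
# since an occasional miss is expected from this kind of system.
def accuracy(cases, rel_tol=0.10):
    hits = sum(within_tolerance(p, e, rel_tol) for p, e in cases)
    return hits / len(cases)

cases = [(12.4, 12.0), (7.9, 8.5), (15.0, 11.0)]  # (predicted, expected)
print(accuracy(cases))  # a pass criterion might be accuracy >= 0.95
```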