Using Computer Vision to Reduce Test Automation Blind Spots

The standard test automation toolkit handles web and mobile automation well, but it fails to detect elements in desktop applications and in content-based mobile apps. Computer vision (CV) uses deep learning to recognize objects in images, much as the human eye does. Let's see how CV can help automate testing for a much wider range of software products.

Artificial intelligence (AI) and machine learning (ML) are trending in the IT industry because they can perform complicated tasks with little human involvement.

The adoption of AI and ML in QA is growing, too. Using them to develop automated tests, analyze reports, and automatically fix tests after UI changes can greatly streamline testing and save time.

Computer vision is a field of computer science closely related to AI and ML. Modeled on human sight, it uses deep learning to recognize objects in images, which helps machines orient themselves in space and perform repetitive detection tasks.

Some test automation issues cannot be resolved with an ordinary toolkit, and this is where CV can help. Let’s look at the cases where CV can enrich the test automation process.

Mobile Automation

The most common tool for mobile automation is Appium. On its own, Appium works well for regular mobile apps, but it fails to detect elements in content-based apps such as mobile games, which typically draw their interface as graphics instead of native UI elements. While a human can see every button and field, Appium effectively sees a blank screen with no elements to locate.

Adding CV to Appium mobile automation solves this problem. You can train a CV algorithm to recognize elements in a screenshot, get their coordinates, and then use Appium to interact with them at those coordinates.
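To make the screenshot-to-coordinates step concrete, here is a minimal sketch that finds a reference image (for example, a cropped screenshot of a game's Play button) inside a full screenshot by brute-force template matching with a sum-of-squared-differences score, then returns the centre point an Appium tap could target. Template matching needs no training, so it is a simpler stand-in for the trained recognizer described above. The sketch assumes both images are already decoded into grayscale 2D lists of 0-255 values, and the function names are hypothetical; a real project would more likely call OpenCV's `cv2.matchTemplate` with `cv2.TM_SQDIFF`, which computes the same score much faster.

```python
# Sketch only: brute-force template matching on grayscale 2D lists (0-255).
# Lower sum-of-squared-differences (SSD) score means a better match.

def find_template(screen, template):
    """Return (x, y, score) for the top-left corner of the best match."""
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best = None
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            score = 0
            for dy in range(th):
                row, trow = screen[y + dy], template[dy]
                for dx in range(tw):
                    diff = row[x + dx] - trow[dx]
                    score += diff * diff
            if best is None or score < best[2]:
                best = (x, y, score)
    return best


def tap_point(screen, template):
    """Centre of the best match: the coordinates a tap action would use."""
    x, y, _ = find_template(screen, template)
    th, tw = len(template), len(template[0])
    return x + tw // 2, y + th // 2


if __name__ == "__main__":
    screen = [[0] * 8 for _ in range(6)]
    # Draw a 3x2 white "button" whose top-left corner is at (4, 2).
    for yy in range(2, 4):
        for xx in range(4, 7):
            screen[yy][xx] = 255
    button = [[255, 255, 255], [255, 255, 255]]
    print(tap_point(screen, button))  # (5, 3)
```

Plain template matching only works when the element looks the same on every run; elements that animate, scale, or change skins are where a trained model earns its extra cost.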

Desktop Automation

Automating tests for desktop applications can be challenging. Selenium WebDriver works only with web apps; for desktop apps, you can use AutoIt for Windows or AppiumForMac for macOS.

However, if your application is built for several operating systems, writing separate scripts with different technologies for each one is time-consuming. To overcome this, you can use SikuliX, which can automate anything you see on the screen of a desktop computer running Windows, macOS, or some Linux/Unix systems. Powered by the OpenCV library, SikuliX lets you click elements based on how they look in a screenshot and compare the actual result with the expected outcome.
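Image-based tools rarely demand pixel-perfect matches, because anti-aliasing, font rendering, and themes differ between operating systems. The sketch below illustrates a minimum-similarity threshold: it scores every position with a simple mean-absolute-difference similarity (1.0 means identical) and reports no match when the best score falls below the threshold. SikuliX exposes a comparable minimum-similarity setting, but its scores come from OpenCV, so treat this measure, the 0.9 default, and the function names as illustrative assumptions.

```python
# Sketch only: a mean-absolute-difference similarity stands in for the
# correlation scores that OpenCV-based tools such as SikuliX compute.
# Images are assumed to be grayscale 2D lists of 0-255 values.

def similarity(screen, template, x, y):
    """Similarity of template placed at (x, y): 1.0 identical, 0.0 maximally different."""
    th, tw = len(template), len(template[0])
    total = 0
    for dy in range(th):
        for dx in range(tw):
            total += abs(screen[y + dy][x + dx] - template[dy][dx])
    return 1.0 - total / (255 * th * tw)


def find(screen, template, min_similarity=0.9):
    """Return (x, y, similarity) of the best match, or None if below the threshold."""
    th, tw = len(template), len(template[0])
    best = None
    for y in range(len(screen) - th + 1):
        for x in range(len(screen[0]) - tw + 1):
            score = similarity(screen, template, x, y)
            if best is None or score > best[2]:
                best = (x, y, score)
    if best is None or best[2] < min_similarity:
        return None
    return best
```

With this scoring, a button rendered at brightness 250 instead of the template's 255 still matches with a similarity of about 0.98, while an unrelated pattern returns `None` instead of a misleading best guess.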

Automation of a Non-Standard Device

Usually, the scope of UI test automation is limited to web, mobile, and desktop applications. These platforms have powerful drivers and tools to identify elements and interact with them.

However, there are some non-standard platforms we may also want to automate, such as game consoles, secondary device screens, and Internet of Things (IoT) devices. Whenever the standard toolkit cannot retrieve the state and location of elements, CV can build an element list from a screenshot or a photo of the display. From there, investigate how you can interact with areas of the screen by their coordinates and whether that resolves your testing challenge.
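One simple way to turn a screenshot into an element list is connected-component labeling: threshold the image, group neighbouring bright pixels into blobs, and record each blob's bounding box and centre. The sketch below assumes a clean, already-cropped grayscale screenshot in which UI elements are brighter than the background (a camera photo of a display would first need perspective correction and denoising); the threshold, the minimum area, and the output format are hypothetical choices. OpenCV offers the same operation as `cv2.connectedComponentsWithStats`.

```python
from collections import deque

# Sketch only: build an "element list" from a grayscale 2D list (0-255) by
# grouping 4-connected pixels at or above a brightness threshold.


def detect_elements(image, threshold=128, min_area=4):
    """Return a list of {'box': (x, y, w, h), 'center': (cx, cy)} dicts."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    elements = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or image[y][x] < threshold:
                continue
            # Breadth-first flood fill collects one connected blob.
            queue = deque([(x, y)])
            seen[y][x] = True
            xs, ys = [], []
            while queue:
                cx, cy = queue.popleft()
                xs.append(cx)
                ys.append(cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] \
                            and image[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(xs) < min_area:
                continue  # ignore specks of noise
            left, top, right, bottom = min(xs), min(ys), max(xs), max(ys)
            elements.append({
                "box": (left, top, right - left + 1, bottom - top + 1),
                "center": ((left + right) // 2, (top + bottom) // 2),
            })
    return elements
```

Each centre could then be passed to whatever input-at-coordinates mechanism the device offers, which is the part you would need to investigate for each platform.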

UI Change Detection

Some applications’ UIs are built from a set of predefined components, known as a UI kit or UI constructor, which simplifies development and makes the design more consistent. At the same time, this approach adds a dependency: you need to track the versions of the shared components and make sure your app uses the latest ones. If the application is complex, manually testing every component on every page and catching all the issues and updates can be time-consuming.

This is where CV can help. Add image checks to your automation process: upload a reference version of the page UI, compare the actual page to this template, and have the test fail if the UI does not match. This helps you track design changes and content updates.
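A minimal sketch of such an image check, assuming the baseline and the actual screenshot are already converted to grayscale 2D lists: it counts pixels whose values differ by more than a small tolerance (to absorb rendering noise) and fails when the share of changed pixels exceeds a limit. The tolerance of 16, the 1% limit, and the function names are arbitrary assumptions you would tune for your own pages.

```python
# Sketch: compare a baseline screenshot with the current one. Both are assumed
# to be grayscale 2D lists of 0-255 values.


def ui_diff_ratio(baseline, actual, pixel_tolerance=16):
    """Fraction of pixels whose values differ by more than pixel_tolerance."""
    if len(baseline) != len(actual) or any(
        len(b) != len(a) for b, a in zip(baseline, actual)
    ):
        return 1.0  # a size change is treated as a complete mismatch
    total = sum(len(row) for row in baseline)
    changed = sum(
        1
        for brow, arow in zip(baseline, actual)
        for b, a in zip(brow, arow)
        if abs(b - a) > pixel_tolerance
    )
    return changed / total if total else 0.0


def assert_ui_matches(baseline, actual, max_changed_ratio=0.01):
    """Fail the test when more than max_changed_ratio of the pixels changed."""
    ratio = ui_diff_ratio(baseline, actual)
    if ratio > max_changed_ratio:
        raise AssertionError(f"UI differs from baseline: {ratio:.1%} of pixels changed")
```

The tolerance and the ratio limit pull in opposite directions: a loose tolerance hides subtle colour changes, while a strict ratio limit flags harmless rendering differences, so both usually need adjusting per page.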

These checks need not be enforced on every test run; a true/false flag can control whether a given run treats a UI change as a bug.
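The flag can be as simple as an environment variable read by the test harness. In this sketch, `UI_CHECK_STRICT` is a hypothetical variable name: when it is set to true, a visual mismatch fails the test; otherwise the mismatch is only reported as a warning and the run continues.

```python
import os

# Hypothetical flag name; set UI_CHECK_STRICT=true in CI to enforce visual checks.
STRICT_ENV_VAR = "UI_CHECK_STRICT"


def strict_ui_checks_enabled():
    """Read the true/false flag from the environment (defaults to false)."""
    value = os.environ.get(STRICT_ENV_VAR, "false")
    return value.strip().lower() in ("1", "true", "yes")


def visual_check_outcome(ui_matches, strict=None):
    """Return 'pass', 'fail', or 'warn' for one visual comparison."""
    if strict is None:
        strict = strict_ui_checks_enabled()
    if ui_matches:
        return "pass"
    # A mismatch counts as a bug only in strict mode; otherwise it is reported.
    return "fail" if strict else "warn"
```

Keeping the decision in one function means the same test code can run in a strict nightly job and a lenient per-commit job.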

Computer Vision Tools for Automation

OpenCV is an open source image processing library that can be used from multiple languages, including Java, Python, and (through third-party bindings) Ruby.

With suitable detection logic built on top of it, this library can identify the elements in an image and represent them as an element tree suitable for test automation. Once the interactive areas are detected, engineers can get their coordinates and interact with them using Appium or Selenium WebDriver.
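One detail to plan for when passing detected coordinates to a driver: on high-density displays, a screenshot usually has more pixels than the logical coordinate space (CSS pixels in Selenium, points on iOS) in which pointer actions are expressed. The sketch below assumes you know both sizes, for example the screenshot's pixel dimensions and the viewport or window size the driver reports, and scales a point proportionally; the function name is hypothetical.

```python
def to_driver_coords(px, py, screenshot_size, viewport_size):
    """Scale a point from screenshot pixels to the driver's logical coordinates.

    screenshot_size and viewport_size are (width, height) tuples, e.g. a
    2560x1600 screenshot of a 1280x800 viewport on a 2x display.
    """
    scale_x = viewport_size[0] / screenshot_size[0]
    scale_y = viewport_size[1] / screenshot_size[1]
    return round(px * scale_x), round(py * scale_y)


# A button centre detected at (640, 400) in a 2x screenshot maps to (320, 200).
print(to_driver_coords(640, 400, (2560, 1600), (1280, 800)))  # (320, 200)
```

Python's `round` sends exact halves to the nearest even integer, which is harmless at this scale; skipping the scaling step entirely is the more common bug, producing clicks that land far from the detected element.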

SikuliX is a tool for macOS, Windows, and Linux that searches the screen for a particular element based on an image of it. It is built on OpenCV algorithms and supports multiple languages. It also includes the SikuliX IDE, which is easy to use and does not require in-depth automation knowledge.

This tool can help automate desktop applications, as well as Flash web applications or mobile games when paired with Selenium WebDriver or Appium. SikuliX searches the screen for the matching area and then interacts with the element it finds.
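Because the screen may still be rendering when a script looks for an element, image-based search is usually wrapped in a polling loop with a timeout, similar in spirit to SikuliX's own `wait` function. This sketch keeps the search itself abstract: `find` is any hypothetical zero-argument callable that captures a fresh screenshot and returns a match or `None`, and the clock and sleep functions are injectable so the loop can be tested without real delays.

```python
import time


def wait_for(find, timeout=10.0, interval=0.5, clock=time.monotonic, sleep=time.sleep):
    """Call find() until it returns a match or timeout seconds have passed.

    find is a hypothetical zero-argument callable that captures a fresh
    screenshot, searches it, and returns a match (e.g. coordinates) or None.
    """
    deadline = clock() + timeout
    while True:
        match = find()
        if match is not None:
            return match
        if clock() >= deadline:
            raise TimeoutError(f"element not found within {timeout} s")
        sleep(interval)
```

Once a match is returned, the script would click or tap at its coordinates; raising `TimeoutError` instead of returning `None` makes a missing element fail the test loudly.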

Tesseract OCR is an open source text recognition engine available for macOS, Windows, and Linux.

Tesseract OCR uses neural networks to recognize the text in an image and can be extended to recognize custom fonts and text sizes. Adding Tesseract OCR to web automation makes it possible to validate outcomes when the result of an operation is displayed in an image instead of, for example, a text field. It can also be embedded into SikuliX to validate the expected result in image-based UI automation.
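OCR output is rarely character-perfect, so a validation step usually normalizes the recognized text and tolerates a small number of errors instead of demanding exact equality. The sketch below assumes the text has already been extracted, for example with pytesseract's `image_to_string`, and compares it to the expected value using Levenshtein edit distance; the one-error budget and the function names are assumptions.

```python
def normalize(text):
    """Collapse whitespace and case so OCR layout quirks do not matter."""
    return " ".join(text.split()).lower()


def edit_distance(a, b):
    """Levenshtein distance, computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


def ocr_text_matches(ocr_text, expected, max_errors=1):
    """True when the OCR result is within max_errors edits of the expected text."""
    return edit_distance(normalize(ocr_text), normalize(expected)) <= max_errors
```

A budget of one error tolerates a single `0`/`O` confusion but still rejects `$24.00` when `$42.00` is expected; for critical numeric values, an exact comparison after normalization may be the safer choice.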

Is Computer Vision a Unified Solution?

Introducing CV into a test automation project is quite complicated. It requires additional research, data sets to train the algorithms, multiple tool dependencies, specialized testing skills, and close monitoring of the results in the early stages. Moreover, CV algorithms are not 100% accurate, especially when image quality is low.
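Monitoring results in the early stages can start with measuring the detector against a small set of hand-labeled screenshots. This sketch computes precision (how many detections are real elements) and recall (how many real elements were found), counting a detection as correct when its intersection over union (IoU) with a not-yet-matched labeled box is at least 0.5, a common convention in object detection. The greedy matching and the function names are simplifying assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0


def precision_recall(detected, labeled, min_iou=0.5):
    """Greedily match each detection to the unused labeled box it overlaps most."""
    unused = list(labeled)
    true_pos = 0
    for box in detected:
        best = max(unused, key=lambda lab: iou(box, lab), default=None)
        if best is not None and iou(box, best) >= min_iou:
            unused.remove(best)  # each labeled element can be found only once
            true_pos += 1
    precision = true_pos / len(detected) if detected else 1.0
    recall = true_pos / len(labeled) if labeled else 1.0
    return precision, recall
```

Tracking these two numbers per build shows whether the detector degrades as screenshots or designs change, before false failures pile up in the test reports.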

CV is still a new tool in software development and test automation, and in some cases you should think twice before applying it to your testing project:

  • If you have web and mobile applications with accessible elements and can write stable locators for them, implementing CV may be needlessly complex and expensive, with little gain over regular web and mobile UI automation
  • If your application's UI changes frequently and its components look different at the end of each sprint, CV may be the wrong choice: you will see thousands of failed tests, spend lots of time uploading new design versions to the scripts, and still not get the desired outcome
  • If image quality is low, you will get inaccurate results

Implementing CV requires additional effort, customization, and engineering expertise. However, it can be helpful when you need to work with image-based content, extract text from image elements, or detect changes in your UI. In those cases, CV can streamline your test automation project and resolve specific QA tasks.

StickyMinds is a TechWell community.