Developing consistent, stable automated tests is challenging. One primary factor is variability in loading and rendering times and in network latency. This plays a larger role for web applications and other network-dependent applications, especially in user interface automation.
Many tests wait for a fixed period before timing out, and the timeout is reported as a test failure. At that point no assertions have been made: the test fails with a timeout exception that simply prints a stack trace with no additional test information. A tester who has seen this before will likely just rerun the test and hope the timeout does not recur. That tactic hides the real causes, which are some combination of the test environment infrastructure, the performance of the application under test, and the assumptions of the tester.
Whether you are testing a back-end service or a front-end, network-dependent application, network goodput will vary. The service or application depends on it, so its response times will vary as well. If the application or service cannot handle test requests when goodput is insufficient, timeouts will occur.
The solution is not to trigger retries manually or to build retry logic into the test. Timeouts should not be looked at as a problem with the test, but as a problem with the test environment or the unit under test. Thus, if timeouts occur during testing in the nonproduction test environment, the best practice is to create a test environment with the same infrastructure as production. This means having the same number of servers as production, placing them in the same region that production uses, and using the same protocols and configurations as production.
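One way to keep the two environments from drifting apart is to compare their descriptions automatically. The following is a minimal sketch of such a parity check, assuming each environment can be summarized as key/value pairs. The class `EnvironmentParity`, the method `differences`, and keys such as `serverCount`, `region`, and `protocol` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: reports every setting that differs between two environments.
final class EnvironmentParity {
    /** Returns one line per key whose value differs, or is missing, in either environment. */
    static List<String> differences(Map<String, String> production, Map<String, String> test) {
        // Sorted union of keys so the report order is stable.
        Set<String> keys = new TreeSet<>(production.keySet());
        keys.addAll(test.keySet());

        List<String> diffs = new ArrayList<>();
        for (String key : keys) {
            String prodValue = production.get(key);
            String testValue = test.get(key);
            if (!Objects.equals(prodValue, testValue)) {
                diffs.add(key + ": production=" + prodValue + ", test=" + testValue);
            }
        }
        return diffs;
    }
}
```

With illustrative values, comparing `Map.of("serverCount", "12", "region", "us-east", "protocol", "HTTP/2")` against the same map with `serverCount` set to `"2"` returns the single entry `serverCount: production=12, test=2`. An empty list would mean the described settings match.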
Building a subpar test environment because of financial concerns can introduce more risk and lower quality when releasing to production. Although the production environment should be thoroughly tested as well, it is unlikely to support all test cases, because write operations from tests would alter production data and metrics. It therefore cannot be used for the full automated test passes that a test environment allows. To fully gauge the required quality and performance, the test environment should be nothing short of a replica of the production environment.
Once the test environment mirrors the production infrastructure, you can use timeout values in your tests accurately. For example, if you are using Selenium and waiting for a locator after an action, set the wait timeout to a value that corresponds to the service-level agreement (SLA) for load and render time, such as:
WebDriverWait wait = new WebDriverWait(driver, 3); // page should load and render within 3 seconds per requirements
Assert.fail("Order confirmation page did not meet load time SLA of 3 seconds");
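The two lines above imply a wait that is bounded by the SLA and a failure message that names the requirement. A minimal, library-free sketch of that shape, using only the JDK rather than Selenium, might look like the following. The class `SlaWait`, the method `awaitWithinSla`, and the 50 ms polling interval are assumptions made for illustration and are not part of Selenium.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls a condition until it holds or the SLA elapses.
final class SlaWait {
    private static final long POLL_INTERVAL_MILLIS = 50; // assumed polling interval

    /** Returns the elapsed milliseconds if the condition held within the SLA. */
    static long awaitWithinSla(BooleanSupplier condition, Duration sla, String what) {
        long start = System.nanoTime();
        while (true) {
            if (condition.getAsBoolean()) {
                return (System.nanoTime() - start) / 1_000_000;
            }
            if (System.nanoTime() - start >= sla.toNanos()) {
                // Name the requirement in the failure instead of leaving a bare timeout stack trace.
                throw new AssertionError(
                        what + " did not meet load time SLA of " + sla.toMillis() + " ms");
            }
            try {
                Thread.sleep(POLL_INTERVAL_MILLIS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError(what + " wait was interrupted", e);
            }
        }
    }
}
```

In a UI test, the condition would be whatever signals that the page has rendered, for example a check that the confirmation element is displayed. The returned elapsed time can be logged to show how much of the SLA budget each run consumed.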
A similar design can be used for back-end service testing, where the test will Assert.fail() if no response is received within a certain number of seconds. At this point, you cannot dismiss the failure as an unstable test, because the test infrastructure mimics production. A failure means that in production the page can also take more than three seconds to load and render, which violates the requirements. If this occurs, a performance bug should be filed. The bug indicates either that the production infrastructure should be upgraded (and with it the test environment), or that the application or service code should be optimized to respond or load faster.
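For the back-end case, a sketch along the following lines may help. It uses the JDK's `java.net.http.HttpClient` with a per-request timeout equal to the SLA and converts `HttpTimeoutException` into an assertion failure that names the SLA. The class `ServiceSlaCheck` and its methods are hypothetical, and `startStubService` is only a local stand-in for the service under test so that the example runs on its own.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

// Hypothetical helper for SLA-bound service checks.
final class ServiceSlaCheck {
    /** Sends a GET and returns the status code, failing if no response arrives within the SLA. */
    static int getWithinSla(URI uri, Duration sla) {
        HttpRequest request = HttpRequest.newBuilder(uri).timeout(sla).GET().build();
        try {
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.discarding())
                    .statusCode();
        } catch (HttpTimeoutException e) {
            // Report the violated requirement rather than a bare timeout.
            throw new AssertionError(
                    "Service at " + uri + " did not respond within SLA of " + sla.toMillis() + " ms", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for " + uri, e);
        }
    }

    /** Stand-in for the service under test: answers 200 after delayMillis on an ephemeral port. */
    static HttpServer startStubService(long delayMillis) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                try {
                    Thread.sleep(delayMillis); // simulated processing time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                exchange.sendResponseHeaders(200, -1); // -1 means no response body
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real test, the URI would point at the service in the production-like test environment, and the SLA value would come from the requirements rather than from what happens to make the test pass.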
The timeouts used in the test can now measure the performance of the application or service while testing its functionality. Also, the nonproduction application or service in the test environment doesn't need special configuration (such as longer response timeout values) to keep automated tests stable. Because no automated or manual retries are needed for timeout issues, test execution will be faster, and deployment to production can be faster as well.
Taking this one step further, you can simulate network traffic by customers in production by creating the same amount of load in the test environment while running tests. Still, the timeout values should not be changed. When under heavy load, such as during a sale season on a commerce site, if the page loads in ten seconds, the test will fail and provide an indication that the network infrastructure may need to scale up.
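The effect of load on an unchanged timeout can be illustrated with a small in-process simulation. This is a sketch under stated assumptions, not a real load tool: a fixed thread pool stands in for the service's capacity, each simulated request takes a fixed handling time, and a single probe request is timed against the same SLA with and without background load. The class `LoadedSlaCheck` and all parameter names are hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation: a fixed pool models service capacity under background load.
final class LoadedSlaCheck {
    /** Queues background requests, then reports whether one probe completes within the SLA. */
    static boolean probeMeetsSla(int capacity, int backgroundRequests,
                                 long handlingMillis, Duration sla) {
        ExecutorService service = Executors.newFixedThreadPool(capacity);
        Runnable handleRequest = () -> {
            try {
                Thread.sleep(handlingMillis); // simulated time to handle one request
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            for (int i = 0; i < backgroundRequests; i++) {
                service.submit(handleRequest); // simulated customer traffic
            }
            Future<?> probe = service.submit(handleRequest); // the request the test cares about
            try {
                probe.get(sla.toMillis(), TimeUnit.MILLISECONDS); // SLA is the same with or without load
                return true;
            } catch (TimeoutException e) {
                return false; // SLA missed under the current load
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            service.shutdownNow(); // discard any queued simulated work
        }
    }
}
```

With a capacity of 2 and a 50 ms handling time, the probe should finish well inside a 500 ms SLA when there is no background load. With 40 background requests queued ahead of it, the probe waits roughly a second and misses the same SLA, which is the signal that capacity would need to scale up.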
There are multiple benefits to this test design. First, it indicates whether the service or application infrastructure is scaled correctly. Second, it gives an indication of code complexity and of what can be optimized. Third, each test now serves as both a functional and a performance test. Finally, test execution time is reduced, because the tests no longer rely on retries after timeouts just to check functionality.