Chapter 4. Costs and Benefits of AI-Driven Testing

Throughout my career, I’ve worn quite a few different hats on AI-driven testing projects, so I’m going to share a bit about the costs associated with these types of projects. As a test architect, I worked on a team developing a homegrown, internal tool for AI-driven test automation. The project had a lofty goal: to reduce the manual effort associated with testing by 25%. Notable AI-driven features of the tool included automatic exploration and visual state modeling of the application under test; test generation using well-known techniques such as boundary testing and equivalence partitioning; goal-based testing and coverage analysis based on model-based testing criteria; and the extraction of test information from user stories. Later, as an engineering director, I was the key client stakeholder responsible for the testing teams that would be consuming this solution on a daily basis. Currently, in my role as a chief scientist, I lead research and development of a commercial AI-driven test automation platform.

In this chapter, you’ll learn some of the benefits of AI-driven test automation in practice. I’ll then give you a holistic view of what the cost-benefit analysis looks like from a business perspective. Although the information at this level does not provide you with specific numbers for investment costs and savings, it should give you a solid foundation for understanding the key factors associated with the return on investment (ROI) of AI-driven test automation.

Investment Costs

Based on my experiences in AI-driven testing projects, I’ll present a well-rounded view of AI-driven testing investment costs, including the up-front and recurring costs you can expect if developing and maintaining a homegrown solution and what’s involved in purchasing a commercial, vendor-based solution.

Homegrown Solutions

If you decide to build an AI-driven test automation platform or framework in-house, you’ll first need the talent to do so. Since labor tends to be the highest recurring cost in software development, you’ll want to pay close attention to these requirements before you go any further down this path. You’ll need a team of folks who have deep knowledge and experience in software testing, software engineering, machine learning, research, and IT operations. Ideally, you’ll want individuals who have a cross section of these skills. You’ll also need a strong engineering leader who can drive the vision, track the project, and work with key stakeholders in the organization throughout development, alpha and beta testing, and adoption.

As you can imagine, the talent required for this type of project is difficult to find and does not come cheap. When I was part of such a team, there were approximately 10 people, 3 of whom had doctoral degrees in computer science focused on software testing, while the rest had extensive experience in computer vision, natural language processing, and AI-related disciplines. Those who didn’t have knowledge or background in a given area generally spent a portion of their time every week doing self-paced learning or getting nanodegrees to bring themselves up to speed. Fortunately, some of the talent already existed in the organization, but it was also necessary to recruit for some of the expert positions. Labor cost for the relatively small team was over $2M per year, and it was two years before the first alpha was released. Much of the initial time was spent on gathering and curating training data, performing research tasks, and iterating between research and product development.

Up-front costs included capital expenses related to the project’s hardware and software requirements, the largest of which was the deep learning servers for training ML models locally. Alternatively, you could use on-demand CPU and GPU cloud compute services. However, with so much initial research and experimentation, investing in the servers up front will typically be cheaper in the long run.

After the initial rollout of your homegrown solution, you’ll still have to make significant investments in its development, evolution, and maintenance. As a result, you are likely to always have an engineering team supporting it, as well as a research team to investigate how the latest advances in AI/ML can help further the benefits of the framework or platform.

Vendor-Based Solutions

Over the past decade, there has been significant growth in the number of vendors offering AI-powered testing solutions. Figure 4-1 provides a snapshot of some of the vendors in this space. Which vendor or solution you go with really depends on your needs, level of investment, and expectations for the ROI. I always recommend starting your search with a set of criteria that describes the capabilities and partnership values that are important to you. Yes, “partnership.” In my experience, even if an available solution appears to fit your needs off the shelf, since many of these tools and frameworks are still in their infancy, you will want to ensure you’re choosing a good partner to guide your organization through integration and adoption.

Figure 4-1. Vendors that offer AI-driven automated testing solutions

If you do end up going with a vendor-based solution, the most obvious cost is a software license fee. Depending on the product, this may be a one-time cost, but it is more likely to take the form of a recurring software subscription. Furthermore, if you think that your organization has unique test automation challenges that AI can solve, set aside some of your budget for professional services and engage the vendor in developing custom modules for you. This is yet another reason why it is so important to select the right partner. Remember, what you’re investing in here is more than the solution; you’re also making an investment in the vendor’s ability to support, maintain, and evolve it. While this may be true for many of your vendor engagements, it is especially true in the rapidly changing world of AI-driven test automation.

ROI

Regardless of whether you build or buy, prior to investing in this technology you’ll want to understand its ROI. So what are some reasonable and realistic benefits from implementing AI-driven test automation? Let’s discuss some of the practical benefits you can expect to see and then take a look at the big picture of how the investment can positively impact your organization’s total cost of testing.

Practical Benefits

Automated testing with AI brings several practical benefits to development teams, including increased coverage, acceleration, reuse, scalability, robustness, and resiliency. These shouldn’t be completely new to you since I’ve already touched on some of these when describing AI for testing approaches in Chapters 2 and 3. However, here I’ll expand on each benefit, adding a bit more detail and context on the ROI that organizations typically realize when applying AI-driven test automation in practice.

Increased Coverage and Acceleration

Grand testing challenges such as input and state explosion are some of the key reasons why coverage is a central theme in software testing. When new features are added to an application, its complexity increases exponentially due to the interactions of these new features with existing components.1 However, traditional approaches to test automation involve adding handcrafted test cases one at a time. As shown in Figure 4-2, over time the test coverage required to validate the quality of your software product diverges from the engineering team’s ability to design and write test scripts for it.

Figure 4-2. Increasing test coverage with AI-driven test automation

Automated testing with AI can increase both the level of test coverage and the speed of testing. As shown in Figure 4-2, accelerated coverage essentially narrows the coverage gap between software complexity and test automation. Recall from “AI for UI Testing” that you can train bots to generate test inputs and expected outcomes automatically. Furthermore, in Chapter 3, you saw that test scripts for video games can be specified as high-level goals, which the bots then use to autonomously explore and search for bugs while trying to achieve those goals. AI therefore reduces the manual effort for creating tests, allowing you to reach higher levels of coverage. Combining AI-driven test generation with other benefits like reuse, scalability, robustness, and resiliency allows testing to keep pace with continuous delivery and the ever-growing demand for more features. Accelerated test coverage translates into a faster time to market and shorter cycles for reproducing and fixing bugs. This in turn can lead to increases in customer satisfaction, retention, net promoter scores, and sales.

Reuse and Scalability

AI-driven test generation systems work in a very general way.2 As a result, the same test generation approach can often be applied to multiple software applications or components. Furthermore, at the user interface level, leveraging computer vision and goal-based reinforcement learning allows for the same test to be run against multiple applications within a given domain. For example, the retail shopping apps from Amazon, Walmart, and Target can all be tested by a bot following this high-level test case flow: “login,” “add items to cart,” and “checkout.” Recall that these types of reusable, abstract test cases are the same ones that enabled the bots to gather insights during “Application Performance Benchmarking” and “Video Game–Testing Practices.”
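To illustrate the idea of one abstract test case running against several applications, here is a minimal Python sketch. Everything in it is hypothetical: the app names (shop_a, shop_b), the element labels, and the run_flow helper are made up for illustration, and a simple dictionary lookup stands in for the computer vision model that would recognize each element on a real screen.

```python
# The abstract, app-independent test case from the text.
ABSTRACT_FLOW = ["login", "add items to cart", "checkout"]

# Hypothetical stand-in for what a vision model would recognize in each app.
# Each app labels and styles its widgets differently.
APPS = {
    "shop_a": {
        "login": "Sign in button",
        "add items to cart": "Add to Cart button",
        "checkout": "Proceed to checkout button",
    },
    "shop_b": {
        "login": "Account icon",
        "add items to cart": "Add to basket link",
        "checkout": "Pay now button",
    },
}


def run_flow(flow, recognized_elements):
    """Walk the abstract steps, resolving each to an app-specific element.

    Returns (log, passed). The run fails at the first step for which no
    matching element is recognized.
    """
    log = []
    for step in flow:
        element = recognized_elements.get(step)
        if element is None:
            return log, False
        log.append((step, element))
    return log, True


# The same flow is reused, unchanged, for every app.
results = {name: run_flow(ABSTRACT_FLOW, elems) for name, elems in APPS.items()}

for name, (log, passed) in results.items():
    print(name, "PASS" if passed else "FAIL", log)
```

The point of the sketch is the separation of concerns: the test case lives at the level of user goals, and only the recognition layer knows anything about a particular app.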

With AI generating tests automatically, you just might need an army of bots to run all of your tests. Each of these bots can consume a significant amount of resources, especially if they require deep learning models. The good news is that cloud compute is readily available and accessible. Leveraging a distributed architecture can enable you to efficiently and reliably scale to support the needs of large enterprises.3 All of the major players like Google, Amazon, and Microsoft offer cloud services for training and even autotuning ML models. Outside of ML training, many cloud providers and open source container-orchestration systems have features such as autoscaling, where the infrastructure dynamically adds or removes nodes based on demand and/or utilization. Chances are that your organization is already plugged into one of these ecosystems, and if it’s not, it probably soon will be.

Robustness and Resiliency

AI-driven test automation tends to be more robust and resilient than traditional test automation. Recall from “Limitations of Traditional Approaches” that this is particularly true at the UI level, where conventional tools like Selenium and Appium locate screen elements using a DOM. Robustness in this context has to do with the ability of existing test scripts to withstand UI design changes. Consider an example where the frontend development team just performed a huge visual overhaul of your application. They’ve changed the look and feel of icons and graphics, moved around navigation bars and other widgets, and even introduced an entirely new UI framework. As such, the DOM of your application has drastically changed, and any test based on information from the previous DOM is now failing. I’ve lived this scenario many times and, in some cases, seen it delay or prevent organizations from providing a better user experience for their customers.

Now imagine that those same test scenarios are implemented using AI. First, they’re not tied to the DOM and so aren’t doomed to fail like the other tests. However, there were visual updates to the icons and graphics, and the AI-based tests are based on images. The good thing here is that even if you changed the shopping cart icon, you as a human can still recognize it. That’s because you’ve probably seen hundreds of shopping carts in your lifetime. But guess what? You can train the AI on thousands of shopping cart icons, and, in practice when this is done well, the bots will recognize the updated visual no matter where it is located on the screen. Even if the bots failed to recognize it, the resulting test failure could be an indication that a human user might also experience difficulty recognizing it. After all, it doesn’t look like any of the hundreds or thousands of examples from various applications, possibly including those of your competitors. Do you really want to be the only app design in the app store that users have a hard time understanding? In this regard, not only is the AI-driven automated test more robust, but it also provides more business value to the design and engineering teams because it mimics the way the end user interacts with your application.

If you’re anything like me, you’re probably already trying to find other ways the UI redesign may cause the AI to fail. Suppose the entire flow of the application changes? What if new screens are added or removed, or the icon is replaced by a link or some other type of widget? Surely, then, AI-based scripts will fail just like traditional automation, right? Well, not necessarily. This is where the AI bots’ resiliency kicks in. Although the terms robustness and resiliency are sometimes used interchangeably when it comes to automation, they are slightly different concepts. Whereas robustness has to do with the ability to withstand changes, resiliency is more about adapting to changes. In the case where the entire flow of the application changes, remember that the bots leverage goal-based reinforcement learning to explore and model the application. As a result, if screens or widgets are completely removed or replaced, the bots will still try to complete their goals by trial and error. In the worst case, the bots have to completely rebuild their UI model of the application. However, since this happens automatically, it is likely to be way cheaper than paying highly skilled engineers to fix scripts with broken element locators.
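The difference between a fixed script and a goal-seeking bot can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular tool works: the screen graphs, action names, and find_path_to_goal function are hypothetical, and plain breadth-first search is used as a stand-in for the goal-based reinforcement learning described above.

```python
from collections import deque


def find_path_to_goal(app, start, goal):
    """Explore the app's screens breadth-first until the goal is reached.

    `app` maps each screen name to a dict of {action: next_screen}.
    Returns the list of actions that reaches `goal`, or None if unreachable.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        screen, actions = queue.popleft()
        if screen == goal:
            return actions
        for action, next_screen in app.get(screen, {}).items():
            if next_screen not in visited:
                visited.add(next_screen)
                queue.append((next_screen, actions + [action]))
    return None


# Hypothetical application flow before the redesign.
OLD_APP = {
    "home": {"tap_cart_icon": "cart"},
    "cart": {"tap_checkout": "checkout"},
}

# Hypothetical flow after the redesign: the cart icon became a menu link,
# and a new review screen was added before checkout.
NEW_APP = {
    "home": {"open_menu": "menu"},
    "menu": {"tap_basket_link": "cart"},
    "cart": {"tap_review": "review"},
    "review": {"tap_pay": "checkout"},
}

old_path = find_path_to_goal(OLD_APP, "home", "checkout")
new_path = find_path_to_goal(NEW_APP, "home", "checkout")
print(old_path)
print(new_path)
```

A script hardcoded to the old action sequence would break on the redesigned flow, while the goal ("reach checkout") stays the same and the search simply rebuilds its model of how to get there.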

Impact on Total Costs

Now that you have an understanding of the ROI from a practitioner’s standpoint, let’s take a look at what this all means for your organization’s bottom line. More specifically, I’ll answer the following question: how does the integration of AI-driven test automation impact total costs? A quick warning that I will get into a little bit of math here, but I promise to keep it relatively high level and not lead you down a rabbit hole of complex ROI formulas, estimations, or spreadsheets.

Let’s begin with an organization’s investment in software testing. By this I mean that, even without introducing AI-driven automation into the equation, every year an organization allocates a portion of its budget to software testing. This generally consists of both labor and assets. Software-testing labor includes any testers, developers, or business analysts who are designing, executing, or debugging test cases, filing and triaging bugs, and so on. Test assets range from hardware equipment and infrastructure to software licenses for testing tools and frameworks. For illustrative purposes, let T be an abstraction of the testing effort. You can then define the cost of testing as a linear function f(T) = cT, represented graphically in Figure 4-3. The slope of f(T) is the constant c, which reflects the investment costs in testing labor and assets. Simply put, the larger your testing investment, the steeper the slope of the cost function and, conversely, the smaller the investment, the gentler its slope.

Figure 4-3. The cost of a testing investment T is a linear function f(T) = cT.

Unfortunately, when it comes to software testing, investment costs are just one side of the story. You see, even if an organization doesn’t make an up-front investment in testing, it usually ends up “paying” for it in other ways. Do you know what I am referring to? It’s the cost associated with not testing or not testing enough. When customers encounter high-severity defects in your product, or several low- to medium-severity defects for that matter, they may perceive the product as being low quality. Poor quality can lead to significant financial loss or even irreparable damage to the organization’s brand and reputation. One of the value propositions of testing and test automation is that identifying those issues early, prior to release, allows them to be fixed before they reach customers.

Figure 4-4 visualizes some of the various categories of defects based on whether they were found in development prior to release or postrelease in production, along with any associated costs or risks. These are the categories, starting from left to right in Figure 4-4:

Known issues, won’t fix

Any reported issue found pre- or postrelease that the business has decided to not fix. These issues represent accepted risk.

Internally found and fixed in development

Defects discovered by the engineering team prior to release and that the business has decided to fix.

Internally found in production and fixed

Defects discovered by the engineering team after the release and that the business has taken action to fix.

Customer found and fixed

Defects reported by the client or end user after the release and that the business has taken action to fix.

Undiscovered defects

Issues that have been reported by neither the engineers nor customers, pre- or postrelease. These issues represent unknown risk.

The later the stage of the development cycle in which a defect is found, the more expensive it is to fix. As a result, any defects that escape to production pose a risk to the project. Those that are found in production, either by engineers or customers, may be viewed as cost inefficiencies. In other words, investing more up front in testing to find these defects earlier would potentially save the organization money. Furthermore, accelerating coverage through automation can result in fewer defects escaping to production, which also reduces risk.

Figure 4-4. Various categories of defects and their associated risks and costs

So does this mean that you should pour all of your budget into testing and new, shiny test automation tools? Of course not. You see, when I was a young, inexperienced, and overzealous testing professional, I thought that testing and quality were nonnegotiable. Now that I’m older and a bit wiser, I’ve learned that quality, too, is negotiable if you want to make smart business decisions. However, to be able to do this in this context, you have to consider both the cost of testing and the cost of not testing at the same time.

In Figure 4-5, the function D(T) represents the cost of not testing, which is essentially the cost of escaped defects. This curve exhibits exponential decay, the opposite of exponential growth, and I’ve placed it on the same graph as f(T) so you can visualize them together. Looking at the graphs, you can see that when f(T) is extremely low, D(T) is high. In other words, when there is little or no investment in the testing effort, the organization incurs significant cost and risks due to escaped defects. However, as T increases, the costs and risks associated with escaped defects drop drastically. This is because when you first start testing a new product or feature, finding the first set of defects happens relatively quickly. However, as more and more testing is done, it can become harder and harder to find that next defect. Eventually, beyond where the two curves intersect, T can get so large that the law of diminishing returns sets in. Now you’re essentially overtesting the product because although you’re investing more in testing, you’re not getting a good return on that investment.

Figure 4-5. The cost of not testing is the cost of escaped defects D(T)

When considering how much to invest in testing and test automation, you need to consider the total cost of testing. Figure 4-6 introduces a new function C(T), the total cost of testing.

Figure 4-6. The total cost of testing C(T) is the investment cost plus the cost of escaped defects

As its name suggests, the total cost of testing C(T) is the sum of the testing investment f(T) and the costs of escaped defects D(T). The result is the blue, roughly U-shaped curve. The reason I mention its shape is that ideally you want to find the level of testing investment that minimizes the total cost of testing. In Figure 4-6, that’s the lowest point on the curve. Too little investment prior to this point means the organization is likely experiencing cost inefficiencies, while too much investment beyond it results in a negative ROI.
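To make the minimum of C(T) concrete, here is a short worked derivation. It rests on an assumption that goes beyond the text: the exponential decay of escaped-defect costs is given the specific form D(T) = D_0 e^{-kT}, where D_0 (the cost of escaped defects with no testing at all) and k (how quickly testing drives that cost down) are hypothetical parameters introduced only for illustration.

```latex
\begin{align*}
C(T) &= f(T) + D(T) = cT + D_0 e^{-kT} \\
C'(T) &= c - k D_0 e^{-kT} = 0
  \quad\Longrightarrow\quad
  T^{*} = \frac{1}{k}\ln\!\left(\frac{k D_0}{c}\right) \\
C(T^{*}) &= \frac{c}{k}\left[1 + \ln\!\left(\frac{k D_0}{c}\right)\right]
\end{align*}
```

Because C''(T) = k^2 D_0 e^{-kT} is always positive, T* is a true minimum. Under this assumed form, it falls at a positive level of testing effort only when k D_0 > c; otherwise, each unit of testing costs more than the defect cost it removes.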

Now that you have a framework for understanding the key factors that impact your total testing costs, I can use it to finally answer the question set forward at the beginning of this section: how does AI-driven automation specifically impact total costs? Throughout this report, you have seen a recurring theme: AI-driven test automation is bridging the gaps between human-present and machine-driven testing capabilities. The end result is a significant reduction not only in labor costs but also in total costs.

Recall from Figure 4-3 that labor and asset costs determine the value of the constant c, which sets the steepness of f(T). Since labor costs are almost always your biggest line item, investing in AI-driven test automation, or any process or tool that can act as a force multiplier for that matter, will lower the value of c. Figure 4-7 shows the ripple effect that this will have on the total cost of testing. As expected, reducing the cost of labor and assets produces a gentler slope for the investment cost function f(T). This results in the right side of the U-shaped total cost function C(T) becoming flatter and more open. The minimization point for C(T) moves down and to the right. In other words, when compared with the previous model, any given level of investment prior to reaching that minimal point now allows for a greater testing effort or a lower overall total cost. Furthermore, even if the team were to go beyond that minimal point into overtesting the product, it would still be cheaper to do so with machines rather than people, as indicated by the shaded area in Figure 4-7.

Figure 4-7. Reducing labor via AI-driven automation significantly reduces total testing costs
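As a rough numerical sketch of this effect, the Python below compares two values of c. It is illustrative only: it assumes the cost of escaped defects follows D(T) = D0 * exp(-k * T), and the parameter values (D0, K, C_MANUAL, C_AI) are invented for the example rather than taken from any real project.

```python
import math


def total_cost(T, c, d0, k):
    """C(T) = f(T) + D(T), with f(T) = c*T and the assumed D(T) = d0*exp(-k*T)."""
    return c * T + d0 * math.exp(-k * T)


def optimal_effort(c, d0, k):
    """Testing effort T that minimizes C(T) under the assumed decay model.

    Setting dC/dT = c - k*d0*exp(-k*T) to zero gives T = ln(k*d0/c) / k.
    If k*d0 <= c, even the first unit of testing costs more than it saves,
    so the minimum is at T = 0.
    """
    if k * d0 <= c:
        return 0.0
    return math.log(k * d0 / c) / k


# Hypothetical parameters, chosen only to make the shapes visible.
D0 = 1000.0       # cost of escaped defects with no testing
K = 0.05          # how quickly testing reduces that cost
C_MANUAL = 10.0   # cost per unit of testing effort, mostly manual labor
C_AI = 5.0        # lower cost per unit with AI-driven automation

t_manual = optimal_effort(C_MANUAL, D0, K)
t_ai = optimal_effort(C_AI, D0, K)

print(f"manual: T* = {t_manual:.1f}, C(T*) = {total_cost(t_manual, C_MANUAL, D0, K):.1f}")
print(f"AI:     T* = {t_ai:.1f}, C(T*) = {total_cost(t_ai, C_AI, D0, K):.1f}")
```

With the lower value of c, the optimal testing effort is larger and the total cost at that optimum is smaller, which is the "down and to the right" movement of the minimum described above.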

Conclusion

Determining how much to invest in AI-driven testing automation, or any advanced testing technology, should be based on a careful analysis of its ROI. My hope is that, through this chapter, I’ve provided you with enough insight into the available options and their respective costs and benefits so that you can make a more informed decision for your business. Whether you are an individual contributor or a leader in your organization, being able to formalize and articulate the cost and value of testing can be the difference between thriving and surviving.

1 Jason Arbon, “AI and Machine Learning for Testers Jason Arbon, Appdiff,” 35th Annual Pacific NW Software Quality Conference, October 18, 2017, YouTube video, 44:16.

2 Dionny Santiago, “A Model-Based AI-Driven Test Generation System.”

3 Patrick Alt, “Deploying a Large Scale Army of AI-Driven Testing Bots” (paper presented at STAREAST 2021, virtual, April 2021).
