You have now defined good performance. You can’t just sit back and hope that the development you have done is good enough to hit those targets. A good performance warrior will now start testing to determine that they are being met ahead of pushing to production.
This opens up a whole new set of challenges that could fill a book of its own (that book is The Art of Application Performance Testing by Ian Molyneaux (O’Reilly), an essential read for any performance warrior), so I’ll just present a quick overview of some of the issues to be addressed.
Performance testing tools range from open source to very, very expensive, each of them having their pros and cons. Some rely heavily on scripts and are aimed more at competent developers, whereas others are more drag-and-drop, aimed at less technical people. Some target particular technologies, such as Citrix-based systems.
For general web-based systems, JMeter is a good starting point. It is open source, has a reasonable learning curve, and has built up a good community to go to for support.
The environment that you test on can make a big difference. Obviously, if your live system runs on 10 quad-core servers, each with 64 GB of RAM, testing on a single dual-core server with 32 GB of RAM (or just on your laptop) will not get the same results. It doesn’t invalidate the testing, but you need to scale down your expectations.
Other aspects of your environment beyond hardware also have to be considered. Think also about infrastructure (are you going through the same load balancer, firewalls, switches, bandwidth, etc.?), data quantities (are you testing with 10 products when there are 100,000 in production?), contention (are there other systems that share elements of the system under test?), and so on.
Often, creating a reasonable performance-testing environment is difficult for logistical or economic reasons, so you may need to think out of the box. Cloud environments are a good option to quickly spin up and down large platforms. Disaster recovery (DR) environments are another option if they can be temporarily used for performance testing.
Some companies actually use their own production environments (after all, what could be more production-like?) during periods of low or no usage. Many subtleties and risks have to be considered before doing this, particularly how to minimize the impact on real users during that period and how to isolate any data created by performance testing from production data.
Performance testing is usually based around replicating user journeys through your system to simulate multiple actions by multiple users simultaneously. It takes effort to simulate these actions in a manner that reflects what will happen on the production system.
Getting these as representative as possible is a complex task, especially for a greenfield project where much is based on conjecture and guesswork. For existing systems, server logs and analytics (e.g., Google Analytics) provide an excellent starting point.
It is important that a wide enough range of representative user journeys is created with sufficient randomization of data to ensure that the system under test is being effectively exercised while not invalidating the test by making it not repeatable.
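One way to balance randomization against repeatability, sketched below under assumed data, is to seed the random generator: the test data varies across users within a run, but the same seed reproduces the same picks on every run. The product IDs here are hypothetical.

```python
# Sketch: randomized but repeatable test data via a seeded random generator.
# The SKU format and catalogue size are illustrative assumptions.
import random

def pick_journey_data(seed, products, n):
    rng = random.Random(seed)  # fixed seed => identical "random" picks each run
    return [rng.choice(products) for _ in range(n)]

products = [f"SKU-{i:05d}" for i in range(100_000)]
run_a = pick_journey_data(seed=42, products=products, n=5)
run_b = pick_journey_data(seed=42, products=products, n=5)
assert run_a == run_b  # same seed => same data => the test run is repeatable
```

Varying the seed per journey still exercises a wide spread of data while keeping any given run reproducible for comparison against earlier baselines.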
One of the most complex elements of performance testing is generating the pattern and levels of users to execute the user journeys. This is the load model.
For existing systems, server logs and analytics packages are again a good starting point.
It is essential that the load model is realistic: a valid number of users, acting in a realistic manner, with realistic think times between each step of their journey. If your performance targets are focused around hitting a target level of transactions per second, then it is possible to reverse engineer the load model to determine how many users will be required to hit that level.
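That reverse-engineering step can be sketched with Little’s law (concurrent users = throughput × time each user spends per transaction). The target and timing figures below are illustrative assumptions, not figures from the text.

```python
# Sketch: reverse-engineering the number of virtual users needed to hit a
# transactions-per-second target, using Little's law (L = X * W).

def users_for_tps(target_tps, response_time_s, think_time_s):
    """Concurrent users = throughput x time spent per transaction."""
    time_per_transaction = response_time_s + think_time_s
    return target_tps * time_per_transaction

# Hypothetical target: 50 TPS, 2 s average response, 8 s average think time.
users = users_for_tps(50, 2.0, 8.0)
print(users)  # 500.0 concurrent users
```

Note how think time dominates the result: halving it would halve the number of virtual users needed, which is why unrealistic think times distort a load model so badly.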
A reasonable test execution plan will include multiple load models to mimic different patterns of usage such as normal load and peak load.
There are different types of performance test, and it is essential when executing a test that you are aware of the type of test that you are running and have defined what you hope to determine from that test.
Some examples of the types of performance test that can be executed are:
All these elements create hurdles that must be overcome, but they are all surmountable. The important message here is that some testing is better than none. Once you are doing some testing, work on a program of continuous improvement and refinement until you are seeing more and more value from the testing you are doing.
During the early stages, you will likely see false positives (i.e., tests that indicate potential performance problems that don’t materialize on production). You also may see performance issues on production that were not caught in testing. This does not invalidate the testing process; it just reveals that you need to understand why the test failures occurred and evolve your testing process to mitigate that in future.
At each step of improving your testing process, the important questions to ask yourself are:
The traditional role of performance in the development cycle was to get a release signed off through functional testing and complete a performance testing process prior to going live. This approach leads to some fundamental problems:
The project is generally regarded as finished by this point. Testing at the end of the project inevitably gets squeezed as previous steps overrun their deadlines. This results in:
Performance issues often are caused by underlying architectural issues and are therefore much harder to fix than functional issues.
These two factors combine to create the perfect storm. You are finding bugs that are the hardest to fix, at the time when they are hardest to fix, at a point when people are not inclined to want to spend additional time doing major changes, all to be completed in a period of time that is constantly squeezed by the earlier phases.
Having said all that, why do many companies still insist on testing only at the end of projects? Arguments are usually based around one or more of the following reasons:
On the surface, all of these arguments have validity, but the same claims could be made for any kind of testing. Nevertheless, the agile movement has consistently shown that earlier functional testing results in faster, more reliable development.
The traditional approach of “finish development, go through a load testing process, and approve/reject for go-live” really doesn’t work in a modern development environment. The feedback loop is just too slow.
As performance warriors, you need to be looking at methods to execute performance testing earlier. However, the arguments against executing complete performance tests earlier in the process have a degree of validity, so it is worth considering other methods of validating performance at that stage.
Many problems occur only under heavy loads, and these kinds of testing won’t encounter such conditions, but they may provide some early indications of such problems.
These low-level methods should be augmented with performance testing with higher volumes of traffic as early in the process as possible.
As well as testing earlier, it is also important that the performance engineering team has a lot of integration with the development team, for both practical and political reasons. As discussed in “Assign Someone with Responsibility for Performance Within the Project”, there are several ways of integrating a performance engineer into the team.
When issues are identified, performance engineers and developers must cooperate and share their knowledge to resolve the problem. Performance engineers should not just identify problems; they must be part of the solution.
One of the downsides of pushing performance testing into the early stages is that it often results in additional testing without appropriate space for analysis. Analysis is what extracts the insight from performance testing; performance results are rarely black and white.
An oft-cited but still forgotten principle is that data is not information. Human intelligence is required to convert data into information. An important role of the performance engineer in improving any process is to ensure that the extra step of creating information is taken. Data is too often the focus of attention because it can be provided more regularly. As a performance warrior, you must ensure sufficient quality, not just quantity, of performance testing during the development process.
Consider running two levels of analysis on performance test results:
It is an often-heard motto within the agile and continuous delivery world: if something is hard to do, do it early and do it often!
The same is true of performance testing. The more often you can test and the less of a special event a performance test becomes, the more likely you are to uncover performance issues in a timely manner.
Cloud and other virtualized environments, as well as automation tools for creating environments (e.g., Chef, Puppet, and CloudFormation), have been game changers to allow earlier and more regular performance testing. Environments can be reliably created on demand. To make testing happen earlier, we must take advantage of these technologies. Obviously you must consider the cost and licensing implications of using on-demand environments.
We can also now automate environment setup, test execution, and the capture of metrics during the test to speed up the analysis process. APM tooling helps in this respect, giving easy access to data about a test run. It also allows the creation of alerts based on target KPIs during a test run.
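A minimal sketch of such KPI-based alerting is below: each metrics sample captured during a run is checked against target thresholds, and breaches are flagged for investigation. The KPI names and limits are illustrative assumptions; real APM tooling provides equivalents out of the box.

```python
# Sketch: flagging KPI breaches in metrics captured during a test run.
# The KPI names and thresholds are hypothetical examples.

KPIS = {"p95_response_ms": 2000, "error_rate_pct": 1.0}

def check_kpis(sample):
    """Return the names of any KPIs this metrics sample breaches."""
    return [name for name, limit in KPIS.items() if sample.get(name, 0) > limit]

sample = {"p95_response_ms": 2450, "error_rate_pct": 0.4}
breaches = check_kpis(sample)
print(breaches)  # ['p95_response_ms']
```

Running a check like this continuously during the test, rather than only afterwards, means a badly failing run can be stopped early, saving both environment cost and analysis time.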
Once performance testing gets added to the tasks of the test group, the obvious next step for anyone running a CI process is to integrate performance testing into it. Then, with every check-in, you will get a degree of assurance that the performance of your system has not been compromised.
However, there are a number of challenges around this:
Full-scale performance tests need a platform large enough to execute performance tests from and a platform with a realistic scale to execute tests against. This causes issues when running multiple tests simultaneously, as may be the case when running a CI process across multiple projects.
Automation can solve these problems by creating and destroying environments on demand. The repeatability of datasets also needs to be considered as part of the task. Again, automation can be used to get around this problem.
Spinning up environments on demand and destroying them may incur additional costs. Automating this on every check-in can lead to levels of cost that are very difficult to estimate.
Many performance test tools have quite limited licensing terms, based on the number of test controllers allowed, so spinning up multiple controllers on demand will require the purchase of additional licenses. The development team needs to consider these costs, along with the most cost-effective way to execute performance tests in CI.
One solution to this is to use open source tools for your CI performance tests and paid tools for your regular performance tests. The downside is that this requires the maintenance of multiple test script packs, but it does enable you to create simplified, focused testing that is CI specific.
CI is all about getting a very short feedback loop back to the developer. Ideally this is so short that the developer does not feel it is necessary to start working on anything else while waiting for feedback. However, performance tests usually take longer than functional tests (30 minutes is a typical time span); this is increased if they first involve spinning up environments.
A solution for this is to run simplified performance tests with every check-in. Examples include unit test timings, micro-benchmarks of small elements of functionality, and WebPagetest integrations to validate key page metrics. You can then run the full performance test as part of your nightly build, allowing the results to be analyzed in more detail by performance engineers in the morning.
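A micro-benchmark of this kind can be as simple as the sketch below: time a small piece of functionality on every check-in and fail if it exceeds a budget. The function under test and the 50 ms budget are hypothetical examples.

```python
# Sketch: a micro-benchmark cheap enough to run on every check-in.
# The rendering function and its time budget are illustrative assumptions.
import time

def render_price_list(items):
    return "\n".join(f"{name}: {price:.2f}" for name, price in items)

def benchmark(fn, *args, repeats=100):
    """Average wall-clock time per call, in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats

items = [(f"item-{i}", i * 1.5) for i in range(1000)]
avg_s = benchmark(render_price_list, items)
assert avg_s < 0.05, f"render_price_list regressed: {avg_s * 1000:.1f} ms average"
```

This gives the developer feedback in seconds rather than the 30-plus minutes of a full test, at the cost of only catching regressions in the specific code paths benchmarked.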
CI typically relies on a very black-and-white view of the world, which is fine for build and functional errors. Either something builds or it doesn’t; either it is functionally correct or it isn’t.
Performance testing is a much more gray area. Good versus bad performance is often a matter of interpretation. For CI, performance testing needs to produce more of a RAG (red/amber/green) result, with a more common conclusion being that the matter is worthy of some human investigation, rather than an actual failure.
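A RAG result can be produced by mapping the measurement onto three bands rather than a single pass/fail cutoff, as in this sketch. The thresholds are hypothetical.

```python
# Sketch: classifying a measured page load time as red/amber/green (RAG)
# instead of binary pass/fail. The 5 s and 10 s thresholds are assumptions.

def rag(load_time_s, green_below=5.0, red_at_or_above=10.0):
    if load_time_s < green_below:
        return "GREEN"   # clearly within target
    if load_time_s < red_at_or_above:
        return "AMBER"   # worthy of human investigation, not a build failure
    return "RED"         # treat as an actual failure

print(rag(4.2), rag(7.5), rag(10.1))  # GREEN AMBER RED
```

In a CI pipeline, only RED would break the build; AMBER would raise a notification for a performance engineer to review the trend.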
Adopting a spectrum of failure over a pass/fail solution requires people to investigate data trends over time.
The performance engineer needs access to historical data, ideally graphical, to determine the impact of previous check-ins and get to the root cause of the degradation.
Functional testing rarely needs a look back at history. If a functional test previously passed and now fails, the last change can be reasonably blamed. The person who broke the build gets alerted and is required to work on the code until the build is fixed.
A performance issue is not that black and white. If you consider a page load time of 10 seconds or more to be a failure, and the test fails, the previous check-in may merely have taken page load time from 9.9 to 10.1 seconds. Even though this check-in triggered the failure, a look back at previous check-ins may turn up a change that took the page load time from 4.0 to 9.9 seconds. Clearly, this is the change that needs scrutiny. Another alternative is to look at percentage increments rather than hard values, but this has its own set of problems: a system could continuously degrade in performance by a level just below the percentage threshold with every check-in and never fail the CI tests.
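The percentage-threshold loophole described above is easy to demonstrate numerically. In this sketch, a page degrades by just under a 5% threshold on every check-in, so CI never fails, yet load time more than doubles over 20 check-ins. All figures are illustrative.

```python
# Sketch of the percentage-threshold loophole: per-check-in degradation just
# below the threshold compounds into a large regression with zero CI failures.
# The 5% threshold, 4 s starting point, and 20 check-ins are assumptions.

threshold = 0.05   # fail CI if a check-in adds more than 5% to page load time
load_time = 4.0    # starting page load time in seconds

for checkin in range(20):
    load_time *= 1 + (threshold - 0.001)  # 4.9% worse each time: never fails
print(round(load_time, 2))  # ~10.4 s after 20 check-ins
```

This is why trend analysis over history matters: only a graph of load time per check-in makes the slow drift visible.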
So performance testing departs from the simple “You broke the build, you fix it” model driving many CI processes.
If you’re not currently doing any performance testing, the first step is to start. Choose a free toolset or a toolset that your organization already has access to and start executing some tests. In the short term, this process will probably ask more questions than it answers, but it will all be steps in the right direction.
Next, evolve your performance-testing trials into a standard approach that can be used on most of your projects. This standard should include a definition of the types of tests you will run and when, a standard toolset, a policy about which environments to use for which types of tests and how they are created, and finally, an understanding of how results will be analyzed and presented to developers and managers. If your development is usually an evolution of a base product, look at defining a standard set of user journeys and load models that you will use for testing.
This standard will not be set in stone and should constantly change based on specific project needs. But it should be a good starting point for performance testing on all projects.
In addition to defining the performance acceptance criteria described in “Performance Acceptance Criteria”, the project’s specification stage must also consider how and when to do performance testing. This will enable you to drive testing as early as possible within the development process. At all points, ask the following questions while thinking of ways you can do elements of the testing earlier without investing more time and effort than would be gained by the early detection of performance issues:
Look at the performance acceptance criteria and performance targets and determine how you will be able to test them. What levels of usage will you be testing, and what user journeys will you need to execute to validate performance? How soon can scripting those user journeys start? What data will you need to get back to determine success? Will your standard toolset be sufficient for this project?
If you are running a CI process, you should try to integrate an element of performance testing within it. As described earlier, there are a lot of issues involved in doing this, and it takes some thought and effort to get working effectively.
Start with small steps and build on the process. Do not fail builds until there is a degree of trust that the output from the tests is accurate and reliable. Always remember that the human element will be needed to assess results in the gray area between pass and fail.