Beautiful Testing by Adam Goucher, Tim Riley

100%?!? Fail

I’d just been informed that I was to start working on a new project to build a computer-based learning delivery and student progress tracking system (I’ll call it eVersity) for a Fortune 50 company on the following Monday. The project was officially entering the development phase, which meant that the client had accepted our proof of concept and it was time to bring the rest of the team onto the project. I was at my desk finishing some documentation for my previous project when Harold, the test manager for the new project, walked up and, without preamble, handed me a single sheet of paper while asking, “Can you test this?”

Though I found the question insulting, I looked at the paper. I got as far as:

“System Performance Requirements:

  • 100% of the web pages shall display in 5 seconds or less 100% of the time.

  • The application shall…”

before writing “FAIL” on a sticky note, slapping the note on the paper, handing it back to Harold over my shoulder, and going back to work. Harold, making no attempt to conceal his anger at my note, asked, “What’s that supposed to mean?” Spinning my chair around to face him, I replied, “I can test it if you want, but c’mon, it’s the Internet! You never get 100% of anything!” Harold walked off in a huff.

Early the next week, Harold returned with another sheet of paper. Handing it to me, he simply asked “Better?” This time I managed to read all of the bullets.

“System Performance Requirements:

  • 95% of the web pages shall display in 5 seconds or less 95% of the time.

  • The application shall support 1,000 concurrent users.

  • Courses shall download completely and correctly on the first try 98% of the time.

  • Courses shall download in 60 seconds or less 95% of the time.”

“Better? Yes. But not particularly useful, and entirely untestable. What is this for, anyway?” I responded. Clearly frustrated, but calm, Harold told me that he’d been asked to establish the performance requirements that were going to appear in our contract to the client. Now understanding the intent, I suggested that Harold schedule a conference room for a few hours for us to discuss his task further. He agreed.

As it turned out, it took more than one meeting for Harold to explain to me the client’s expectations, the story behind his task, and for me to explain to Harold why we didn’t want to be contractually obligated to performance metrics that were inherently ambiguous, what those ambiguities were, and what we could realistically measure that would be valuable. Finally, Harold and I took what were now several sheets of paper with the following bullets to Sandra, our project manager, to review:

“System Performance Testing Requirements:

  • Performance testing will be conducted under a variety of loads and usage models, to be determined when system features and workflows are established.

  • For internal builds, all performance measurements greater than the following will be reported to the lead developer:

    • Web pages that load in over 5 seconds, at any user volume, more than 5% of the time.

    • Web pages that load in over 8 seconds, at any user volume, more than 1% of the time.

    • Courses that do not download completely or correctly more than 2% of the time.

    • Courses that take over 60 seconds to download, at any user volume, more than 5% of the time.

    • The current maximum load the system can maintain for 1 hr with 95% of all web pages loading in 5 seconds or less and 95% of all the courses downloading completely and correctly in 60 seconds or less.

  • External builds will be accompanied by a performance testing report including:

    • Web pages that load in over 5 seconds, at any user volume, more than 5% of the time.

    • Web pages that load in over 8 seconds, at any user volume, more than 1% of the time.

    • Courses that do not download completely or correctly more than 2% of the time.

    • Courses that take over 60 seconds to download, at any user volume, more than 5% of the time.

    • The current maximum load the system can maintain for 1 hr with 95% of all web pages loading in 5 seconds or less and 95% of the courses downloading completely and correctly in 60 seconds or less.

  • At the discretion of the project manager, other performance tests will be conducted that are deemed valuable to the project based on requests or recommendations by [client name deleted], the development team, or the performance test lead.”
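
The reporting rules in these bullets come down to one calculation: the fraction of measurements that exceed a threshold, compared against a tolerated limit. The following is a minimal sketch of that check in Python, not the tooling used on the project. It assumes load times are collected as a list of seconds per page; the function names, page names, and sample data are all hypothetical.

```python
def fraction_over(samples, threshold_s):
    """Return the fraction of samples strictly greater than threshold_s."""
    if not samples:
        raise ValueError("no samples collected")
    return sum(1 for s in samples if s > threshold_s) / len(samples)


# (threshold in seconds, maximum tolerated fraction of slow loads),
# taken from the web page bullets: over 5 s more than 5% of the time,
# over 8 s more than 1% of the time.
PAGE_RULES = [(5.0, 0.05), (8.0, 0.01)]


def pages_to_report(load_times_by_page, rules=PAGE_RULES):
    """Return {page: [(threshold, observed_fraction), ...]} for pages that break a rule."""
    report = {}
    for page, samples in load_times_by_page.items():
        broken = []
        for threshold, limit in rules:
            observed = fraction_over(samples, threshold)
            if observed > limit:
                broken.append((threshold, observed))
        if broken:
            report[page] = broken
    return report


# Hypothetical measurements, for illustration only.
measurements = {
    "login": [1.2] * 98 + [6.0, 9.5],    # 2% over 5 s, 1% over 8 s: within limits
    "catalog": [2.0] * 90 + [5.5] * 10,  # 10% over 5 s: gets reported
}
report = pages_to_report(measurements)
print(report)
```

Note that the check only produces a list of items to report. It does not pass or fail the build, which mirrors the way the requirements commit to reporting rather than to achieving a level of performance.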

Much to our chagrin, Sandra replied that Harold and I should work together more often, and added our bullets verbatim into the client contract.

I fully admit that there was nothing beautiful about the process that led to Harold and me collaborating to turn the original System Performance Requirements into the ultimate System Performance Testing Requirements, but the result was. To be honest, when I found out that Harold had written the original requirements doc that I had “failed” in dramatic fashion, I fully expected to be removed from the project. But regardless of whether Harold tried to have me removed from the project, even he would have acknowledged that there was a certain beauty in the outcome that neither of us would have come up with on our own. Specifically:

  • The shift from committing to achieving certain levels of performance to committing to report under what conditions the performance goals were not being achieved

  • Calling out that it may be some time before enough information would be available to fully define the details of individual performance tests

  • Leaving the door open for performance testing that supported the development process, but that didn’t directly assess compliance with performance goals

Unfortunately, this was not to be the last un-beautiful interaction between Harold and me.

OK, but What’s a Performance Test Case?

A few weeks later, Harold called to tell me he needed me to get all of the “performance test cases” into the eVersity test management system by the end of the following week. I said, “OK, but what’s a performance test case?” As you might imagine, that wasn’t the response he was expecting. The rest of the conversation was short but heated, and concluded with me agreeing to “do the best I could” by the end of that week so that he would have time to review my work.

As soon as I hung up the phone, I fired up the test management system to see if there were any other test cases for what we called nonfunctional requirements (aka quality factors, or parafunctional requirements), such as security or usability. Finding none, I started looking to the functional test cases for inspiration. What I found was exactly what I had feared: a one-to-one mapping between requirements and test cases, and almost all of the requirements were of the form “The system shall X,” and almost all of the test cases were of the form “Verify that the system [does] X.”

I stared at the screen long enough for my session to time out twice, trying to decide whether to call Harold back in protest or try to shoehorn something into that ridiculous model (for the record, I find that model every bit as ridiculous today as I did then). Ultimately, I decided to do what I was asked, for the simple reason that I didn’t think I’d win the protest. The client had mandated this test management system, had paid a lot of money for the licenses, and had sent their staff to training on the system so they could oversee the project remotely. I simply couldn’t imagine getting approval to move performance test tracking outside the system, so I created a new requirement type called “Performance” and entered the following items:

  • Each web page shall load in 5 seconds or less, at least 95% of the time.

  • Each course shall download correctly and completely in 60 seconds or less, at least 98% of the time.

  • The system shall support 1,000 hourly users according to a usage model TBD while achieving speed requirements.

I then created the three parallel test cases for those items and crossed my fingers.

To say that Harold was not impressed when he reviewed my work at the end of the week would be a gross understatement. He must have come straight downstairs to the performance and security test lab where I spent most of my time the instant he saw my entry. As he stormed through the door, he demanded, “How can I justify billing four months of your time for three tests?”

Although I had been expecting him to protest, that was not the protest I’d anticipated. Looking at him quizzically, I responded by saying, “You can’t. Where did you get the idea that I’d only be conducting three tests?” I’ll let you imagine the yelling that went on for the next 15 minutes until I gave up protesting the inadequacy of the test management system, especially for performance testing, and asked Harold what it was that he had in mind. He answered that he wanted to see all of the tests I was going to conduct entered into the system.

I was literally laughing out loud as I opened the project repository from my previous, much smaller, project and invited him to come over and help me add up how many performance tests I’d conducted. The number turned out to be either 967 or 4,719, depending on whether you counted different user data as a different test. Considering that the five-person functional test team had created slightly fewer than 600 test cases for this project, as opposed to approximately 150 on the project I was referencing, even Harold acknowledged that his idea was flawed.

We stared at one another for what felt like a very long time before Harold dialed the phone.

“Sandra, do you have some time to join Scott and me in the lab? Thanks. Can you bring the client contracts and deliverable definitions? Great. Maybe Leah is available to join us as well? See you in a few.”

For many hours, through a few arguments, around a little cursing, and over several pizzas, Harold, Sandra, Leah (a stellar test manager in her own right who was filling the testing technical lead role on this project), Chris (a developer specializing in security with whom I shared the lab and who had made the mistake of wandering in while we were meeting), and I became increasingly frustrated with the task at hand. At the outset, even I didn’t realize how challenging it was going to be to figure out how and what to capture about performance testing in our tracking system.

We quickly agreed that what we wanted to include in the tracking system were performance tests representing valuable checkpoints, noteworthy performance achievements, or potential decision points. As soon as we decided that, I went to the whiteboard and started listing the tests we might include, thinking we could tune up this list and be done. I couldn’t have been more wrong.

I hadn’t even finished my list when the complications began. It turns out that what I was listing didn’t comply with either the terms of the contract or with the deliverables definitions that the client had finally approved after much debate and many revisions. I don’t remember all of the details and no longer have access to those documents, but I do remember how we finally balanced the commitments that had been made to the client, the capabilities of the mandated tracking system, and high-value performance testing.

We started with the first item on my list. Sandra evaluated the item against the contract. Harold evaluated it against the deliverables definitions. Leah assessed it in terms of its usefulness in making quality-related decisions. Chris assessed its informational value for the development team. Only after coming up with a list that was acceptable from each perspective did we worry about how to make it fit into the tracking system.

As it turned out, the performance requirements remained unchanged in the system. The performance test cases, however, were renamed “Performance Testing Checkpoints” and included the following (abbreviated here):

  • Collect baseline system performance metrics and verify that each functional task included in the system usage model achieves performance requirements under a user load of 1 for each performance testing build in which the functional task has been implemented.

    • [Functional tasks listed, one per line]

  • Collect system performance metrics and verify that each functional task included in the system usage model achieves performance requirements under a user load of 10 for each performance testing build in which the functional task has been implemented.

    • [Functional tasks listed, one per line]

  • Collect system performance metrics and verify that the system usage model achieves performance requirements under the following loads to the degree that the usage model has been implemented in each performance testing build.

    • [Increasing loads from 100 users to 3,000 users, listed one per line]

  • Collect system performance metrics and verify that the system usage model achieves performance requirements for the duration of a 9-hour, 1,000-user stress test on performance testing builds that the lead developer, performance tester, and project manager deem appropriate.
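
One way to picture how these checkpoints expand into tracking-system entries is to generate them from a list of functional tasks and a list of load levels. The sketch below is an illustration only, not the structure of the actual tracking system. The task names and the intermediate load steps are hypothetical placeholders, since the original lists are abbreviated above; only the 1-user and 10-user loads, the 100-to-3,000-user range, and the 9-hour, 1,000-user test come from the checkpoints themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    users: int                              # concurrent user load for the test
    scope: str                              # one functional task, or the whole usage model
    duration_hours: Optional[float] = None  # set only for the long-running test


def build_checkpoints(tasks, model_loads):
    """Expand tasks and load levels into one checkpoint entry per line."""
    checkpoints = []
    # Per-task checkpoints at user loads of 1 (baseline) and 10.
    for users in (1, 10):
        for task in tasks:
            checkpoints.append(Checkpoint(users, task))
    # Whole-model checkpoints at increasing loads.
    for users in model_loads:
        checkpoints.append(Checkpoint(users, "usage model"))
    # The 9-hour, 1,000-user stress test.
    checkpoints.append(Checkpoint(1000, "usage model", duration_hours=9))
    return checkpoints


# Hypothetical placeholders: the real task list and load steps are not given.
tasks = ["task A", "task B"]
model_loads = [100, 500, 1000, 3000]

checkpoints = build_checkpoints(tasks, model_loads)
for cp in checkpoints:
    print(cp)
```

With two tasks and four load levels this produces nine entries, which hints at why a handful of checkpoint definitions was easier to track than hundreds or thousands of individual tests.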

The beauty here was that what we created was clear, easy to build a strategy around, and mapped directly to information that the client eventually requested in the final report. An added bonus was that from that point forward in the project, whenever someone challenged our approach to performance testing, one or more of the folks who were involved in the creation of the checkpoints always came to my defense—frequently before I even found out about the challenge!

An interesting addendum to this story is that later that week, it became a company policy that I was to be consulted on any contracts or deliverable definitions that included performance testing before they were sent to the client for approval. I’m also fairly certain that this was the catalyst for Performance Testing becoming a practice area, separate from Functional Testing, and also what precipitated performance test leads reporting directly to the project manager instead of to the test manager on subsequent projects.

You Can’t Performance Test Everything

One of the joys of being the performance testing technical lead for a company that has several development projects going on at once is that I was almost always involved in more than one project at a time. I mention this because the following story comes from a different project, but did occur chronologically between the previous story and the next one.

This project was to build a web-based financial planning application. Although common today, at the time this was quite innovative. The performance testing of the system was high priority for two reasons:

  • We’d been hired for this project only after the client had fired the previous software development company because of the horrible performance of the system it had built.

  • The client had already purchased a Super Bowl commercial time slot and started shooting the commercial to advertise the application.

Understandably, Ted, the client, had instructed me that he wanted “every possible navigation path and every possible combination of input data” included in our performance tests. I’d tried several methods to communicate that this was simply not an achievable task before that year’s Super Bowl, but to no avail. Ted was becoming increasingly angry at what he saw as me refusing to do what he was paying for. After six weeks of trying to solve (or at least simplify) and document a massively complex combinatorics problem, I was becoming increasingly frustrated that I’d been unable to help the developers track down the performance issues that led Ted to hire us in the first place.
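
A back-of-the-envelope calculation shows why “every possible navigation path and every possible combination of input data” cannot be covered. The sketch below uses made-up numbers: the count of input fields, values per field, navigation paths, and seconds per test run are all assumptions chosen for illustration, not figures from the project.

```python
import math


def combinations_to_test(values_per_field, navigation_paths):
    """Every path paired with every combination of input values."""
    return navigation_paths * math.prod(values_per_field)


# Hypothetical application: 8 input fields with 10 distinct values each,
# reachable through 50 distinct navigation paths.
values_per_field = [10] * 8
navigation_paths = 50

total = combinations_to_test(values_per_field, navigation_paths)

# Assume, generously, that each combination takes one minute to run.
seconds_per_run = 60
years = total * seconds_per_run / (3600 * 24 * 365)

print(f"{total:,} combinations, roughly {years:,.0f} years of test time")
```

Even with these modest assumptions the total is 5,000,000,000 combinations, far more than could be run before any fixed deadline. The count grows multiplicatively with each added field or path, so trimming a few values does little; the practical alternative is to choose a representative subset.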

One afternoon, after Ted had rejected yet another proposed system usage model, I asked him to join me in the performance test lab to build a model together. I was surprised when he said he’d be right down.

I started the conversation by trying to explain to Ted that including links to websites maintained by other companies as part of our performance tests without their permission was not only of minimal value, but was tantamount to conducting denial-of-service attacks on those websites. Ted wasn’t having any of it. At that moment, I realized I was standing, we were both yelling at one another, and my fists were clenched in frustration.

In an attempt to calm down, I walked to the whiteboard and started drawing a sort of sideways flowchart representing the most likely user activities on the website. To my surprise, Ted also picked up a marker and began enhancing the diagram. Before long, we were having a calm and professional discussion about what users were likely to do on the site during their first visit. Somewhere along the way, Chris had joined the conversation and was explaining to us how many of the activities we had modeled were redundant and thus interchangeable based on the underlying architecture of the system.

In less than an hour, we had created a system usage model that we all agreed represented the items most likely to be popular during the Super Bowl marketing campaign as well as the areas of the application that the developers had identified as having the highest risk of performing poorly. We’d also decided that until we were confident in the performance of those aspects of the system, testing and tuning other parts of the application was not a good use of our time.

Within a week of that meeting, we had an early version of the test we’d modeled up and running, and the developers and I were actively identifying and improving performance issues with the system.

Once again, the story had started anything but beautifully. This time the beauty began to blossom when Ted and I started working together to build a model at the whiteboard rather than me emailing models for him to approve. The beauty came into full bloom when Chris brought a developer’s perspective to the conversation. Collaborating in real time enabled us to not only better understand one another’s concerns, but also to discuss the ROI of various aspects of the usage model comparatively as opposed to individually, which is what we’d been doing for weeks.

This story also has an interesting addendum. As it happened, the whiteboard sketch that Ted, Chris, and I created that day was the inspiration behind the User Community Modeling Language (UCML™) that has subsequently been adopted as the method of choice for modeling and documenting system usage for a large number of performance testers worldwide. For more about UCML, visit http://www.perftestplus.com/articles/ucml.pdf.
