





















































Until recently, traffic on TrainBot had been light: it was still in closed beta and open to only a handful of clients. Everything was fully operational and the application as a whole was very responsive. Just a few weeks ago, TrainBot was opened to the public, and all was still good and dandy. To celebrate the launch and promote its online training courses, BaySoft Training Inc. offered 75 percent off all training courses. That promotional offer caused a sudden influx of traffic to TrainBot, far beyond what the company had anticipated. Web traffic shot up by 300 percent and things took a turn for the worse. Network resources weren't holding up, server CPUs and memory sat at 90 to 95 percent utilization, and the database servers weren't far behind due to high I/O and contention. As a result, most web requests began to see much slower response times, making TrainBot totally unresponsive for most of its first-time clients. It didn't take long after that for the servers to crash and for the support lines to get flooded.
It was a long night at the BaySoft Training Inc. corporate office. How did this happen? Could it have been avoided? Why were the application and system unable to handle the load? Why weren't adequate performance and stress tests conducted on the system and application? Was it an application problem, a system resource issue, or a combination of both? These were the questions management demanded answers to from the group gathered in the war room: software developers, network and system engineers, quality assurance (QA) testers, and database administrators. There was certainly a lot of finger pointing and blame to go around. After a little brainstorming, it didn't take the group long to decide what needed to be done: the application and its system resources would need to undergo extensive and rigorous testing, covering all facets of the application and all supporting system resources, including, but not limited to, infrastructure, network, database, servers, and load balancers. Such testing would help all the parties involved discover exactly where the bottlenecks were and address them accordingly.
Performance testing is a type of testing intended to determine the responsiveness, reliability, throughput, interoperability, and scalability of a system and/or application under a given workload. It can also be defined as the process of determining the speed or effectiveness of a computer, network, software application, or device. Testing can be conducted on software applications, system resources, targeted application components, databases, and much more. It normally involves an automated test suite, as this allows for easy, repeatable simulation of a variety of normal, peak, and exceptional load conditions. Such testing helps verify whether a system or application meets the specifications claimed by its vendor, and it can compare applications in terms of parameters such as speed, data transfer rate, throughput, bandwidth, efficiency, or reliability. Performance testing can also serve as a diagnostic aid in locating bottlenecks and single points of failure. It is often conducted in a controlled environment and in conjunction with stress testing, a process of determining the ability of a system or application to maintain a certain level of effectiveness under unfavorable conditions.
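To make the idea of an automated, repeatable load simulation concrete, here is a minimal Java sketch of what such a tool does at its core: it spins up a configurable number of concurrent virtual users, fires requests at an endpoint, and records each response time. The target URL, user count, and request count below are illustrative assumptions only; a real tool such as JMeter adds ramp-up control, assertions, reporting, and much more on top of this basic loop.

    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Minimal load-simulation sketch: a pool of concurrent "virtual users",
    // each issuing sequential GET requests while the elapsed time of every
    // request is recorded for later analysis.
    public class MiniLoadTest {
        public static void main(String[] args) throws Exception {
            final String target = "http://localhost:8080/"; // hypothetical endpoint
            final int virtualUsers = 50;                    // simulated concurrent clients
            final int requestsPerUser = 20;

            List<Long> timingsMs = Collections.synchronizedList(new ArrayList<>());
            ExecutorService pool = Executors.newFixedThreadPool(virtualUsers);

            for (int u = 0; u < virtualUsers; u++) {
                pool.submit(() -> {
                    for (int r = 0; r < requestsPerUser; r++) {
                        long start = System.nanoTime();
                        try {
                            HttpURLConnection conn =
                                    (HttpURLConnection) new URL(target).openConnection();
                            conn.setRequestMethod("GET");
                            conn.getResponseCode(); // forces the full round trip
                            conn.disconnect();
                            timingsMs.add((System.nanoTime() - start) / 1_000_000);
                        } catch (Exception e) {
                            // Under heavy load, failures are data points too.
                            System.err.println("Request failed: " + e.getMessage());
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.MINUTES);

            timingsMs.stream()
                     .mapToLong(Long::longValue)
                     .average()
                     .ifPresent(avg -> System.out.printf(
                             "%d samples, mean response time %.1f ms%n",
                             timingsMs.size(), avg));
        }
    }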
Why bother?
Using BaySoft's case study mentioned earlier, it should be obvious why companies bother and go to great lengths to conduct performance testing. The disaster could have been minimized, if not totally averted, had effective performance testing been conducted on TrainBot before it was opened up to the masses. As we go on in this article, we will continue to explore the many benefits of effective performance testing.
At a very high level, performance testing is almost always conducted to address one or more risks related to expense, opportunity cost, continuity, and/or corporate reputation. Conducting such tests gives insight into software application release readiness, the adequacy of network and system resources, infrastructure stability, and application scalability, to name just a few. Gathering the estimated performance characteristics of the application and system resources prior to launch helps address issues early and provides valuable feedback to stakeholders, helping them make key strategic decisions.
Performance testing covers a whole lot of ground, including areas such as load testing, stress testing, baselining, and tuning.
Most of these areas are intertwined, with each aspect contributing to the overall objectives of the stakeholders. However, before jumping right in, let's take a moment to understand the core activities involved in conducting performance tests. A central one is recording and running the test scenarios themselves, which is usually done with an automated testing tool.
Such tools include HP LoadRunner, NeoLoad, LoadUI, Gatling, WebLOAD, WAPT, Loadster, LoadImpact, Rational Performance Tester, Testing Anywhere, OpenSTA, LoadStorm, and so on. Some of these are commercial, while others are not as mature, portable, or extensible as JMeter. HP LoadRunner, for example, is a bit pricey and limits the number of simulated threads to 250 unless additional licenses are purchased, though it does offer a much nicer graphical interface and monitoring capability. Gatling, the new kid on the block, is free and looks rather promising. It is still in its infancy and aims to address some of JMeter's shortcomings, including offering a friendlier testing DSL (domain-specific language) in place of JMeter's verbose XML, and producing nicer, more meaningful HTML reports, among other things. That said, it still has only a tiny user base compared with JMeter, and not everyone may be comfortable building test plans in Scala, its language of choice, though programmers may find it all the more appealing.
In this book, our tool of choice for this step will be Apache JMeter. That shouldn't come as a surprise, considering the title of the book.
Performance testing core activities
Performance testing is usually a collaborative effort between all parties involved: business stakeholders, enterprise architects, developers, testers, DBAs, system administrators, and network administrators. Such collaboration is necessary to gather accurate and valuable results when conducting testing. Monitoring network utilization, database I/O and waits, top queries, and invocation counts, for example, helps the team find bottlenecks and areas that need further attention in ongoing tuning efforts.
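To give a flavor of what such monitoring can look like in code, the following self-contained Java sketch samples the host's load average and the JVM's heap usage once per second so the readings can be lined up against the timeline of a test run. It is only an illustration; in practice teams rely on dedicated monitoring tools that also cover network, database, and OS-level metrics.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.OperatingSystemMXBean;

    // A poor man's resource monitor: once per second, sample the OS load
    // average and the JVM's heap usage.
    public class ResourceMonitor {
        public static void main(String[] args) throws InterruptedException {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            for (int i = 0; i < 60; i++) {                // sample for one minute
                double load = os.getSystemLoadAverage();  // -1.0 where unsupported
                long heapUsedMb = mem.getHeapMemoryUsage().getUsed() / (1024 * 1024);
                System.out.printf("t=%02ds loadavg=%.2f heapUsed=%d MB%n",
                        i, load, heapUsedMb);
                Thread.sleep(1_000);
            }
        }
    }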
There is a strong relationship between performance testing and tuning, in the sense that one often leads to the other. Often, end-to-end testing unveils system or application bottlenecks deemed incompatible with the project's target goals. Once those bottlenecks are discovered, the next step for most teams is a series of tuning efforts to make the application perform adequately.
Such efforts normally include, but are not limited to, changes to server and application configuration, optimization of application code and database queries, and adjustments to the supporting infrastructure.
Tuning efforts may also commence when an application has reached acceptable performance but the team wants to reduce the amount of system resources being used, decrease the volume of hardware needed, or further increase system performance.
After each change (or series of changes), the test is executed again to see whether performance has improved or declined as a result. The process continues until the results reach the acceptable goals. The outcome of these test-tune cycles normally produces a baseline.
Baselining is the process of capturing performance metric data for the sole purpose of evaluating the efficacy of successive changes to the system or application. It is important that all characteristics and configurations, except those specifically being varied for comparison, remain the same in order to make effective comparisons as to which change (or series of changes) is driving results toward the targeted goal. Armed with baseline results, subsequent changes can be made to the system configuration or application, and the new test results compared against the baseline to see whether those changes were worthwhile. Key considerations when generating baselines therefore include keeping the test environment, test data, and workload consistent from run to run, and varying only one thing at a time.
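As a small illustration of how baseline results are used, the following Java sketch compares the mean and 90th-percentile response times of a baseline run against a run taken after a tuning change. The sample timings are invented numbers for illustration, not measurements from any real system.

    import java.util.Arrays;

    // Compare a new test run against a stored baseline on two common metrics:
    // mean response time and the 90th percentile. Higher numbers in the new
    // run indicate a regression relative to the baseline.
    public class BaselineCompare {

        static double mean(long[] samples) {
            return Arrays.stream(samples).average().orElse(0);
        }

        static long percentile(long[] samples, double pct) {
            long[] sorted = samples.clone();
            Arrays.sort(sorted);
            int idx = (int) Math.ceil(pct / 100.0 * sorted.length) - 1;
            return sorted[Math.max(idx, 0)];
        }

        public static void main(String[] args) {
            long[] baselineMs = { 120, 135, 140, 150, 180, 200, 210, 250, 300, 480 };
            long[] currentMs  = { 110, 120, 125, 130, 150, 160, 170, 190, 240, 400 };

            System.out.printf("mean: %.1f ms -> %.1f ms%n",
                    mean(baselineMs), mean(currentMs));
            System.out.printf("p90:  %d ms -> %d ms%n",
                    percentile(baselineMs, 90), percentile(currentMs, 90));
        }
    }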
Load testing is the process of putting demand on a system and measuring its response; that is, determining how much volume the system can handle. Stress testing is the process of subjecting the system to unusually high loads, far beyond its normal usage pattern, to determine its responsiveness. Both differ from performance testing, whose sole purpose is to determine the response and effectiveness of a system; that is, how fast the system is. Since load ultimately affects how a system responds, performance testing is almost always done in conjunction with stress testing.
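The difference between the two is mostly in how the workload is shaped over time, as the following Java sketch illustrates. It steps concurrency upward in stages, which is the typical shape of a load test searching for the maximum sustainable volume; a stress test would simply keep stepping well past the point where errors appear. The stage sizes and the failure threshold of the stubbed-out scenario are invented purely for illustration.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;

    // The shape of a stepped load test: run the same scenario at increasing
    // levels of concurrency and watch the error rate at each stage. The
    // "system under test" here is a stub that starts failing above 150
    // concurrent callers, purely so the output shows a breaking point.
    public class SteppedLoad {

        static void scenario(int concurrentUsers, AtomicInteger errors) {
            if (concurrentUsers > 150) errors.incrementAndGet(); // stand-in for timeouts/5xx
        }

        public static void main(String[] args) throws InterruptedException {
            int[] stages = { 50, 100, 150, 200, 250 };  // virtual users per stage
            for (int users : stages) {
                AtomicInteger errors = new AtomicInteger();
                ExecutorService pool = Executors.newFixedThreadPool(users);
                for (int i = 0; i < users; i++) {
                    pool.submit(() -> scenario(users, errors));
                }
                pool.shutdown();
                pool.awaitTermination(1, TimeUnit.MINUTES);
                System.out.printf("%d users -> %d errors%n", users, errors.get());
            }
        }
    }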