Identifying Peak Periods and Throughput
When we set out to evaluate whether the application under test (AUT) meets its Non-Functional Requirements (NFRs), one of the key questions is usually whether the AUT can support the peak traffic or throughput. But how does one get that number?
As a performance engineer, it is important to understand and document the source of this information. Even more important is understanding how the peak period was identified and calculated. Almost every data analysis tool aggregates data, and with aggregation comes information loss. Unless we take a closer look, that loss hides sudden spikes over short periods of time and obscures which business processes were being executed during them.
Identify Peak Periods
The first step is to ensure you have the right tool and enough data to identify the peak period. If the AUT is a new application or service, the number will be an estimate of customer traffic based on several factors, such as business marketing and reach, popularity of the service or public interest, and the features being rolled out.
For a new service, it’s best to ensure you have enough buffer. In today’s world it is critical to design future-proof software that is highly scalable, available and resilient. Never compromise on these non-functional qualities because of an early-to-market need. Hire the experts! An initial failure can turn off a customer, and lost trust is difficult to win back.
For existing services, identifying peak periods starts with identifying the peak business months and looking at the data. The easiest and most basic source is the web access logs. However, if you have APM tools available, they should be your go-to.
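As a rough illustration of working from web access logs, here is a minimal Python sketch that counts hits per hour. It assumes the logs use Common Log Format timestamps such as [10/Jun/2023:12:00:01 +0000]; the function name hits_per_hour and the sample lines are made up for this example, so adjust the pattern to whatever your server actually writes.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: Common Log Format timestamps, e.g. [10/Jun/2023:12:00:01 +0000]
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def hits_per_hour(lines):
    """Count requests per (date, hour) bucket from access log lines."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like access log entries
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        counts[(ts.date().isoformat(), ts.hour)] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [10/Jun/2023:12:00:01 +0000] "GET /cart HTTP/1.1" 200 512',
    '203.0.113.6 - - [10/Jun/2023:12:30:45 +0000] "GET /search HTTP/1.1" 200 1024',
    '203.0.113.7 - - [10/Jun/2023:15:10:00 +0000] "POST /checkout HTTP/1.1" 201 256',
]
hourly = hits_per_hour(sample)
for (day, hour), count in sorted(hourly.items()):
    print(day, f"{hour:02d}:00", count)
```

Sorting the hourly counts by volume over a few months of logs is usually enough to shortlist the candidate peak days before drilling in further.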
Calculating Peak Throughput
Once you have identified the peak periods (such as local holiday periods or the financial year end), you narrow the window down and calculate the peak within that period.
Peak throughput for services that handle more than about 800K hits per day is usually expressed in TPS (transactions per second) or RPS (requests per second); the unit is per second. When the TPS is quite low, we sometimes use TPH (transactions per hour) instead.
Let’s take an example to illustrate:
Say you work for an electronics retail chain that is rolling out a new application layer to help process high volumes of traffic. The big bosses have planned a big sale at the financial year end to clear inventory, announce a good year-end profit, build customer confidence and attract more investment from the market.
The application layer has been built by a vendor. The testing, however, has been kept within the organisation. You’ve been hired to smash the application, say whether it will support the big day, and do what’s required to ensure it will.
You start off by looking at the traffic for the last 2 years to identify peak periods. Having those conversations with the Business Analysts (BA) and Software Architects is important to ensure you are looking in the right place.
Now you’ve identified the month, say June, where you see huge traffic. What you do from here on to identify the peak TPS to test will determine whether your testing is any good. The BA sends you an email saying that the peak traffic is about 12 million hits per day, or about 139 TPS. Sweeeeeet!! That was easy.
However, this seems too easy. It’s not that you believe the BA is any less qualified; by asking the right questions, you are simply making sure the method of calculation is correct.
Seeing 139 TPS ≈ 12 million transactions per day, you feel that isn’t right: people don’t shop evenly throughout the day. Ah wait, this assumes an equal distribution of traffic across all 24 hours.
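The 139 TPS figure can be reproduced with one line of arithmetic, which makes the hidden assumption explicit: the daily total is divided by every second of the day. A quick Python check, using only the numbers from the example (the variable names are illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

daily_hits = 12_000_000  # the BA's daily figure from the example
average_tps = daily_hits / SECONDS_PER_DAY

# This is a daily average, not a peak: it treats 3am the same as lunchtime.
print(f"{average_tps:.1f} TPS")  # 138.9 TPS
```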
You take over and break those 12 million hits down by the hour. You find above-average traffic from 10am to around 7pm, with the biggest spike between 12pm and 2pm: about 3 million requests in those two hours. Drilling down further, first by the minute and then by the second, you find that the peak throughput is about 416 TPS … say what???!!!!
That is a huge deviation, roughly three times the BA’s figure. You apply the same approach to other days and other peak periods and identify the throughput for each.
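To show why the bucket size matters, here is a small Python sketch of the drill-down. It assumes you have request timestamps as epoch seconds; the helper peak_tps and the synthetic trace are hypothetical and are not the retailer's data. The same traffic reports a very different peak depending on whether it is aggregated by hour, minute or second.

```python
from collections import Counter


def peak_tps(timestamps, bucket_seconds):
    """Return the busiest bucket's rate, expressed in requests per second."""
    buckets = Counter(int(ts) // bucket_seconds for ts in timestamps)
    return max(buckets.values()) / bucket_seconds


# Synthetic one-hour trace: 1 request every second,
# plus a 10-second burst of 50 extra requests per second.
timestamps = list(range(3600))
for second in range(1800, 1810):
    timestamps.extend([second] * 50)

for label, size in [("hour", 3600), ("minute", 60), ("second", 1)]:
    print(f"peak by {label}: {peak_tps(timestamps, size):.2f} TPS")
```

On this synthetic trace the hourly view reports about 1.14 TPS, the per-minute view about 9.33 TPS and the per-second view 51 TPS. For comparison, 3 million requests spread evenly over two hours (7,200 seconds) already averages about 416 TPS, so it is worth checking whether a true per-second peak in real data goes higher still.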
You also break up that throughput across different end-points to better understand which business processes to simulate.
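One simple way to do that split, sketched in Python below, is to distribute the peak figure across end-points in proportion to the hits each received during the peak window. The paths, the 50/30/20 mix and the helper endpoint_targets are invented for illustration; real proportions come from your own logs or APM tool.

```python
from collections import Counter


def endpoint_targets(paths, peak_tps):
    """Split a peak TPS figure across end-points in proportion to observed hits."""
    counts = Counter(paths)
    total = sum(counts.values())
    return {path: peak_tps * n / total for path, n in counts.items()}


# Hypothetical request paths observed during the peak window.
peak_window_paths = ["/search"] * 50 + ["/cart"] * 30 + ["/checkout"] * 20

targets = endpoint_targets(peak_window_paths, peak_tps=416)
for path, tps in sorted(targets.items(), key=lambda kv: -kv[1]):
    print(f"{path}: {tps:.0f} TPS")
```

The per-end-point targets then become the load levels for each scripted business process in the test.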
You test for that throughput, hit issues, talk to your vendor and ask them to tune their application in time for the launch.