Make sure your service performs the way users need it to. If your service does not perform as expected when there are extra loads it will impact your users.
You should run performance tests to check how your service performs in certain situations. For example, during a denial-of-service attack (DoS attack).
The Service Manual has more information about why you need to test your service’s performance and how to start capacity planning and running performance tests.
Plan your tests
Plan and scope your test whether you’re testing the entire service or specific user journeys.
Estimate any additional infrastructure costs your tests will incur and share them with your service owner or product manager for approval.
Plan user journeys
Test complete journeys that are representative of how your service is used in production. For example, if a user has to sign in or register, plan tests where some users sign in and others register.
Plan test scenarios
A scenario is a hypothetical event when your service will be under heavy load. For example, every year the G-Cloud framework on the Digital Marketplace opens for applications from suppliers for several weeks. This activity increases load on the Digital Marketplace up to the deadline.
When most of its application infrastructure moved to the GOV.UK PaaS the Digital Marketplace service needed to retest the service. Its developers and performance analysts queried historical logs and Google Analytics for popular journeys, modelling normal and peak capacity needs for the previous year’s G-Cloud framework application period.
This helped the team calculate figures for the largest number of concurrent user sessions and Requests Per Second (RPS) Digital Marketplace was able to support. This made sure the new platform could still cope with its expected network traffic.
Inform people affected by your test. For example, tell hosting providers or developers planning to use the test environment of your test’s date, time and duration. Make sure everyone involved in the test agrees to its conditions.
Build your test
You do not need to fully hand code your tests because most load testing tools like Gatling, include scenario recorders. Some test scenarios may still need manual preparation, for example, providing lists of test users or specific search terms.
Configure your test environment
Test environments must resemble your production environment as far as possible so tests are accurate. Isolate test environments to stop them affecting your production environment.
Consider how to monitor service performance during the load test. Your service will likely already have dashboards and graphs set up for common metrics such as CPU and memory utilization. Decide if your current metrics are sufficient or if you need to turn on additional logging or configure profiling traces.
Perform practice runs of your tests to make sure they work as expected. You may need to disable Cross-Site Request Forgery (CSRF) tokens and rate limiting in your test environment.
Run your tests
Start with a low level of concurrent users and increase them to your anticipated maximum. If there are no problems, increase the amount of traffic until you reach a breaking point in your system.
Document and share your results
When testing your service, record:
- the load when the service broke
- how the service broke and what went wrong
- the changes introduced to improve performance
If your service meets your capacity requirements, document its upper limits and communicate them to the service owner.
Analyse your results
Testing tools should record the number of concurrent users, RPS, HTTP response codes and timings. These reports help determine your service’s maximum supported capacity. If your service cannot meet your requirements or you find problems, you’ll need to diagnose them.
You can fix common problems like inefficient database queries by introducing lazy loading and adding or altering indexes. If your web server stops or is slow to respond, it may be running out of available workers. This is a common issue when you’re sending lots of requests to APIs and the workers become blocked awaiting a response. Adjusting web server configuration can help depending on your specific application and technology.
Problems with your infrastructure can be harder to resolve. For example, if your service is read-heavy you can introduce additional caching to improve performance.
You can scale infrastructure vertically by adding more power, for example upgrading CPUs, memory or storage. You can also scale horizontally by adding additional instances of your service’s components. You can only do this if your service supports this type of scaling.
The Twelve-Factor App provides more information on how to scale your infrastructure and build SaaS.
Make sure you rerun your performance tests after any changes to confirm everything works as expected and you have not introduced any new problems to your service.
Performance testing for auto-scaling environments
Test auto-scaling infrastructures to make sure they scale up and down as expected and are cost-efficient. For example, if infrastructure scales at the correct thresholds, check if servers become stressed and response times slow down before scaling takes effect.
Performance testing helps determine if you’re using the correct machine instance types for your workload. For example, more expensive larger specification machines may be capable of handling more traffic cheaper than adding additional smaller instances to your auto-scaling group. Depending on your service requirements the reverse may be true.
Performance testing for serverless environments
Testing can help you tune serverless components’ performance. For example, Amazon prices AWS Lambda on the number of function executions, allocated memory and estimated execution time.
Performance testing will help determine if a function is taking longer to execute because it does not have enough memory allocated, or is over-provisioned with memory and costing more than necessary.
To find our more about performance testing read: