The GDS Way and its content is intended for internal use by the GDS and CO CDIO communities.

How to monitor your service

In the CDIO and GDS, we follow the Service Manual guidance on how to monitor the status of services and set performance metrics.

We recommend using Pingdom to monitor your service’s availability. To check further that your service is working, you should:

  • run regular smoke tests using a browser automation app such as Selenium
  • implement a tool to ensure user journeys are working as you expect
  • monitor applications for errors using an error tracking app such as Sentry
  • implement configuration management to set up repeatable monitoring
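As an illustration of the smoke test idea above, here is a minimal sketch in Python that uses only the standard library instead of a browser automation app such as Selenium. The names `smoke_check`, `http_fetch` and `CheckResult` are hypothetical, as is the choice to look for a fixed piece of text on the page. A real smoke test would usually drive a browser through a full user journey.

```python
from dataclasses import dataclass
from typing import Callable, Tuple
from urllib.request import urlopen


@dataclass
class CheckResult:
    """Outcome of one smoke check (hypothetical structure for this sketch)."""
    url: str
    ok: bool
    detail: str


def http_fetch(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Fetch a URL and return its status code and decoded body."""
    with urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body


def smoke_check(
    url: str,
    expected_text: str,
    fetch: Callable[[str], Tuple[int, str]] = http_fetch,
) -> CheckResult:
    """Check that a page responds with 200 and contains the expected text."""
    try:
        status, body = fetch(url)
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all subclasses of OSError.
        return CheckResult(url, False, f"request failed: {exc}")
    if status != 200:
        return CheckResult(url, False, f"unexpected status {status}")
    if expected_text not in body:
        return CheckResult(url, False, f"missing text {expected_text!r}")
    return CheckResult(url, True, "ok")
```

The `fetch` parameter is injectable so the check can be exercised without network access, for example `smoke_check("https://example.test/", "Sign in", fetch=lambda url: (200, "Sign in"))`. Running a check like this on a schedule, and alerting when `ok` is false, is one way to get repeatable results.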

Using metrics-based monitoring

Collecting metrics on the performance of your service is useful for capacity planning and autoscaling. You should use metrics-based monitoring to measure aggregated numerical data about your service, and create Grafana dashboards to view metrics from your data source, for example metrics about your infrastructure or application.
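To make "aggregated numerical data" concrete, the following sketch counts HTTP requests by path and status code, then renders the totals in the Prometheus text exposition format. It is an assumption-laden illustration. The class `RequestMetrics` and the metric name `http_requests_total` are hypothetical, and in practice you would normally use an existing metrics client library rather than formatting the output yourself.

```python
from collections import Counter


class RequestMetrics:
    """Aggregate request counts by path and status (hypothetical sketch)."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def observe(self, path: str, status: int) -> None:
        """Record one handled request."""
        self._counts[(path, status)] += 1

    def render(self) -> str:
        """Render the counters in the Prometheus text exposition format."""
        lines = [
            "# HELP http_requests_total Total HTTP requests handled.",
            "# TYPE http_requests_total counter",
        ]
        for (path, status), count in sorted(self._counts.items()):
            lines.append(
                f'http_requests_total{{path="{path}",status="{status}"}} {count}'
            )
        return "\n".join(lines) + "\n"
```

A metrics service could scrape the output of `render()` from an HTTP endpoint, and a dashboard could then graph the request rate or the share of error responses over time.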

Reliability Engineering is running a beta of Prometheus as the operational metrics service for GDS. It will be available to all teams that use the recommended hosting options. Read the Reliability Engineering docs to find out more.

This page was last reviewed on 16 November 2021. It needs to be reviewed again on 16 May 2022 by the page owner #gds-way.