
The GDS Way and its content are intended for internal use by the GDS community.

How to monitor your service

At GDS, we follow the Service Manual guidance on how to monitor the status of services and set performance metrics.

We recommend using Pingdom to monitor your service’s availability. To check that your service is working as expected, you should also:

  • run regular smoke tests using a browser automation app such as Selenium
  • implement a tool to check that user journeys work as you expect
  • monitor applications for errors using an error tracking app such as Sentry
  • implement configuration management to set up repeatable monitoring
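The simplest form of a smoke test can be sketched as below. This is a minimal illustration only, assuming a plain HTTP request is enough to exercise the page; it uses Python's standard library rather than a browser automation app such as Selenium, so it will not run JavaScript or follow a full user journey. The function names `fetch` and `smoke_check` and the `fetcher` parameter are hypothetical, not part of any GDS tooling.

```python
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Return (status_code, body_text) for a GET request to url."""
    with urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body


def smoke_check(url, expected_text, fetcher=fetch):
    """Check that a page responds with 200 and contains expected_text.

    Returns a (ok, reason) tuple. `fetcher` is injectable (an assumption
    made for this sketch) so the check can be tested without a network.
    """
    try:
        status, body = fetcher(url)
    except OSError as error:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False, f"request failed: {error}"
    if status != 200:
        return False, f"unexpected status {status}"
    if expected_text not in body:
        return False, f"expected text {expected_text!r} not found"
    return True, "ok"
```

A scheduled job could call `smoke_check` with your service's start page URL and a phrase you expect to see on it, then alert when the first element of the result is `False`.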

Collecting metrics on the performance of your service is useful for capacity planning and autoscaling. You should apply metrics-based monitoring to measure aggregated numerical data about your service. Create Grafana dashboards to view metrics from your data source, for example metrics about your infrastructure or application.
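To make "aggregated numerical data" concrete, the sketch below collects request durations and server errors per endpoint and reduces them to a few summary numbers. It is an illustration under stated assumptions, not a recommended implementation: the class `RequestMetrics`, its method names, and the choice to count 5xx responses as errors are all hypothetical, and a real service would export these values to a metrics backend instead of holding them in memory.

```python
from collections import defaultdict
from statistics import mean


class RequestMetrics:
    """Aggregate per-endpoint request counts, error counts and durations."""

    def __init__(self):
        self.durations = defaultdict(list)  # endpoint -> list of seconds
        self.errors = defaultdict(int)      # endpoint -> count of 5xx responses

    def record(self, endpoint, duration_seconds, status_code):
        """Record one handled request."""
        self.durations[endpoint].append(duration_seconds)
        if status_code >= 500:
            # Assumption for this sketch: only server errors count as errors.
            self.errors[endpoint] += 1

    def summary(self, endpoint):
        """Return aggregated numbers for one endpoint."""
        durations = self.durations.get(endpoint, [])
        errors = self.errors.get(endpoint, 0)
        count = len(durations)
        return {
            "requests": count,
            "errors": errors,
            "error_rate": errors / count if count else 0.0,
            "mean_seconds": mean(durations) if durations else 0.0,
            "max_seconds": max(durations, default=0.0),
        }
```

Numbers like the request count and error rate are the kind of values you would plot over time on a dashboard to spot capacity problems.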

Reliability Engineering is running a beta of Prometheus as the operational metrics service for GDS. It will be available to all teams that use the recommended hosting options. Read the Reliability Engineering docs to find out more.

This page was last reviewed on 3 September 2018. It needs to be reviewed again on 3 March 2019 by the page owner #gds-way.