The GDS Way and its content are intended for internal use by the GDS community.

How to monitor your service

This document is current until 30 September 2018

At GDS, we follow the Service Manual guidance on how to monitor the status of services and set performance metrics.

We recommend using Pingdom to monitor your service’s availability. To further ensure your service is working, you should:

  • run regular smoke tests using a browser automation app such as Selenium
  • use a tool that checks your key user journeys work as you expect
  • monitor applications for errors using an error tracking app such as Sentry
  • implement configuration management to set up repeatable monitoring
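
As a rough illustration of the smoke test idea in the list above, the sketch below checks that a few pages return the HTTP status you expect. It deliberately swaps browser automation for plain HTTP requests from the Python standard library, so it only catches coarse failures; a Selenium-based test would exercise the pages as a user does. The names `CheckResult`, `fetch_status` and `run_smoke_checks` are hypothetical and not part of any GDS tooling.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, NamedTuple, Tuple


class CheckResult(NamedTuple):
    """Outcome of one smoke check (hypothetical helper type)."""
    url: str
    ok: bool
    detail: str


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx responses; the code is still what we want.
        return error.code


def run_smoke_checks(
    checks: Iterable[Tuple[str, int]],
    fetch: Callable[[str], int] = fetch_status,
) -> List[CheckResult]:
    """Compare each URL's status code with the expected one.

    `fetch` is injectable so the checks can be exercised without a network.
    """
    results = []
    for url, expected in checks:
        try:
            status = fetch(url)
        except OSError as error:
            # URLError and socket timeouts are both OSError subclasses.
            results.append(CheckResult(url, False, f"request failed: {error}"))
            continue
        ok = status == expected
        results.append(CheckResult(url, ok, f"expected {expected}, got {status}"))
    return results
```

You could call `run_smoke_checks` on a schedule with a list of `(url, expected_status)` pairs for your own service and alert on any result where `ok` is false.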

Collecting metrics on the performance of your service is useful for capacity planning and autoscaling. You should:

  • apply metrics-based monitoring because it’s not platform-specific
  • create dashboards using Grafana to view metrics from your data source, for example infrastructure or application metrics

Reliability Engineering is running a beta of Prometheus as the operational metrics service for GDS. It will be available to all teams that use the recommended hosting options. See the Reliability Engineering docs to find out more.
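
To make metrics-based monitoring more concrete, here is a minimal sketch of an application-side counter rendered in the Prometheus text exposition format, which is what a Prometheus server reads when it scrapes a service. It is an illustration only and does not describe how the GDS beta is set up. In practice you would normally use an official Prometheus client library rather than hand-rolling this. The class `LabelledCounter` and the metric name `http_requests_total` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Tuple


def _escape(value: str) -> str:
    """Escape backslashes, double quotes and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class LabelledCounter:
    """A counter that only increases, keyed by a tuple of label values."""

    def __init__(self, name: str, help_text: str, label_names: Tuple[str, ...]):
        self.name = name
        self.help_text = help_text
        self.label_names = label_names
        self._values: Dict[Tuple[str, ...], float] = defaultdict(float)

    def inc(self, *label_values: str, amount: float = 1.0) -> None:
        """Add `amount` to the series identified by `label_values`."""
        if len(label_values) != len(self.label_names):
            raise ValueError("wrong number of label values")
        if amount < 0:
            raise ValueError("counters can only increase")
        self._values[label_values] += amount

    def render(self) -> str:
        """Return the counter as text exposition lines, sorted by labels."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for label_values, value in sorted(self._values.items()):
            labels = ",".join(
                f'{key}="{_escape(val)}"'
                for key, val in zip(self.label_names, label_values)
            )
            # Omit the braces entirely when the metric has no labels.
            series = f"{self.name}{{{labels}}}" if labels else self.name
            lines.append(f"{series} {value:g}")
        return "\n".join(lines) + "\n"
```

An application would call `inc` each time it handles a request, for example `requests.inc("GET", "200")`, and serve the output of `render()` on a metrics endpoint. Because the counts are plain numbers with labels, the same data can feed a Grafana dashboard regardless of where the service is hosted.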