
The GDS Way and its content are intended for internal use by the GDS community.

Monitoring

This document is current until 1 March 2018

Availability monitoring

Pingdom is the tool we’ve used most for availability monitoring. If you need to monitor a service from outside the US or EU, GOV.UK has used CA App Synthetic Monitor.

Smoke tests

Teams should run regular smoke tests to ensure that services are available. Several teams at GDS have used Selenium for this.
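
As a rough illustration of the simplest kind of smoke test, the sketch below requests a URL and reports whether it answered with a 2xx status. It uses only the Python standard library, not Selenium. The function name `smoke_check`, the `opener` parameter and the 5 second timeout are assumptions made for this example, not something GDS teams are known to use.

```python
import urllib.request


def smoke_check(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with a 2xx status within `timeout` seconds.

    `opener` is a hypothetical hook so the check can be exercised without
    network access; by default it is urllib.request.urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError (raised for 4xx/5xx) and socket timeouts
        # are all subclasses of OSError.
        return False
```

A scheduler such as cron or a CI job could call `smoke_check` against each service URL at a regular interval and alert when it returns False.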

User journeys

Teams should use a tool to ensure user journeys are working as expected.

Metrics-based monitoring

Metrics-based monitoring works with virtual machines, PaaS and containers. The metrics you collect also support capacity planning and autoscaling.
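
To make the link between collected metrics and autoscaling concrete, here is a minimal sketch of a proportional scaling rule: scale the instance count by the ratio of observed to target CPU utilisation. The function name `desired_instances`, the 60% target and the use of CPU as the metric are all assumptions chosen for illustration. Real platforms apply their own rules, usually with smoothing and cool-down periods.

```python
import math


def desired_instances(current_instances, cpu_percent_samples, target_percent=60):
    """Suggest an instance count from recent CPU utilisation samples.

    Hypothetical rule: desired = ceil(current * average / target),
    never going below one instance.
    """
    if not cpu_percent_samples:
        raise ValueError("need at least one CPU sample")
    average = sum(cpu_percent_samples) / len(cpu_percent_samples)
    return max(1, math.ceil(current_instances * average / target_percent))
```

For example, four instances averaging 90% CPU against a 60% target would suggest scaling out to six, while the same four instances averaging 30% would suggest scaling in to two.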

Metrics dashboards

Use Grafana for creating dashboards to view infrastructure and application metrics.

Application error monitoring

Multiple teams at GDS are using or evaluating Sentry for application error monitoring.

Configuration management

Teams should use configuration management to set up monitoring reproducibly.