How to store and query logs

Store and query logs to detect potential errors within your infrastructure before they become critical. Use the shared GDS Logit account to store and query all your infrastructure and application logs.

Logit offers cloud-based, on-demand, shared ELK (Elasticsearch, Logstash and Kibana) stacks as a service. These are suitable for short-term storage of logs, and are accessible and queryable. The Reliability Engineering documentation has more information about using Logit.

You should not operate your own self-hosted ELK stacks for logging. The Logit cloud-hosted ELK solution is easier to manage.

Your product may have legal or other requirements determining how long you should store logs. For example, the Payment Card Industry Data Security Standard (PCI-DSS) requires 3 months of easily accessible logs and 12 months of archived logs.
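
If your long-term archive is in Amazon S3, one way to enforce this kind of retention is a bucket lifecycle policy. The sketch below, which assumes a hypothetical bucket name and prefix, keeps logs easily accessible for 3 months, then archives them to Glacier, expiring them after 12 months in total:

```python
# A minimal sketch of a PCI-DSS-style retention policy, assuming your
# long-term archive is an S3 bucket. The bucket name and prefix are
# hypothetical - substitute your own.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="gds-example-logs",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "pci-dss-log-retention",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                # keep logs easily accessible for 3 months...
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                # ...then hold the archived copies for 12 months in total
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```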

Short-term storage

You should not need to store logs spanning long periods in your short-term queryable store. Practical retention periods for short-term queryable logs are:

  • no more than 7 days for non-production environments
  • no more than 30 days for production environments
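
One way to enforce these retention periods is to delete date-stamped indices once they fall outside the window. The sketch below assumes daily indices named like logstash-2018.10.03 and a hypothetical Elasticsearch endpoint; Logit-hosted stacks may manage retention for you, so check before running anything like this:

```python
# A minimal sketch of enforcing a short-term retention period by deleting
# date-stamped daily indices. The endpoint is hypothetical.
from datetime import datetime, timedelta

from elasticsearch import Elasticsearch

RETENTION_DAYS = 30  # use 7 for non-production environments

es = Elasticsearch("https://your-stack.example.logit.io:443")  # hypothetical

cutoff = datetime.utcnow() - timedelta(days=RETENTION_DAYS)

# daily indices carry their date in the name, e.g. logstash-2018.10.03
for name in es.indices.get(index="logstash-*"):
    day = datetime.strptime(name.rsplit("-", 1)[-1], "%Y.%m.%d")
    if day < cutoff:
        es.indices.delete(index=name)
```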

Long-term storage

For long-term storage, you should focus on durability, whether for compliance reasons or to identify long-term performance trends.

You should be aware that ELK is not appropriate for long-term storage. Other services are less expensive and more reliable for archiving. For example, if you use Amazon Web Services (AWS), Amazon CloudWatch may be a suitable solution, as it lets you run basic queries on older logs.
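
As a sketch of what querying older logs might look like, the example below runs a CloudWatch Logs Insights query over the last 90 days of a hypothetical log group, using boto3:

```python
# A minimal sketch of querying archived logs with CloudWatch Logs Insights.
# The log group name and query string are hypothetical.
import time

import boto3

logs = boto3.client("logs")

query = logs.start_query(
    logGroupName="/example/app-logs",  # hypothetical log group
    startTime=int(time.time()) - 90 * 24 * 3600,  # the last 90 days
    endTime=int(time.time()),
    queryString="fields @timestamp, @message | filter @message like /ERROR/ | limit 20",
)

# poll until the query has finished running
results = logs.get_query_results(queryId=query["queryId"])
while results["status"] in ("Running", "Scheduled"):
    time.sleep(1)
    results = logs.get_query_results(queryId=query["queryId"])

for row in results["results"]:
    print({field["field"]: field["value"] for field in row})
```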

Log shipping

If you have set up log shipping, you should consider:

  • how your logs will get to your short-term and long-term stores - we recommend using the Elastic Beats family of tools as a platform to collect and ship your data
  • what happens if one of these stores is unavailable
  • whether the store will be overloaded when it comes back online

You could have one process to ship logs to your long-term archive, and a second process to fill a short-term query tool from the long-term archive, for example you could combine Elasticsearch and Amazon CloudWatch. This way you can be confident everything is safely stored.
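
A minimal sketch of that second process, assuming the long-term archive is a hypothetical S3 bucket of gzipped log files and the short-term query tool is an Elasticsearch index:

```python
# A minimal sketch of backfilling a short-term query store from a long-term
# archive. The bucket, key, index and endpoint names are all hypothetical.
import gzip

import boto3
from elasticsearch import Elasticsearch, helpers

s3 = boto3.client("s3")
es = Elasticsearch("https://your-stack.example.logit.io:443")  # hypothetical

def archived_lines(bucket, key):
    """Yield a bulk-index action for each line of a gzipped log object."""
    raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    for line in gzip.decompress(raw).decode("utf-8").splitlines():
        yield {"_index": "restored-logs", "_source": {"message": line}}

# restore one day's archive into the short-term query store
helpers.bulk(es, archived_lines("gds-example-logs", "logs/2018.10.03.log.gz"))
```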

The Beats tools can detect periods of service unavailability. They keep track of how much data they have sent to your store, and resume from where they left off when your service comes back online. Because they use a backpressure-aware protocol, Beats tools also won’t overload Logstash with data.
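
To illustrate the idea (this is not Beats itself), the sketch below keeps a registry of the byte offset it has shipped up to, so that after an outage it resumes where it left off rather than resending or losing data. All file names are hypothetical:

```python
# An illustration of the registry approach Beats uses, not Beats itself:
# record the byte offset of the last shipped line, so shipping resumes
# where it left off after an outage. File names are hypothetical.
import json
import os

REGISTRY = "registry.json"  # hypothetical registry file
LOGFILE = "app.log"         # hypothetical log file being shipped

def load_offset():
    """Return the byte offset of the last successfully shipped line."""
    if os.path.exists(REGISTRY):
        with open(REGISTRY) as f:
            return json.load(f)["offset"]
    return 0

def ship(line):
    # placeholder: a real shipper would send the line to Logstash and
    # raise an error if the store is unavailable, so the offset is only
    # advanced after a successful send
    print("shipped:", line.rstrip())

with open(LOGFILE) as log:
    log.seek(load_offset())
    while True:
        line = log.readline()
        if not line:
            break
        ship(line)
        # persist progress after each successful send
        with open(REGISTRY, "w") as f:
            json.dump({"offset": log.tell()}, f)
```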

This page was last reviewed on 3 October 2018. It needs to be reviewed again on 3 April 2019 by the page owner #gds-way.