This documentation gives an overview of the work completed by the GDS Reliability Engineering Team. It is a living document and will change frequently during the discovery and exploration phases. It contains resources to help teams set up monitoring tools and to help us support them.
The solution is based on:
- Documentation
- Guidance and best practices
- Dashboard templates
- Query examples
- Useful resources
- Exporter notes
- Alert Manager
- Diagrams
- Architecture Decision Records
We will record design decisions for the architecture to ensure we preserve the context of our choices. These will be written in the format proposed in a blog post by Michael Nygard (title, status, context, decision and consequences).
Please see the decisions directory for a list of all ADRs.
We will use adr-tools to help manage the decisions.
brew install adr-tools
adr new 'Decision to record'
Please ensure that this tool is used at the root of the repository only.
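One way to make the root-only rule harder to forget is a small shell wrapper, sketched below. The function name `adr_at_root` is hypothetical and not part of adr-tools, and the check assumes that the repository root is the directory containing a `.git` entry. Treat it as an illustration rather than a recommended setup.

```shell
# Hypothetical helper (not part of adr-tools): refuse to run adr
# unless the current directory looks like the repository root.
# Assumption: the repository root is the directory containing a .git entry.
adr_at_root() {
  if [ ! -e .git ]; then
    echo "adr_at_root: run this from the root of the repository" >&2
    return 1
  fi
  # Pass all arguments through to adr unchanged.
  adr "$@"
}

# Usage, from the repository root:
#   adr_at_root new 'Decision to record'
```

Called from any other directory, the wrapper prints an error and returns a non-zero status without invoking adr.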