
Introduction to Metrics, Logs, Traces and Profiles in Grafana

This was originally the companion repository to a series of presentations on the three pillars of observability within Grafana. Whilst that series is now over a year old, we have kept this repository up-to-date with the latest versions of our products and added more functionality as our products have grown.

It is presented as a self-contained Docker sandbox that includes all of the components required to run on a local machine and experiment with the products provided.

Since the original series, this repository has seen its use grow. Whilst we still highly recommend that everyone sign up for a Grafana Cloud account, this repository exists as an easy way to get started with Grafana's offerings in a non-cloud, local setup. In addition, you can use this setup to experiment with configuration settings for those offerings.

You can also send data from the example microservice application to Grafana Cloud products.

Prerequisites

The following demonstration environment requires:

  • Docker
  • Docker Compose (bundled with recent versions of Docker as the docker compose command)

Overview

The demos from this series were based on the application and code in this repository, which includes:

  • Docker Compose manifest for easy setup.
  • Four-service application:
    • A service requesting data from a REST API server.
    • A REST API server that receives requests and utilises a Database for storing/retrieving data for those requests.
    • A recorder service for storing messages to an AMQP bus.
    • A Postgres database for storing and retrieving data.
  • k6 service running a load test against the above application.
  • Tempo service for storing and querying trace information.
  • Loki service for storing and querying log information.
  • Mimir service for storing and querying metric information.
  • Pyroscope service for storing and querying profiling information.
  • Beyla services for watching the four-service application and automatically generating signals.
  • Grafana service for visualising observability data.
  • Grafana Agent service for receiving traces and producing metrics and logs based on these traces.
  • A Node Exporter service to retrieve resource metrics from the local host.

Running the Demonstration Environment

Docker Compose will download the required Docker images, before starting the demonstration environment.

In the following examples, the built-in compose command is used with a recent version of Docker (for example, docker compose up). If using an older version of Docker with a separate Docker Compose binary, ensure that docker compose is replaced with docker-compose.

Data will be emitted from the microservice application and stored in Loki, Tempo, and Mimir. You can log in to the Grafana service to visualize this data.

To start the environment and log in:

  1. Start a new command-line interface in your Operating System and run:

    docker compose up
  2. Log in to the local Grafana service at http://localhost:3000/.

    NOTE: This assumes that port 3000 is not already in use. If this port is not free, edit the docker-compose.yml file and alter the line

    - "3000:3000"
    

    to some other host port that is free, for example:

    - "3123:3000"
    
  3. Navigate to the MLT dashboard.

  4. Explore the data sources using Grafana Explore.

The pre-provisioned dashboard demonstrates a RED (Rate, Error, Duration) overview of the microservice application, where almost all metrics are being generated via trace spans. The dashboard also provides an example of logging.

Data links, exemplars, and logs are utilized to allow jumping from the dashboard to a Grafana Explore page to observe traces, metrics, and logs in more detail.

The following sections briefly explain each of the most important components provided.

Using the OpenTelemetry Collector

You can swap out the Grafana Agent for the OpenTelemetry Collector using an alternative configuration.

Read the 'OpenTelemetry Collector' section below to use this environment instead.

Grafana

Grafana is a multi-platform open source analytics and interactive visualisation web application. For more details about Grafana, read the documentation.

The Grafana service is described in the grafana section of the docker-compose.yml manifest.

The Docker Compose manifest:

  • Mounts two repository directories to provide pre-provisioned data sources for data (grafana/definitions, grafana/provisioning).
  • Provides a pre-provisioned dashboard for correlating metrics, logs and traces.
    • The dashboard uses metrics from span traces to provide RED (Rate/Error/Duration) signals.
    • Data links are built into Grafana panels to pre-populate TraceQL queries based on the data. These act as an initial guide for digging into more detailed trace queries by extending the TraceQL.
  • Exposes port 3000 for local login.
  • Enables two Tempo features, namely span search and service graph support.

The updated topnav navigation within Grafana is enabled. If you wish to default back to the old UI, remove the topnav feature flag in the GF_FEATURE_TOGGLES_ENABLE environment variable for the grafana service in the docker-compose.yml manifest.
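
As a rough illustration, the grafana service in the Docker Compose manifest looks something like the sketch below. The container paths, image tag and the traceqlEditor toggle are assumptions for illustration, not values copied from the repository; only the mounted directories, the port and the GF_FEATURE_TOGGLES_ENABLE variable are mentioned above.

grafana:
  image: grafana/grafana:latest
  volumes:
    # Pre-provisioned dashboards and data sources (container paths assumed)
    - "./grafana/definitions:/var/lib/grafana/dashboards"
    - "./grafana/provisioning:/etc/grafana/provisioning"
  ports:
    # Host port 3000 -> container port 3000; change the left-hand value if 3000 is taken
    - "3000:3000"
  environment:
    # Feature toggles; 'topnav' is referenced above, 'traceqlEditor' is illustrative
    - GF_FEATURE_TOGGLES_ENABLE=topnav,traceqlEditor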

Mimir

Mimir is a backend store for metrics data from various sources. For more details about Mimir, read the documentation.

The Mimir service is described in the mimir section of the docker-compose.yml manifest.

The configuration file (mimir/mimir.yml):

  • Configures a single container running in monolithic mode, acting as all of Mimir's relevant internal microservices.
  • Stores the metrics data in-container (this will be lost on container deletion).

In addition to the scraped metrics, the Mimir service also receives remotely written metrics from the Tempo service, which derives metrics from incoming trace spans.
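
For orientation, a monolithic (single-process) Mimir configuration of this kind typically looks something like the sketch below; the field values and paths are illustrative, and the repository's mimir/mimir.yml will differ in detail.

# Single container acting as all Mimir components, storing blocks on the local filesystem
multitenancy_enabled: false

blocks_storage:
  backend: filesystem
  filesystem:
    dir: /data/mimir/blocks   # in-container storage; lost when the container is deleted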

This example of the Mimir data source shows a histogram with exemplars (links to relevant traces). The example is available once the system is running and has collected enough data.

Loki

Loki is a backend store for long-term log retention. For more details about Loki, read the documentation.

The Loki service is described in the loki section of the docker-compose.yml manifest.

Loki's default configuration is used, and as such no custom configuration file is bound to the container (unlike Mimir and Tempo).

This example shows the Loki data source using LogQL.

The microservices application sends its logs directly to the Loki service in this environment, via its REST API. You can switch this to the Loki Docker logging driver, if desired. To do so, follow the instructions for the driver, and then remove the three occurrences of:

- LOGS_TARGET=http://loki:3100/loki/api/v1/push

in the docker-compose.yml manifest for the mythical-receiver, mythical-server and mythical-recorder services. This will instead force the microservices to output their logs to stdout, where they will be picked up by the Loki Docker driver.
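
For orientation, the variable sits in each service's environment block, roughly as sketched below (only one of the three services is shown, and all other fields are omitted):

mythical-server:
  environment:
    # Remove this line (and its counterparts in the other two services) to have the
    # service log to stdout for the Loki Docker driver instead
    - LOGS_TARGET=http://loki:3100/loki/api/v1/push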

Tempo

Tempo is a backend store for long-term trace retention. For more details about Tempo, read the documentation.

The Tempo service is described in the tempo section of the docker-compose.yml manifest.

The Tempo service imports a configuration file (tempo/tempo.yaml) that initialises the service with some sensible defaults and allows it to receive traces in a variety of different formats.

Tempo is also configured to generate metrics from incoming trace spans as part of its configuration. As such, this no longer occurs via Grafana Agent (although the original configuration for the Agent to carry this out has been left in the Agent configuration file as a guide).
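
A rough sketch of the kind of metrics-generator block involved follows; the storage path, remote-write URL and processor list are assumptions rather than the exact contents of tempo/tempo.yaml.

metrics_generator:
  storage:
    path: /tmp/tempo/generator/wal            # local WAL for generated metrics
    remote_write:
      - url: http://mimir:9009/api/v1/push    # push span/service-graph metrics to Mimir
        send_exemplars: true

overrides:
  # Enable both the service-graph and span-metrics processors
  metrics_generator_processors: [service-graphs, span-metrics]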

For an example of a simple search, look at the Explore page using the Tempo data source, here. Note: Native searches no longer exist; these are interpreted as TraceQL before execution. See the bottom of the search panel to show the equivalent TraceQL.

This example uses the same parameters as above, but in TraceQL (a fully featured tracing query language).

For an example of the mini-APM table and Service Graphs, use the 'Service Graph' tab here.

Traces are instrumented using the OpenTelemetry SDK, more details on which can be found here.

Pyroscope

Pyroscope is a continuous profiling backend store.

The Pyroscope service is described in the pyroscope section of the docker-compose.yml manifest.

Pyroscope uses a configuration file (pyroscope/pyroscope.yaml) that is configured to scrape pprof-based profiles from the Mythical microservices, which use the Pyroscope NodeJS bindings for in-source instrumentation.

Samples are scraped directly from the application on the /debug/pprof/profile and /debug/pprof/heap endpoints.

You can see an example of profiling in action once the system is running by using Explore to visualise the profiles stored here.

k6

k6 is a load testing suite that allows you to synthetically load and monitor your application. For more details about k6, read the documentation.

The k6 service is described in the k6 section of the docker-compose.yml manifest.

The k6 service uses the script (k6/mythical-loadtest.js) to define the tests that it should run. These currently comprise a set of GET, POST and DELETE tests against the application's API endpoints.

k6 can run one or more VUs (Virtual Users) concurrently to simulate parallel load on the application. Currently, the number of VUs is set to 1, although this may be changed by altering the value of the K6_VUS environment variable in the relevant Docker Compose YAML file. Note: The higher the number of VUs executing, the higher the load on the machine running the Docker Compose sandbox, as this will transfer a significant amount of data. You may find tests being throttled if you ramp this number up without enough resources/bandwidth.
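
For illustration, the k6 service definition looks roughly like the sketch below; apart from the K6_VUS variable and the script name mentioned above, the image tag, duration and paths are assumptions.

k6:
  image: grafana/k6:latest
  volumes:
    - "./k6:/scripts"
  environment:
    - K6_VUS=1              # number of concurrent Virtual Users; raise with care
    - K6_DURATION=3600s     # how long the load test runs (illustrative)
  command: run /scripts/mythical-loadtest.js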

k6 will generate metrics about the tests that it carries out and send these to the running Mimir instance. These metrics can then be used to determine endpoint latencies, the number of errors occurring, and so on. The official Grafana dashboard for k6 is included and, once the sandbox is running, may be found here.

Beyla

Beyla is an eBPF-based tool for generating metrics and trace data without the need for application instrumentation. For more details about Beyla, read the documentation.

The Beyla services are described in the beyla-requester, beyla-server and beyla-recorder sections of the docker-compose.yml manifest.

The configuration for Beyla can be found in the beyla/config.yaml file, and describes the main application endpoints for the Mythical Server service.

Beyla operates by using hooks into the kernel networking layer to examine calls made to it from the specified process. It then generates a set of default metrics and trace types based on the network calls made.

It also uses a subset of the OpenTelemetry environment variables/configuration options to determine where to send those metrics and traces (as well as its own environment variables and configuration options to expose extra functionality). See the configuration options for Beyla here.

For this Docker Compose setup, a Beyla service is required for each of the other containers to be inspected, as the pid namespace needs to be shared between the application service and the relevant Beyla service. This also allows unique service naming for each individual application service via Beyla.
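
A sketch of one such pairing follows. The shared pid namespace is the key detail; the image tag, port, service name and endpoint values are illustrative assumptions rather than the repository's exact manifest.

beyla-server:
  image: grafana/beyla:latest
  # Share the PID namespace with the inspected service so Beyla can attach to its process
  pid: "service:mythical-server"
  # eBPF instrumentation requires elevated kernel privileges
  privileged: true
  environment:
    - BEYLA_OPEN_PORT=4000                            # instrument the process listening on this port
    - OTEL_SERVICE_NAME=mythical-server               # unique service naming for this container
    - OTEL_EXPORTER_OTLP_ENDPOINT=http://agent:4318   # where generated metrics and traces are sent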

Once the Docker Compose project is running, examples of the traces emitted by Beyla can be seen here, and an example of the metrics emitted by Beyla can be seen here.

Grafana Agent

Note: We have now moved to a Flow/River configuration by default, as it has reached parity with static mode (and offers more advanced functionality).

Grafana Agent is a configurable local agent for receiving metrics, logs and traces and forwarding them to relevant database stores. For more details about Grafana Agent, read the documentation.

Grafana Agent is a locally installed agent that acts as:

  • A Prometheus scraping service and metric/label rewriter.
  • A Promtail (Loki logs receiver) service and processor.
  • A Tempo trace receiver and span processor.
  • A remote writer for MLT data to Grafana Cloud (or any other compatible storage system).

In this example environment, Grafana Agent:

  • Receives metrics data, via scrape configs, emitted by:
    • The microservice application.
    • The Mimir service for operational monitoring.
    • The Loki service for operational monitoring.
    • The Tempo service for operational monitoring.
    • The Agent itself, for operational monitoring.
    • The installed Node Exporter service.
  • Receives trace data, via trace configs, emitted by the microservice application.
  • Generates automatic logging lines based on the trace data received.
  • Sends metric, log and trace data onwards to the Mimir, Loki and Tempo services, respectively.
  • Has optional (unused by default) configurations for metrics generation and trace tail sampling.

Grafana Agent implements a graph-based configuration via its Flow architecture, using a purpose-built configuration language, River, to define Grafana Agent functionality.

Once running, you can observe the Flow configuration running on the Grafana Agent itself by navigating to http://localhost:12347. This webpage will allow you to view all of the current components being used for receiving MLT signals, as well as graphs denoting source and target relationships between components.

The full configuration for Grafana Agent can be found here.

Read the Debugging documentation for Grafana Agent for more details.

The tutorial guide to working with Flow and River is a great starting point, whilst the full reference guide for Flow shows the currently supported components and configuration blocks.

Note that because Grafana Agent scrapes metrics for every service defined in docker-compose.yml, a significant number of active metric series are produced (approximately 11,000 at the time of writing).

Metrics Generation

It should be noted that since v1.4.0, Tempo has included the ability to generate RED (Rate, Error, Duration) span and service graph metrics.

As such, the Grafana Agent configuration now includes a commented section where those metrics used to be generated; this is now handled directly in Tempo via server-side metrics generation.

Whilst this is convenient for many users, there are situations where you may prefer to generate metrics locally via Grafana Agent rather than server-side in Tempo. These include environments where tail-based sampling may be utilized to discard certain traces.

Tempo metrics generation will only generate span and service graph metrics for trace spans that Tempo receives. If tail sampling is active, then a full view of the metrics in a system will not be available.

In these instances, using Grafana Agent to generate metrics can ensure that a complete set of metrics is generated for all trace span data, as the Agent carries out tail sampling after metrics generation.

Flow and River Configuration

Whilst the default configuration is via Flow's River language, you can switch this to a provided Static configuration defined in YAML.

To use the Static configuration instead, follow the inline commented instructions in the agent service section of the docker-compose.yml file.

Once altered, the Static configuration can be used by restarting Docker Compose if it is currently running:

docker compose restart

or using the startup commands in the 'Running the Demonstration Environment' section.

Microservice Source

The source for the microservice application can be found in the source directory. This three-service application utilizes a PostgreSQL database and an AMQP bus to store data.

The services are written in JavaScript and execute under NodeJS inside Docker containers.

The requester service makes 'random' requests to the server, which then inserts, retrieves or deletes data from the Postgres database. The requester service also stores data to the AMQP queue via the recorder service.

All three services use common code to deal with the queue, logging and tracing requirements they have. The latter is an example of a simple shim API library for utilising the OpenTelemetry SDK in an application.

There is a common Dockerfile that is used to build all three services.

Grafana Cloud

Note: As mentioned in the Grafana Agent section, metrics are scraped by default from every service. If sending metrics to Grafana Cloud, check the number of active metric series that you can store without additional cost.

This demo can be run against Grafana Cloud using Docker Compose as follows:

  1. Configure and source the Metrics, Logs, and Traces environment variables in the file cloud/envvars-grafana-cloud-unconfigured.sh from a Grafana Cloud service. Generally, only the environment variables tagged with "ID" and "API_KEY" need to be updated, although sometimes the _HOST variables will also need to be modified. You can retrieve these details from your Grafana Cloud organization pages for the relevant stack.

    source cloud/envvars-grafana-cloud-unconfigured.sh
  2. Configure the environment

    ./ctl.sh cloud-configure
  3. Run the demo using Docker Compose

    ./ctl.sh cloud-up
  4. Load the example dashboard cloud/dashboard-metrics-logs-traces-1.json into the Grafana Cloud instance to visualize the metrics, logs and traces being generated by this demo.

  5. Stop the demo using

    ./ctl.sh cloud-down

Note: The configuration for Grafana Agent to send data to Grafana Cloud includes configuration that generates span metrics and service graph metrics.

OpenTelemetry Collector

You can also use an alternative environment that uses the OpenTelemetry Collector in place of Grafana Agent.

Running the Demonstration Environment

Docker Compose downloads the required Docker images, before starting the demonstration environment.

In the following examples, the built-in compose command is used with a recent version of Docker (for example, docker compose up). If using an older version of Docker with a separate Docker Compose binary, ensure that docker compose is replaced with docker-compose.

Data is emitted from the microservice application and stored in Loki, Tempo, and Mimir. You can log in to the Grafana service to visualize this data.

Note: The OpenTelemetry Collector does not currently include an exporter for Pyroscope, and therefore the Docker Compose manifest for the OpenTelemetry Collector does not support the export of profiles.

To start the environment and log in:

  1. Start a new command-line interface in your Operating System and run:

    docker compose -f docker-compose-otel.yml up
  2. Log in to the local Grafana service at http://localhost:3000/.

    NOTE: This assumes that port 3000 is not already in use. If this port is not free, edit the docker-compose-otel.yml file and alter the line

    - "3000:3000"
    

    to some other host port that is free, for example:

    - "3123:3000"
    
  3. Explore the data sources using Grafana Explore.

The OpenTelemetry Collector is defined as the opentelemetry-collector service in the docker-compose-otel.yml manifest.

A basic configuration that mimics the Grafana Agent configuration can be found in the otel/otel.yml configuration file.

In much the same way that the Grafana Agent configuration operates, this scrapes several targets to retrieve Prometheus metrics before batching them and remote writing them to the local Mimir service.

Additionally, the OpenTelemetry Collector receives traces via OTLP gRPC, batches them, and then remote writes them to the local Tempo instance.
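
In outline, such a configuration looks something like the sketch below; the scrape targets, endpoints and ports are assumptions for illustration rather than the contents of the repository's otel.yml.

receivers:
  otlp:
    protocols:
      grpc:                                       # traces arrive from the microservices via OTLP gRPC
  prometheus:
    config:
      scrape_configs:
        - job_name: "mythical"
          scrape_interval: 15s
          static_configs:
            - targets: ["mythical-server:4000"]   # illustrative scrape target

processors:
  batch:                                          # batch signals before export

exporters:
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push       # metrics to the local Mimir service
  otlp/tempo:
    endpoint: tempo:4317                          # traces to the local Tempo instance
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]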

Span and service graph metrics generation

Span metrics and service graph metrics are also available, but have not been attached to the trace receiver defined in the Agent configuration file, as generation is handled in Tempo by default. You may switch to Agent-based metrics generation as follows:

  • Follow the directions in the agent/config.river file in the otlp_receiver tracing configuration section; comments there show which lines to uncomment to add both metrics generator collectors to the graph.
  • Comment out the metrics generation in tempo/tempo.yaml so that metrics are generated by the Agent rather than by Tempo.
  • To do the equivalent in the OpenTelemetry Collector, follow the relevant instructions on uncommenting/commenting the processors, exporters and receivers sections in otel/otel.yaml (the same change to tempo/tempo.yaml applies).

There are occasionally good reasons to use local span metrics and service graph generation instead of relying on the Tempo backend to do so. Cases include needing an overview of your entire application's metrics, which could be obscured should you enable tail sampling (as the Tempo metrics generator will only generate metrics for trace spans that it ingests). Because tail sampling can be configured in the pipeline at a later stage than metrics generation, generating metrics locally ensures that all trace spans are used to produce a complete metrics view, regardless of whether those traces are discarded later in the pipeline.

Note:

  • The naming scheme in Grafana Agent and the OpenTelemetry Collector is different to that of Tempo. The newer spanmetrics connector details the changes, but in the provided dashboards, any reference to the metrics prefix traces_spanmetrics_latency_ should be altered to traces_spanmetrics_duration_milliseconds_, should you choose to use Grafana Agent/OpenTelemetry-generated metrics.
  • Metrics generation adds a significant load to the Grafana Agent/OpenTelemetry Collector. You may find that, on machines with fewer resources, removing the k6 service (by commenting it out in the relevant Docker Compose manifests, or removing it entirely) prevents unexpected resource use and/or container failures due to limited CPU and memory.
