Add VictoriaMetrics switch guide for TiUP cluster #20957

Open
wants to merge 6 commits into
base: master
160 changes: 152 additions & 8 deletions maintain-tidb-using-tiup.md
@@ -6,14 +6,7 @@

# TiUP Common Operations

This document describes the common operations when you operate and maintain a TiDB cluster using TiUP.

## View the cluster list

@@ -290,3 +283,154 @@
```bash
tiup cluster destroy ${cluster-name}
```

## Switch from Prometheus to VictoriaMetrics

In large clusters, Prometheus might run into performance bottlenecks, especially when the cluster contains a large number of instances. Starting from TiUP v1.16.3, TiUP supports switching the metrics server from Prometheus to VictoriaMetrics (VM), which provides better scalability, higher performance, and lower resource consumption.
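
Because the switch requires TiUP v1.16.3 or later, it can help to compare the installed version against that minimum first. The following is a minimal sketch, not part of TiUP: `version_ge` is a hypothetical helper, it assumes a `sort` that supports `-V` (GNU coreutils), and the `installed` value is a placeholder that you replace with the version reported by `tiup --version`.

```bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumption: `sort -V` (version sort, GNU coreutils) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder: replace with the version reported by `tiup --version`.
installed="v1.16.3"

if version_ge "$installed" "v1.16.3"; then
    echo "TiUP ${installed} meets the minimum version for VictoriaMetrics"
else
    echo "TiUP ${installed} is too old; consider: tiup update --self && tiup update cluster"
fi
```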

### Set up VictoriaMetrics for a new deployment

By default, TiUP uses Prometheus as the metrics server. To use VictoriaMetrics instead of Prometheus in a new deployment, configure the topology file as follows:

```yaml
# Monitoring server configuration
monitoring_servers:
  # IP address of the monitoring server
  - host: ip_address
    ...
    prom_remote_write_to_vm: true
    enable_prom_agent_mode: true

# Grafana server configuration
grafana_servers:
  # IP address of the Grafana server
  - host: ip_address
    ...
    use_vm_as_datasource: true
```
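
Before deploying, a quick text check can confirm that the three flags shown above are present and set to `true` in the topology file. This is only a sketch under stated assumptions: `check_vm_flags` is a hypothetical helper, `topology.yaml` is an assumed file name, and the check is a plain `grep` match rather than a YAML parse, so it does not verify that each flag sits under the correct section.

```bash
# Hypothetical helper: report whether each VictoriaMetrics-related flag
# is set to true in a topology file. Plain text match, not a YAML parser.
# Commented-out lines (for example "# use_vm_as_datasource: true") do not match.
check_vm_flags() {
    file="$1"
    rc=0
    for flag in prom_remote_write_to_vm enable_prom_agent_mode use_vm_as_datasource; do
        if grep -Eq "^[[:space:]]*${flag}:[[:space:]]*true[[:space:]]*$" "$file"; then
            echo "ok: ${flag}"
        else
            echo "missing or not true: ${flag}"
            rc=1
        fi
    done
    return "$rc"
}

# Example (assumed file name):
# check_vm_flags topology.yaml
```

The function returns a non-zero status when any flag is missing, so it can gate a deployment script.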

### Migrate an existing deployment to VictoriaMetrics

You can perform the migration process without affecting running instances. Existing metrics will remain in Prometheus, and TiUP will write new metrics to VictoriaMetrics.

#### Enable VictoriaMetrics remote write

1. Edit the cluster configuration:

```bash
tiup cluster edit-config ${cluster-name}
```

2. Under `monitoring_servers`, set `prom_remote_write_to_vm` to `true`:

```yaml
monitoring_servers:
  - host: ip_address
    ...
    prom_remote_write_to_vm: true
```

3. Reload the updated configuration:

```bash
tiup cluster reload ${cluster-name} -R prometheus
```

#### Switch the default data source to VictoriaMetrics

1. Edit the cluster configuration:

```bash
tiup cluster edit-config ${cluster-name}
```

2. Under `grafana_servers`, set `use_vm_as_datasource` to `true`:

```yaml
grafana_servers:
  - host: ip_address
    ...
    use_vm_as_datasource: true
```

3. Reload the updated configuration:

```bash
tiup cluster reload ${cluster-name} -R grafana
```

#### View historical metrics generated before the switch (optional)

If you need to view historical metrics generated before the switch, you can temporarily switch the Grafana data source back to Prometheus as follows:

1. Edit the cluster configuration:

```bash
tiup cluster edit-config ${cluster-name}
```

2. Comment out `use_vm_as_datasource` under `grafana_servers`:

```yaml
grafana_servers:
  - host: ip_address
    ...
    # use_vm_as_datasource: true
```

3. Reload the updated configuration:

```bash
tiup cluster reload ${cluster-name} -R grafana
```

4. To switch back to VictoriaMetrics, repeat the steps in "Switch the default data source to VictoriaMetrics".

### Clean up old metrics and services

After confirming that the old metrics in Prometheus have expired, you can remove the redundant services and files as follows. These operations do not affect the running cluster.

#### Set Prometheus to agent mode

1. Edit the cluster configuration:

```bash
tiup cluster edit-config ${cluster-name}
```

2. Under `monitoring_servers`, set `enable_prom_agent_mode` to `true`. Also make sure that `prom_remote_write_to_vm` (under `monitoring_servers`) and `use_vm_as_datasource` (under `grafana_servers`) are set to `true`:

```yaml
monitoring_servers:
  - host: ip_address
    ...
    prom_remote_write_to_vm: true
    enable_prom_agent_mode: true
grafana_servers:
  - host: ip_address
    ...
    use_vm_as_datasource: true
```

3. Reload the updated configuration:

```bash
tiup cluster reload ${cluster-name} -R prometheus
```

#### Remove expired data directories

1. Find the `data_dir` of the monitoring server in the configuration file:

```yaml
monitoring_servers:
  - host: ip_address
    ...
    data_dir: "/tidb-data/prometheus-8249"
```

2. On the monitoring server host, remove the data directory:

```bash
rm -rf /tidb-data/prometheus-8249
```
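
Because `rm -rf` is irreversible, you might prefer to wrap the removal in a small guard. The following sketch is an illustration rather than a TiUP feature: `remove_data_dir` is a hypothetical helper that refuses an empty path or `/`, checks that the directory exists, prints its size, and only then deletes it.

```bash
# Hypothetical helper: remove a directory only if the path is non-empty,
# is not "/", and exists as a directory.
remove_data_dir() {
    dir="$1"
    if [ -z "$dir" ] || [ "$dir" = "/" ]; then
        echo "refusing to remove '${dir}'" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "not a directory: ${dir}" >&2
        return 1
    fi
    du -sh "$dir"      # show how much space will be freed
    rm -rf -- "$dir"
}

# Example (run on the monitoring server host, using the data_dir found in step 1):
# remove_data_dir /tidb-data/prometheus-8249
```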