diff --git a/tidb-cloud/monitor-prometheus-and-grafana-integration.md b/tidb-cloud/monitor-prometheus-and-grafana-integration.md index 31515e1c0d4ce..21531a159f66f 100644 --- a/tidb-cloud/monitor-prometheus-and-grafana-integration.md +++ b/tidb-cloud/monitor-prometheus-and-grafana-integration.md @@ -55,7 +55,13 @@ To get the `scrape_config` file for Prometheus, do the following: After your Prometheus service is reading metrics from TiDB Cloud, you can use Grafana GUI dashboards to visualize the metrics as follows: 1. Download the Grafana dashboard JSON of TiDB Cloud [here](https://github.com/pingcap/docs/blob/master/tidb-cloud/monitor-prometheus-and-grafana-integration-grafana-dashboard-UI.json). -2. [Import this JSON to your own Grafana GUI](https://grafana.com/docs/grafana/v8.5/dashboards/export-import/#import-dashboard) to visualize the metrics. + +2. [Import this JSON to your own Grafana GUI](https://grafana.com/docs/grafana/v8.5/dashboards/export-import/#import-dashboard) to visualize the metrics. + + > **Note:** + > + > If you are already using Prometheus and Grafana to monitor TiDB Cloud and want to incorporate the newly available metrics, it is recommended that you create a new dashboard instead of directly updating the JSON of the existing one. + 3. (Optional) Customize the dashboard as needed by adding or removing panels, changing data sources, and modifying display options. For more information about how to use Grafana, see [Grafana documentation](https://grafana.com/docs/grafana/latest/getting-started/getting-started-prometheus/). @@ -88,6 +94,15 @@ Prometheus tracks the following metric data for your TiDB clusters. | tidbcloud_node_cpu_capacity_cores | gauge | cluster_name: ``
instance: `tidb-0\|tidb-1…\|tikv-0…\|tiflash-0…`
component: `tidb\|tikv\|tiflash` | The CPU limit cores of TiDB/TiKV/TiFlash nodes | | tidbcloud_node_memory_used_bytes | gauge | cluster_name: ``
instance: `tidb-0\|tidb-1…\|tikv-0…\|tiflash-0…`
component: `tidb\|tikv\|tiflash` | The used memory bytes of TiDB/TiKV/TiFlash nodes | | tidbcloud_node_memory_capacity_bytes | gauge | cluster_name: ``
instance: `tidb-0\|tidb-1…\|tikv-0…\|tiflash-0…`
component: `tidb\|tikv\|tiflash` | The memory capacity bytes of TiDB/TiKV/TiFlash nodes | +| tidbcloud_node_storage_available_bytes | gauge | instance: `tidb-0\|tidb-1\|...`
component: `tikv\|tiflash`
cluster_name: `` | The available disk space in bytes for TiKV/TiFlash nodes | +| tidbcloud_disk_read_latency | histogram | instance: `tidb-0\|tidb-1\|...`
component: `tikv\|tiflash`
cluster_name: ``
`device`: `nvme.*\|dm.*` | The read latency in seconds per storage device | +| tidbcloud_disk_write_latency | histogram | instance: `tidb-0\|tidb-1\|...`
component: `tikv\|tiflash`
cluster_name: ``
`device`: `nvme.*\|dm.*` | The write latency in seconds per storage device | +| tidbcloud_kv_request_duration | histogram | instance: `tidb-0\|tidb-1\|...`
component: `tikv`
cluster_name: ``
`type`: `BatchGet\|Commit\|Prewrite\|...` | The duration in seconds of TiKV requests by type | +| tidbcloud_component_uptime | histogram | instance: `tidb-0\|tidb-1\|...`
component: `tidb\|tikv\|pd\|...`
cluster_name: `` | The uptime in seconds of TiDB components | +| tidbcloud_ticdc_owner_resolved_ts_lag | gauge | changefeed_id: ``
cluster_name: `` | The resolved timestamp lag in seconds for changefeed owner | +| tidbcloud_changefeed_status | gauge | changefeed_id: ``
cluster_name: `` | Changefeed status:
`-1`: Unknown
`0`: Normal
`1`: Warning
`2`: Failed
`3`: Stopped
`4`: Finished
`6`: Warning
`7`: Other | +| tidbcloud_resource_manager_resource_unit_read_request_unit | gauge | cluster_name: ``
resource_group: `` | The read request units consumed by Resource Manager | +| tidbcloud_resource_manager_resource_unit_write_request_unit | gauge | cluster_name: ``
resource_group: `` | The write request units consumed by Resource Manager | ## FAQ