From 022ccc621d334365b7322a779df88dfc87de67ed Mon Sep 17 00:00:00 2001 From: JaySon-Huang Date: Sat, 26 Apr 2025 15:46:49 +0800 Subject: [PATCH 01/11] Update tiflash-configuration and create-tiflash-replicas Signed-off-by: JaySon-Huang --- tiflash/create-tiflash-replicas.md | 8 ++++++ tiflash/tiflash-configuration.md | 45 +++++------------------------- 2 files changed, 15 insertions(+), 38 deletions(-) diff --git a/tiflash/create-tiflash-replicas.md b/tiflash/create-tiflash-replicas.md index a06113bc914f0..b3c1deb4e18a5 100644 --- a/tiflash/create-tiflash-replicas.md +++ b/tiflash/create-tiflash-replicas.md @@ -160,12 +160,19 @@ Before TiFlash replicas are added, each TiKV instance performs a full table scan > tiup ctl:v8.5.0 pd -u http://192.168.1.4:2379 store limit all engine tiflash 60 add-peer > ``` + If there are already a significant number of Regions exist in the old TiFlash nodes in the cluster, and these Regions need to be rebalanced from the old TiFlash nodes to the new ones, the `remove-peer` restriction must also be adjusted accordingly. + + ```shell + tiup ctl:v pd -u http://:2379 store limit all engine tiflash 60 remove-peer + ``` + Within a few minutes, you will observe a significant increase in CPU and disk IO resource usage of the TiFlash nodes, and TiFlash should create replicas faster. At the same time, the TiKV nodes' CPU and disk IO resource usage increases as well. If the TiKV and TiFlash nodes still have spare resources at this point and the latency of your online service does not increase significantly, you can further ease the limit, for example, triple the original speed: ```shell tiup ctl:v pd -u http://:2379 store limit all engine tiflash 90 add-peer + tiup ctl:v pd -u http://:2379 store limit all engine tiflash 90 remove-peer ``` 3. After the TiFlash replication is complete, revert to the default configuration to reduce the impact on online services. @@ -174,6 +181,7 @@ Before TiFlash replicas are added, each TiKV instance performs a full table scan ```shell tiup ctl:v pd -u http://:2379 store limit all engine tiflash 30 add-peer + tiup ctl:v pd -u http://:2379 store limit all engine tiflash 30 remove-peer ``` Execute the following SQL statements to restore the default snapshot write speed limit: diff --git a/tiflash/tiflash-configuration.md b/tiflash/tiflash-configuration.md index 65f3a812d5de1..ef57b95b12b2b 100644 --- a/tiflash/tiflash-configuration.md +++ b/tiflash/tiflash-configuration.md @@ -8,28 +8,6 @@ aliases: ['/docs/dev/tiflash/tiflash-configuration/','/docs/dev/reference/tiflas This document introduces the configuration parameters related to the deployment and use of TiFlash. -## PD scheduling parameters - -You can adjust the PD scheduling parameters using [pd-ctl](/pd-control.md). Note that you can use `tiup ctl:v pd` to replace `pd-ctl -u ` when using tiup to deploy and manage your cluster. - -- [`replica-schedule-limit`](/pd-configuration-file.md#replica-schedule-limit): determines the rate at which the replica-related operator is generated. The parameter affects operations such as making nodes offline and add replicas. - - > **Note:** - > - > The value of this parameter should be less than that of `region-schedule-limit`. Otherwise, the normal Region scheduling among TiKV nodes is affected. - -- `store-balance-rate`: limits the rate at which Regions of each TiKV/TiFlash store are scheduled. Note that this parameter takes effect only when the stores have newly joined the cluster. 
If you want to change the setting for existing stores, use the following command. - - > **Note:** - > - > Since v4.0.2, the `store-balance-rate` parameter has been deprecated and changes have been made to the `store limit` command. See [store-limit](/configure-store-limit.md) for details. - - - Execute the `pd-ctl -u store limit ` command to set the scheduling rate of a specified store. To get `store_id`, you can execute the `pd-ctl -u store` command. - - If you do not set the scheduling rate for Regions of a specified store, this store inherits the setting of `store-balance-rate`. - - You can execute the `pd-ctl -u store limit` command to view the current setting value of `store-balance-rate`. - -- [`replication.location-labels`](/pd-configuration-file.md#location-labels): indicates the topological relationship of TiKV instances. The order of the keys indicates the layering relationship of different labels. If TiFlash is enabled, you need to use [`pd-ctl config placement-rules`](/pd-control.md#config-show--set-option-value--placement-rules) to set the default value. For details, see [geo-distributed-deployment-topology](/geo-distributed-deployment-topology.md). - ## TiFlash configuration parameters This section introduces the configuration parameters of TiFlash. @@ -383,7 +361,7 @@ Note that the following parameters only take effect in TiFlash logs and TiFlash - The memory usage limit for the generated intermediate data in all queries. - When the value is an integer, the unit is byte. For example, `34359738368` means 32 GiB of memory limit, and `0` means no limit. -- When the value is a floating-point number in the range of `[0.0, 1.0)`, it means the ratio of the allowed memory usage to the total memory of the node. For example, `0.8` means 80% of the total memory, and `0.0` means no limit. +- You can set the value to a floating-point number in the range of `[0.0, 1.0)` since v6.6.0. A floating-point number means the ratio of the allowed memory usage to the total memory of the node. For example, `0.8` means 80% of the total memory, and `0.0` means no limit. - When the queries attempt to consume memory that exceeds this limit, the queries are terminated and an error is reported. - Default value: `0.8`, which means 80% of the total memory. @@ -593,27 +571,18 @@ The parameters in `tiflash-learner.toml` are basically the same as those in TiKV - Specifies the old master key when rotating the new master key. The configuration format is the same as that of `master-key`. To learn how to configure a master key, see [Configure encryption](/encryption-at-rest.md#configure-encryption). -### Schedule replicas by topology labels +#### server + +##### `labels` -See [Set available zones](/tiflash/create-tiflash-replicas.md#set-available-zones). +- Specifies server attributes, such as `{ zone = "us-west-1", disk = "ssd" }`. You can checkout [Set available zones](/tiflash/create-tiflash-replicas.md#set-available-zones) to learn how to schedule replicas using labels. +- Default value: `{}` ### Multi-disk deployment TiFlash supports multi-disk deployment. If there are multiple disks in your TiFlash node, you can make full use of those disks by configuring the parameters described in the following sections. For TiFlash's configuration template to be used for TiUP, see [The complex template for the TiFlash topology](https://github.com/pingcap/docs/blob/master/config-templates/complex-tiflash.yaml). 
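As an illustration of the multi-disk layout covered in this section, a minimal `[storage]` sketch for `tiflash.toml` might look as follows. This is a hedged example rather than part of the linked template: the directory paths are placeholders for your own disks.

```toml
[storage]
    [storage.main]
    # One directory per data disk; placeholder paths.
    dir = ["/ssd0/tiflash", "/ssd1/tiflash"]

    [storage.latest]
    # Left empty so that the latest data is also spread across storage.main.dir,
    # which suits disks with similar I/O performance.
    dir = []
```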
-#### Multi-disk deployment with TiDB version earlier than v4.0.9 - -For TiDB clusters earlier than v4.0.9, TiFlash only supports storing the main data of the storage engine on multiple disks. You can set up a TiFlash node on multiple disks by specifying the `path` (`data_dir` in TiUP) and `path_realtime_mode` configuration. - -If there are multiple data storage directories in `path`, separate each with a comma. For example, `/nvme_ssd_a/data/tiflash,/sata_ssd_b/data/tiflash,/sata_ssd_c/data/tiflash`. If there are multiple disks in your environment, it is recommended that each directory corresponds to one disk and you put disks with the best performance at the front to maximize the performance of all disks. - -If there are multiple disks with similar I/O metrics on your TiFlash node, you can leave the `path_realtime_mode` parameter to the default value (or you can explicitly set it to `false`). It means that data will be evenly distributed among all storage directories. However, the latest data is written only to the first directory, so the corresponding disk is busier than other disks. - -If there are multiple disks with different I/O metrics on your TiFlash node, it is recommended to set `path_realtime_mode` to `true` and put disks with the best I/O metrics at the front of `path`. It means that the first directory only stores the latest data, and the older data are evenly distributed among the other directories. Note that in this case, the capacity of the first directory should be planned as 10% of the total capacity of all directories. - -#### Multi-disk deployment with TiDB v4.0.9 or later - -For TiDB clusters with v4.0.9 or later versions, TiFlash supports storing the main data and the latest data of the storage engine on multiple disks. If you want to deploy a TiFlash node on multiple disks, it is recommended to specify your storage directories in the `[storage]` section to make full use of your node. Note that the configurations earlier than v4.0.9 (`path` and `path_realtime_mode`) are still supported. +For TiDB clusters with v4.0.9 or later versions, TiFlash supports storing the main data and the latest data of the storage engine on multiple disks. If you want to deploy a TiFlash node on multiple disks, it is recommended to specify your storage directories in the `[storage]` section to make full use of your node. If there are multiple disks with similar I/O metrics on your TiFlash node, it is recommended to specify corresponding directories in the `storage.main.dir` list and leave `storage.latest.dir` empty. TiFlash will distribute I/O pressure and data among all directories. From 8be43418776dff24d0b84b9ded5fa7c928798c27 Mon Sep 17 00:00:00 2001 From: JaySon-Huang Date: Sat, 26 Apr 2025 16:47:15 +0800 Subject: [PATCH 02/11] Update FAQ for tiflash Signed-off-by: JaySon-Huang --- tiflash/troubleshoot-tiflash.md | 136 ++++++++++++++------------------ 1 file changed, 60 insertions(+), 76 deletions(-) diff --git a/tiflash/troubleshoot-tiflash.md b/tiflash/troubleshoot-tiflash.md index f8fc5a93d37df..076caa4080065 100644 --- a/tiflash/troubleshoot-tiflash.md +++ b/tiflash/troubleshoot-tiflash.md @@ -32,47 +32,13 @@ The issue might occur due to different reasons. It is recommended that you troub 3. Use the PD Control tool to check whether there is any TiFlash instance that failed to go offline on the node (same IP and Port) and force the instance(s) to go offline. For detailed steps, refer to [Scale in a TiFlash cluster](/scale-tidb-using-tiup.md#scale-in-a-tiflash-cluster). 
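As a sketch of this check (the linked scale-in guide remains the authoritative procedure), you can list the stores known to PD, locate the leftover TiFlash store by its address and its `engine: tiflash` label, and then ask PD to take it offline. The address, the version, and the store ID `85` below are placeholders.

```shell
# List all stores; TiFlash instances carry the label {"key": "engine", "value": "tiflash"}.
tiup ctl:v8.5.0 pd -u http://192.168.1.4:2379 store

# Force the leftover TiFlash store offline (85 is a placeholder store ID).
tiup ctl:v8.5.0 pd -u http://192.168.1.4:2379 store delete 85
```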
-If the above methods cannot resolve your issue, save the TiFlash log files and [get support](/support.md) from PingCAP or the community. +4. Check whether the system CPU supports vector extension instruction sets -## TiFlash replica is always unavailable - -This is because TiFlash is in an abnormal state caused by configuration errors or environment issues. Take the following steps to identify the faulty component: - -1. Check whether PD enables the `Placement Rules` feature: - - {{< copyable "shell-regular" >}} - - ```shell - echo 'config show replication' | /path/to/pd-ctl -u http://${pd-ip}:${pd-port} - ``` - - - If `true` is returned, go to the next step. - - If `false` is returned, [enable the Placement Rules feature](/configure-placement-rules.md#enable-placement-rules) and go to the next step. - -2. Check whether the TiFlash process is working correctly by viewing `UpTime` on the TiFlash-Summary monitoring panel. - -3. Check whether the TiFlash proxy status is normal through `pd-ctl`. + Starting from v6.3, to deploy TiFlash under the Linux AMD64 architecture, the CPU must support the AVX2 instruction set. Ensure that `grep avx2 /proc/cpuinfo` has output. To deploy TiFlash under the Linux ARM64 architecture, the CPU must support the ARMv8 instruction set architecture. Ensure that `grep 'crc32' /proc/cpuinfo | grep 'asimd'` has output. - ```shell - tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} store - ``` + If deploying on a virtual machine, change the virtual machine's CPU architecture to "Haswell". - The TiFlash proxy's `store.labels` includes information such as `{"key": "engine", "value": "tiflash"}`. You can check this information to confirm a TiFlash proxy. - -4. Check whether the number of configured replicas is less than or equal to the number of TiKV nodes in the cluster. If not, PD cannot replicate data to TiFlash. - - ```shell - tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} config placement-rules show | grep -C 10 default - ``` - - Reconfirm the value of `default: count`. - - > **Note:** - > - > - When [Placement Rules](/configure-placement-rules.md) are enabled and multiple rules exist, the previously configured [`max-replicas`](/pd-configuration-file.md#max-replicas), [`location-labels`](/pd-configuration-file.md#location-labels), and [`isolation-level`](/pd-configuration-file.md#isolation-level) no longer take effect. To adjust the replica policy, use the interface related to Placement Rules. - > - When [Placement Rules](/configure-placement-rules.md) are enabled and only one default rule exists, TiDB will automatically update this default rule when `max-replicas`, `location-labels`, or `isolation-level` configurations are changed. - -5. Check whether the remaining disk space of the machine (where `store` of the TiFlash node is) is sufficient. By default, when the remaining disk space is less than 20% of the `store` capacity (which is controlled by the [`low-space-ratio`](/pd-configuration-file.md#low-space-ratio) parameter), PD cannot schedule data to this TiFlash node. +If the above methods cannot resolve your issue, collect the TiFlash log files and [get support](/support.md) from PingCAP or the community. 
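The hardware check added in step 4 above can be run as is on the target host; on supported CPUs, both commands are expected to print at least one line:

```shell
# Linux AMD64: the CPU must support the AVX2 instruction set.
grep avx2 /proc/cpuinfo

# Linux ARM64: the CPU must support the ARMv8 instruction set (crc32 and asimd flags).
grep 'crc32' /proc/cpuinfo | grep 'asimd'
```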
## Some queries return the `Region Unavailable` error @@ -166,43 +132,45 @@ In this example, the warning message shows that TiDB does not select the MPP mod ``` +---------+------+-----------------------------------------------------------------------------+ -> | Level | Code | Message | +| Level | Code | Message | +---------+------+-----------------------------------------------------------------------------+ | Warning | 1105 | Scalar function 'subtime'(signature: SubDatetimeAndString, return type: datetime) is not supported to push down to tiflash now. | +---------+------+-----------------------------------------------------------------------------+ ``` -## Data is not replicated to TiFlash - -After deploying a TiFlash node and starting replication (by performing the ALTER operation), no data is replicated to it. In this case, you can identify and address the problem by following the steps below: +## TiFlash replica is always unavailable -1. Check whether the replication is successful by running the `ALTER table set tiflash replica ` command and check the output. +If TiFlash replicas consistently fail to be created since the TiDB cluster is deployed, or if the TiFlash replicas were initially created normally but then all or some tables fails to be created after a period of time, you can diagnose and resolve the issue by performing the following steps: - - If there is output, go to the next step. - - If there is no output, run the `SELECT * FROM information_schema.tiflash_replica` command to check whether TiFlash replicas have been created. If not, run the `ALTER table ${tbl_name} set tiflash replica ${num}` command again, check whether other statements (for example, `add index`) have been executed, or check whether DDL executions are successful. +1. Check whether PD enables the `Placement Rules` feature. This feature is enabled by default since v6.5.0: -2. Check whether TiFlash Region replication runs correctly. + {{< copyable "shell-regular" >}} - Check whether there is any change in `progress`: + ```shell + echo 'config show replication' | /path/to/pd-ctl -u http://${pd-ip}:${pd-port} + ``` - - If yes, TiFlash replication runs correctly. - - If no, TiFlash replication is abnormal. In `tidb.log`, search the log saying `Tiflash replica is not available`. Check whether `progress` of the corresponding table is updated. If not, check the `tiflash log` for further information. For example, search `lag_region_info` in `tiflash log` to find out which Region lags behind. + - If `true` is returned, go to the next step. + - If `false` is returned, [enable the Placement Rules feature](/configure-placement-rules.md#enable-placement-rules) and go to the next step. -3. Check whether the [Placement Rules](/configure-placement-rules.md) function has been enabled by using pd-ctl: +2. Check whether the TiFlash process is working correctly by viewing `UpTime` on the TiFlash-Summary monitoring panel. - {{< copyable "shell-regular" >}} +3. Check whether the connection between TiFlash and PD is normal through `pd-ctl`. ```shell - echo 'config show replication' | /path/to/pd-ctl -u http://: + tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} store ``` - - If `true` is returned, go to the next step. - - If `false` is returned, [enable the Placement Rules feature](/configure-placement-rules.md#enable-placement-rules) and go to the next step. + The TiFlash's `store.labels` includes information such as `{"key": "engine", "value": "tiflash"}`. You can check this information to confirm a TiFlash instance. -4. 
Check whether the `max-replicas` configuration is correct: +4. Check whether the `count` of Placement Rule with id `default` is correct: - - If the value of `max-replicas` does not exceed the number of TiKV nodes in the cluster, go to the next step. - - If the value of `max-replicas` is greater than the number of TiKV nodes in the cluster, the PD does not replicate data to the TiFlash node. To address this issue, change `max-replicas` to an integer fewer than or equal to the number of TiKV nodes in the cluster. + ```shell + tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} config placement-rules show | grep -C 10 default + ``` + + - If the value of `count` does not exceed the number of TiKV nodes in the cluster, go to the next step. + - If the value of `count` is greater than the number of TiKV nodes in the cluster, the PD does not replicate data to the TiFlash node. To address this issue, change `count` to an integer fewer than or equal to the number of TiKV nodes in the cluster. > **Note:** > @@ -224,44 +192,60 @@ After deploying a TiFlash node and starting replication (by performing the ALTER }' ``` -5. Check whether TiDB has created any placement rule for tables. +5. Check whether the remaining disk space percentage on the machine where TiFlash nodes reside is higher than the [`low-space-ratio`](/pd-configuration-file.md#low-space-ratio) value. The default value is 0.8, meaning when a node's used space exceeds 80% of its capacity, PD will avoid migrating Regions to that node to prevent disk space exhaustion. If all TiFlash nodes have insufficient remaining space, PD will stop scheduling new Region peers to TiFlash, causing replicas to remain in an unavailable state (progress < 1). - Search the logs of TiDB DDL Owner and check whether TiDB has notified PD to add placement rules. For non-partitioned tables, search `ConfigureTiFlashPDForTable`. For partitioned tables, search `ConfigureTiFlashPDForPartitions`. + - If the disk usage reaches or exceeds `low-space-ratio`, it indicates insufficient disk space. In this case, please delete unnecessary files such as the `space_placeholder_file` under the `${data}/flash/` directory. If necessary, after deleting files, you may temporarily set `storage.reserve-space` to 0MB in the tiflash-learner.toml configuration file to restore TiFlash service. + - If the disk usage is below `low-space-ratio`, it indicates normal disk space availability. Proceed to the next step. - - If the keyword is found, go to the next step. - - If not, collect logs of the corresponding component for troubleshooting. +6. Check whether there is any `down peer`. Any `down peer` might cause the replication to get stuck. -6. Check whether PD has configured any placement rule for tables. + Run the `pd-ctl region check-down-peer` command to check whether there is any `down peer`. If any, run the `pd-ctl operator add remove-peer ` command to remove it. - Run the `curl http://:/pd/api/v1/config/rules/group/tiflash` command to view all TiFlash placement rules on the current PD. If a rule with the ID being `table--r` is found, the PD has configured a placement rule successfully. +If none of the above configurations or TiFlash status show abnormalities, please follow the "Data is not replicated to TiFlash" guide below to identify which component or data syncing process is experiencing issues. -7. Check whether the PD schedules properly. +## Data is not replicated to TiFlash - Search the `pd.log` file for the `table--r` keyword and scheduling behaviors like `add operator`. 
+After deploying a TiFlash node and starting replication (by performing the ALTER operation), no data is replicated to it. In this case, you can identify and address the problem by following the steps below: - - If the keyword is found, the PD schedules properly. - - If not, the PD does not schedule properly. +1. Check whether the replication is successful by running the `ALTER table set tiflash replica ` command and check the output. -## Data replication gets stuck + - If there is output, go to the next step. + - If there is no output, run the `SELECT * FROM information_schema.tiflash_replica` command to check whether TiFlash replicas have been created. If not, run the `ALTER table ${tbl_name} set tiflash replica ${num}` command again + - Check whether the DDL statement is executed as expected through [ADMIN SHOW DDL](/sql-statements/sql-statement-admin-show-ddl.md). Or there are any other DDL statement that block altering TiFlash replica statement being executed. + - Check whether any DML statement is executed on the same table through [SHOW PROCESSLIST](/sql-statements/sql-statement-show-processlist.md) that blocks altering TiFlash replica statement being executed. -If data replication on TiFlash starts normally but then all or some data fails to be replicated after a period of time, you can confirm or resolve the issue by performing the following steps: +2. Check whether TiFlash Region replication runs correctly. -1. Check the disk space. + Check whether there is any change in `progress`: - Check whether the disk space ratio is higher than the value of `low-space-ratio` (defaulted to 0.8. When the space usage of a node exceeds 80%, the PD stops migrating data to this node to avoid exhaustion of disk space). + - If changes are detected, it indicates TiFlash replication is functioning normally (though potentially at a slower pace). Please refer to the "Data replication is slow" section for optimization configurations. + - If no, TiFlash replication is abnormal. In `tidb.log`, search the log saying `Tiflash replica is not available`. Check whether `progress` of the corresponding table is updated. If not, go to the next step. - - If the disk usage ratio is greater than or equal to the value of `low-space-ratio`, the disk space is insufficient. To relieve the disk space, remove unnecessary files, such as `space_placeholder_file` (if necessary, set `reserve-space` to 0MB after removing the file) under the `${data}/flash/` folder. - - If the disk usage ratio is less than the value of `low-space-ratio`, the disk space is sufficient. Go to the next step. +3. Check whether TiDB has created any placement rule for tables. -2. Check whether there is any `down peer` (a `down peer` might cause the replication to get stuck). + Search the logs of TiDB DDL Owner and check whether TiDB has notified PD to add placement rules. For non-partitioned tables, search `ConfigureTiFlashPDForTable`. For partitioned tables, search `ConfigureTiFlashPDForPartitions`. - Run the `pd-ctl region check-down-peer` command to check whether there is any `down peer`. If any, run the `pd-ctl operator add remove-peer ` command to remove it. + - If the keyword is found, go to the next step. + - If not, collect logs of the corresponding component for troubleshooting. + +4. Check whether PD has configured any placement rule for tables. + + Run the `curl http://:/pd/api/v1/config/rules/group/tiflash` command to view all TiFlash placement rules on the current PD. 
If a rule with the ID being `table--r` is found, the PD has configured a placement rule successfully. + +5. Check whether the PD schedules properly. + + Search the `pd.log` file for the `table--r` keyword and scheduling behaviors like `add operator`. + + - If the keyword is found, the PD schedules properly. + - If not, the PD does not schedule properly. + +If the above methods cannot resolve your issue, collect the TiDB, PD, TiFlash log files and [get support](/support.md) from PingCAP or the community. ## Data replication is slow The causes may vary. You can address the problem by performing the following steps. -1. Increase [`store limit`](/configure-store-limit.md#usage) to accelerate replication. +1. Follow the [Speed up TiFlash replication](/tiflash/create-tiflash-replicas.md#speed-up-tiflash-replication) to accelerate replication. 2. Adjust the load on TiFlash. From aee9901a7874c4967615d4416bbb31348fee7f8b Mon Sep 17 00:00:00 2001 From: JaySon-Huang Date: Sat, 26 Apr 2025 16:58:33 +0800 Subject: [PATCH 03/11] Address comment from gemini Signed-off-by: JaySon-Huang --- tiflash/create-tiflash-replicas.md | 2 +- tiflash/tiflash-configuration.md | 2 +- tiflash/tiflash-overview.md | 2 +- tiflash/troubleshoot-tiflash.md | 8 ++++---- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/tiflash/create-tiflash-replicas.md b/tiflash/create-tiflash-replicas.md index b3c1deb4e18a5..d9968f0fb85bf 100644 --- a/tiflash/create-tiflash-replicas.md +++ b/tiflash/create-tiflash-replicas.md @@ -160,7 +160,7 @@ Before TiFlash replicas are added, each TiKV instance performs a full table scan > tiup ctl:v8.5.0 pd -u http://192.168.1.4:2379 store limit all engine tiflash 60 add-peer > ``` - If there are already a significant number of Regions exist in the old TiFlash nodes in the cluster, and these Regions need to be rebalanced from the old TiFlash nodes to the new ones, the `remove-peer` restriction must also be adjusted accordingly. + If a significant number of Regions already exist in the old TiFlash nodes and need rebalancing to the new nodes, adjust the `remove-peer` restriction accordingly. ```shell tiup ctl:v pd -u http://:2379 store limit all engine tiflash 60 remove-peer diff --git a/tiflash/tiflash-configuration.md b/tiflash/tiflash-configuration.md index ef57b95b12b2b..3100fdac893d4 100644 --- a/tiflash/tiflash-configuration.md +++ b/tiflash/tiflash-configuration.md @@ -361,7 +361,7 @@ Note that the following parameters only take effect in TiFlash logs and TiFlash - The memory usage limit for the generated intermediate data in all queries. - When the value is an integer, the unit is byte. For example, `34359738368` means 32 GiB of memory limit, and `0` means no limit. -- You can set the value to a floating-point number in the range of `[0.0, 1.0)` since v6.6.0. A floating-point number means the ratio of the allowed memory usage to the total memory of the node. For example, `0.8` means 80% of the total memory, and `0.0` means no limit. +- Since v6.6.0, you can set the value to a floating-point number in the range of `[0.0, 1.0)`. This number represents the ratio of allowed memory usage to the total node memory. For example, `0.8` means 80% of the total memory, and `0.0` means no limit. - When the queries attempt to consume memory that exceeds this limit, the queries are terminated and an error is reported. - Default value: `0.8`, which means 80% of the total memory. 
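Assuming the hunk above documents the `profiles.default.max_memory_usage_for_all_queries` entry of `tiflash.toml` (the parameter name sits outside this diff, so treat it as an assumption), the two value forms it accepts look like this sketch:

```toml
[profiles]
    [profiles.default]
    # Ratio form (floating point, supported since v6.6.0):
    # cap the intermediate data of all queries at 80% of the node memory.
    max_memory_usage_for_all_queries = 0.8

    # Byte form (integer): for example, 34359738368 for a 32 GiB limit, or 0 for no limit.
    # max_memory_usage_for_all_queries = 34359738368
```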
diff --git a/tiflash/tiflash-overview.md b/tiflash/tiflash-overview.md index c7e5a8e06cfbb..24b0b2342cb6f 100644 --- a/tiflash/tiflash-overview.md +++ b/tiflash/tiflash-overview.md @@ -26,7 +26,7 @@ TiFlash provides the columnar storage, with a layer of coprocessors efficiently TiFlash conducts real-time replication of data in the TiKV nodes at a low cost that does not block writes in TiKV. Meanwhile, it provides the same read consistency as in TiKV and ensures that the latest data is read. The Region replica in TiFlash is logically identical to those in TiKV, and is split and merged along with the Leader replica in TiKV at the same time. -To deploy TiFlash under the Linux AMD64 architecture, the CPU must support the AVX2 instruction set. Ensure that `grep avx2 /proc/cpuinfo` has output. To deploy TiFlash under the Linux ARM64 architecture, the CPU must support the ARMv8 instruction set architecture. Ensure that `grep 'crc32' /proc/cpuinfo | grep 'asimd'` has output. By using the instruction set extensions, TiFlash's vectorization engine can deliver better performance. +Deploying TiFlash on Linux AMD64 architecture requires a CPU that supports the AVX2 instruction set. Verify this by ensuring `grep avx2 /proc/cpuinfo` produces output. For Linux ARM64 architecture, the CPU must support the ARMv8 instruction set architecture. Verify this by ensuring `grep 'crc32' /proc/cpuinfo | grep 'asimd'` produces output. By using the instruction set extensions, TiFlash's vectorization engine can deliver better performance. diff --git a/tiflash/troubleshoot-tiflash.md b/tiflash/troubleshoot-tiflash.md index 076caa4080065..ace8a6fbb4c73 100644 --- a/tiflash/troubleshoot-tiflash.md +++ b/tiflash/troubleshoot-tiflash.md @@ -32,9 +32,9 @@ The issue might occur due to different reasons. It is recommended that you troub 3. Use the PD Control tool to check whether there is any TiFlash instance that failed to go offline on the node (same IP and Port) and force the instance(s) to go offline. For detailed steps, refer to [Scale in a TiFlash cluster](/scale-tidb-using-tiup.md#scale-in-a-tiflash-cluster). -4. Check whether the system CPU supports vector extension instruction sets +4. Check whether the CPU supports SIMD instructions - Starting from v6.3, to deploy TiFlash under the Linux AMD64 architecture, the CPU must support the AVX2 instruction set. Ensure that `grep avx2 /proc/cpuinfo` has output. To deploy TiFlash under the Linux ARM64 architecture, the CPU must support the ARMv8 instruction set architecture. Ensure that `grep 'crc32' /proc/cpuinfo | grep 'asimd'` has output. + Starting with v6.3, deploying TiFlash on Linux AMD64 architecture requires a CPU that supports the AVX2 instruction set. Verify this by ensuring `grep avx2 /proc/cpuinfo` produces output. For Linux ARM64 architecture, the CPU must support the ARMv8 instruction set architecture. Verify this by ensuring `grep 'crc32' /proc/cpuinfo | grep 'asimd'` produces output. If deploying on a virtual machine, change the virtual machine's CPU architecture to "Haswell". @@ -142,7 +142,7 @@ In this example, the warning message shows that TiDB does not select the MPP mod If TiFlash replicas consistently fail to be created since the TiDB cluster is deployed, or if the TiFlash replicas were initially created normally but then all or some tables fails to be created after a period of time, you can diagnose and resolve the issue by performing the following steps: -1. Check whether PD enables the `Placement Rules` feature. 
This feature is enabled by default since v6.5.0: +1. Check whether PD enables the `Placement Rules` feature. This feature is enabled by default since v5.0: {{< copyable "shell-regular" >}} @@ -194,7 +194,7 @@ If TiFlash replicas consistently fail to be created since the TiDB cluster is de 5. Check whether the remaining disk space percentage on the machine where TiFlash nodes reside is higher than the [`low-space-ratio`](/pd-configuration-file.md#low-space-ratio) value. The default value is 0.8, meaning when a node's used space exceeds 80% of its capacity, PD will avoid migrating Regions to that node to prevent disk space exhaustion. If all TiFlash nodes have insufficient remaining space, PD will stop scheduling new Region peers to TiFlash, causing replicas to remain in an unavailable state (progress < 1). - - If the disk usage reaches or exceeds `low-space-ratio`, it indicates insufficient disk space. In this case, please delete unnecessary files such as the `space_placeholder_file` under the `${data}/flash/` directory. If necessary, after deleting files, you may temporarily set `storage.reserve-space` to 0MB in the tiflash-learner.toml configuration file to restore TiFlash service. + - If the disk usage reaches or exceeds `low-space-ratio`, it indicates insufficient disk space. In this case, please delete unnecessary files such as the `space_placeholder_file` under the `${data}/flash/` directory. If necessary, after deleting files, you may temporarily set `storage.reserve-space` to `0MB` in the tiflash-learner.toml configuration file to restore TiFlash service. - If the disk usage is below `low-space-ratio`, it indicates normal disk space availability. Proceed to the next step. 6. Check whether there is any `down peer`. Any `down peer` might cause the replication to get stuck. From fd781b7b1d3034c54169741c46ddb54f916a0362 Mon Sep 17 00:00:00 2001 From: JaySon-Huang Date: Sat, 26 Apr 2025 17:22:05 +0800 Subject: [PATCH 04/11] Polish the doc Signed-off-by: JaySon-Huang --- tiflash/troubleshoot-tiflash.md | 33 ++++++++++++++++++++++++--------- 1 file changed, 24 insertions(+), 9 deletions(-) diff --git a/tiflash/troubleshoot-tiflash.md b/tiflash/troubleshoot-tiflash.md index ace8a6fbb4c73..7a6cc01feb173 100644 --- a/tiflash/troubleshoot-tiflash.md +++ b/tiflash/troubleshoot-tiflash.md @@ -153,9 +153,9 @@ If TiFlash replicas consistently fail to be created since the TiDB cluster is de - If `true` is returned, go to the next step. - If `false` is returned, [enable the Placement Rules feature](/configure-placement-rules.md#enable-placement-rules) and go to the next step. -2. Check whether the TiFlash process is working correctly by viewing `UpTime` on the TiFlash-Summary monitoring panel. +2. Check whether the TiFlash process is working normally by the `UpTime` on the TiFlash-Summary Grafana panel. -3. Check whether the connection between TiFlash and PD is normal through `pd-ctl`. +3. Check whether the connection between TiFlash and PD is normal. ```shell tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} store @@ -170,11 +170,11 @@ If TiFlash replicas consistently fail to be created since the TiDB cluster is de ``` - If the value of `count` does not exceed the number of TiKV nodes in the cluster, go to the next step. - - If the value of `count` is greater than the number of TiKV nodes in the cluster, the PD does not replicate data to the TiFlash node. To address this issue, change `count` to an integer fewer than or equal to the number of TiKV nodes in the cluster. 
+ - If the value of `count` is greater than the number of TiKV nodes in the cluster. For example, if there are only 1 TiKV nodes in a testing cluster while the count is 3, then PD will not add any Region peer to the TiFlash node. To address this issue, change `count` to an integer fewer than or equal to the number of TiKV nodes in the cluster. > **Note:** > - > `max-replicas` is defaulted to 3. In production environments, the value is usually fewer than the number of TiKV nodes. In test environments, the value can be 1. + > `count` is defaulted to 3. In production environments, the value is usually fewer than the number of TiKV nodes. In test environments, the value can be 1. {{< copyable "shell-regular" >}} @@ -194,7 +194,18 @@ If TiFlash replicas consistently fail to be created since the TiDB cluster is de 5. Check whether the remaining disk space percentage on the machine where TiFlash nodes reside is higher than the [`low-space-ratio`](/pd-configuration-file.md#low-space-ratio) value. The default value is 0.8, meaning when a node's used space exceeds 80% of its capacity, PD will avoid migrating Regions to that node to prevent disk space exhaustion. If all TiFlash nodes have insufficient remaining space, PD will stop scheduling new Region peers to TiFlash, causing replicas to remain in an unavailable state (progress < 1). - - If the disk usage reaches or exceeds `low-space-ratio`, it indicates insufficient disk space. In this case, please delete unnecessary files such as the `space_placeholder_file` under the `${data}/flash/` directory. If necessary, after deleting files, you may temporarily set `storage.reserve-space` to `0MB` in the tiflash-learner.toml configuration file to restore TiFlash service. + - If the disk usage reaches or exceeds `low-space-ratio`, it indicates insufficient disk space. In this case, one or more of the following actions can be taken: + + - Modify the value of `low-space-ratio` to allow the PD to resume scheduling Regions to the TiFlash node. + + ``` + tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} config set low-space-ratio 0.9 + ``` + + - Scale-out new TiFlash nodes, PD will balance Regions across TiFlash nodes and resumes scheduling Regions to TiFlash nodes with enough disk space. + + - Remove unnecessary files from the TiFlash node disk, such as the `space_placeholder_file` file in the `${data}/flash/` directory. If necessary, set `storage.reserve-space` in tiflash-learner.toml to `0MB` at the same time to temporarily bring TiFlash back into service. + - If the disk usage is below `low-space-ratio`, it indicates normal disk space availability. Proceed to the next step. 6. Check whether there is any `down peer`. Any `down peer` might cause the replication to get stuck. @@ -205,7 +216,7 @@ If none of the above configurations or TiFlash status show abnormalities, please ## Data is not replicated to TiFlash -After deploying a TiFlash node and starting replication (by performing the ALTER operation), no data is replicated to it. In this case, you can identify and address the problem by following the steps below: +After deploying a TiFlash node and starting replication by executing `ALTER TABLE ... SET TIFLASH REPLICA ...`, no data is replicated to it. In this case, you can identify and address the problem by following the steps below: 1. Check whether the replication is successful by running the `ALTER table set tiflash replica ` command and check the output. 
@@ -213,6 +224,7 @@ After deploying a TiFlash node and starting replication (by performing the ALTER - If there is no output, run the `SELECT * FROM information_schema.tiflash_replica` command to check whether TiFlash replicas have been created. If not, run the `ALTER table ${tbl_name} set tiflash replica ${num}` command again - Check whether the DDL statement is executed as expected through [ADMIN SHOW DDL](/sql-statements/sql-statement-admin-show-ddl.md). Or there are any other DDL statement that block altering TiFlash replica statement being executed. - Check whether any DML statement is executed on the same table through [SHOW PROCESSLIST](/sql-statements/sql-statement-show-processlist.md) that blocks altering TiFlash replica statement being executed. + - If nothing is blocking the `ALTER TABLE ... SET TIFLASH REPLICA ...` being executed, go to the next step. 2. Check whether TiFlash Region replication runs correctly. @@ -221,7 +233,7 @@ After deploying a TiFlash node and starting replication (by performing the ALTER - If changes are detected, it indicates TiFlash replication is functioning normally (though potentially at a slower pace). Please refer to the "Data replication is slow" section for optimization configurations. - If no, TiFlash replication is abnormal. In `tidb.log`, search the log saying `Tiflash replica is not available`. Check whether `progress` of the corresponding table is updated. If not, go to the next step. -3. Check whether TiDB has created any placement rule for tables. +3. Check whether TiDB has created any placement rule for the table. Search the logs of TiDB DDL Owner and check whether TiDB has notified PD to add placement rules. For non-partitioned tables, search `ConfigureTiFlashPDForTable`. For partitioned tables, search `ConfigureTiFlashPDForPartitions`. @@ -230,11 +242,14 @@ After deploying a TiFlash node and starting replication (by performing the ALTER 4. Check whether PD has configured any placement rule for tables. - Run the `curl http://:/pd/api/v1/config/rules/group/tiflash` command to view all TiFlash placement rules on the current PD. If a rule with the ID being `table--r` is found, the PD has configured a placement rule successfully. + Run the `curl http://:/pd/api/v1/config/rules/group/tiflash` command to view all TiFlash placement rules on the current PD. + + - If a rule with the ID being `table--r` is found, the PD has configured a placement rule successfully, go to the next step. + - If not, collect logs of the corresponding component for troubleshooting. 5. Check whether the PD schedules properly. - Search the `pd.log` file for the `table--r` keyword and scheduling behaviors like `add operator`. + Search the `pd.log` file for the `table--r` keyword and scheduling behaviors like `add operator`. Or check whether there are `add-rule-peer` operator on the "Operator/Schedule operator create" of PD Dashboard on Grafana. - If the keyword is found, the PD schedules properly. - If not, the PD does not schedule properly. 
From f34ac4113d26c0f02edf37410448f69d06f519d0 Mon Sep 17 00:00:00 2001 From: JaySon-Huang Date: Mon, 12 May 2025 15:35:28 +0800 Subject: [PATCH 05/11] Align address comment with the Chinese version Signed-off-by: JaySon-Huang --- tiflash/create-tiflash-replicas.md | 8 ++++---- tiflash/troubleshoot-tiflash.md | 23 +++++++++++------------ 2 files changed, 15 insertions(+), 16 deletions(-) diff --git a/tiflash/create-tiflash-replicas.md b/tiflash/create-tiflash-replicas.md index d9968f0fb85bf..6ed24ed646163 100644 --- a/tiflash/create-tiflash-replicas.md +++ b/tiflash/create-tiflash-replicas.md @@ -134,7 +134,7 @@ SELECT TABLE_NAME FROM information_schema.tables where TABLE_SCHEMA = " -Before TiFlash replicas are added, each TiKV instance performs a full table scan and sends the scanned data to TiFlash as a "snapshot" to create replicas. By default, TiFlash replicas are added slowly with fewer resources usage in order to minimize the impact on the online service. If there are spare CPU and disk IO resources in your TiKV and TiFlash nodes, you can accelerate TiFlash replication by performing the following steps. +When TiFlash replicas for a table are added, or the Regions' TiFlash replicas being move to another TiFlash instance, the TiKV instance performs a table scan and sends the scanned data to TiFlash as a "snapshot" to create replicas. By default, TiFlash replicas are added slowly with fewer resources usage in order to minimize the impact on the online service. If there are spare CPU and disk IO resources in your TiKV and TiFlash nodes, you can accelerate TiFlash replication by performing the following steps. 1. Temporarily increase the snapshot write speed limit for each TiKV and TiFlash instance by using the [Dynamic Config SQL statement](https://docs.pingcap.com/tidb/stable/dynamic-config): @@ -146,9 +146,9 @@ Before TiFlash replicas are added, each TiKV instance performs a full table scan After executing these SQL statements, the configuration changes take effect immediately without restarting the cluster. However, since the replication speed is still restricted by the PD limit globally, you cannot observe the acceleration for now. -2. Use [PD Control](https://docs.pingcap.com/tidb/stable/pd-control) to progressively ease the new replica speed limit. +2. Use [PD Control](https://docs.pingcap.com/tidb/stable/pd-control) to progressively ease the replica scheduling speed limit. - The default new replica speed limit is 30, which means, approximately 30 Regions add TiFlash replicas every minute. Executing the following command will adjust the limit to 60 for all TiFlash instances, which doubles the original speed: + The default new replica speed limit is 30, which means, approximately 30 Regions add or remove TiFlash replicas on 1 TiFlash store every minute. Executing the following command will adjust the limit to 60 for all TiFlash instances, which doubles the original speed: ```shell tiup ctl:v pd -u http://:2379 store limit all engine tiflash 60 add-peer @@ -177,7 +177,7 @@ Before TiFlash replicas are added, each TiKV instance performs a full table scan 3. After the TiFlash replication is complete, revert to the default configuration to reduce the impact on online services. 
- Execute the following PD Control command to restore the default new replica speed limit: + Execute the following PD Control command to restore the default replica scheduling speed limit: ```shell tiup ctl:v pd -u http://:2379 store limit all engine tiflash 30 add-peer diff --git a/tiflash/troubleshoot-tiflash.md b/tiflash/troubleshoot-tiflash.md index 7a6cc01feb173..19d269e3651f2 100644 --- a/tiflash/troubleshoot-tiflash.md +++ b/tiflash/troubleshoot-tiflash.md @@ -174,7 +174,7 @@ If TiFlash replicas consistently fail to be created since the TiDB cluster is de > **Note:** > - > `count` is defaulted to 3. In production environments, the value is usually fewer than the number of TiKV nodes. In test environments, the value can be 1. + > `count` is defaulted to 3. In production environments, the value is usually fewer than the number of TiKV nodes. In test environments, when you allow there is only 1 Region replica, the value can be 1. {{< copyable "shell-regular" >}} @@ -192,19 +192,19 @@ If TiFlash replicas consistently fail to be created since the TiDB cluster is de }' ``` -5. Check whether the remaining disk space percentage on the machine where TiFlash nodes reside is higher than the [`low-space-ratio`](/pd-configuration-file.md#low-space-ratio) value. The default value is 0.8, meaning when a node's used space exceeds 80% of its capacity, PD will avoid migrating Regions to that node to prevent disk space exhaustion. If all TiFlash nodes have insufficient remaining space, PD will stop scheduling new Region peers to TiFlash, causing replicas to remain in an unavailable state (progress < 1). +5. Check the remaining disk space percentage on the TiFlash nodes. The default value of [`low-space-ratio`](/pd-configuration-file.md#low-space-ratio) is 0.8, meaning when a node's used space exceeds 80% of its capacity, PD will avoid migrating Regions to that node to prevent disk space exhaustion. If all TiFlash nodes have insufficient remaining space, PD will stop scheduling new Region peers to TiFlash, causing replicas to remain in an unavailable state (progress < 1). - If the disk usage reaches or exceeds `low-space-ratio`, it indicates insufficient disk space. In this case, one or more of the following actions can be taken: - - Modify the value of `low-space-ratio` to allow the PD to resume scheduling Regions to the TiFlash node. + - Modify the value of `low-space-ratio` to allow the PD to resume scheduling Regions to the TiFlash node until it reach the new threshold. ``` tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} config set low-space-ratio 0.9 ``` - - Scale-out new TiFlash nodes, PD will balance Regions across TiFlash nodes and resumes scheduling Regions to TiFlash nodes with enough disk space. + - Scale-out new TiFlash nodes, PD will automatically balance Regions across TiFlash nodes and resumes scheduling Regions to TiFlash nodes with enough disk space. - - Remove unnecessary files from the TiFlash node disk, such as the `space_placeholder_file` file in the `${data}/flash/` directory. If necessary, set `storage.reserve-space` in tiflash-learner.toml to `0MB` at the same time to temporarily bring TiFlash back into service. + - Remove unnecessary files from the TiFlash node disk, such as the logging files, the `space_placeholder_file` file in the `${data}/flash/` directory. If necessary, set `storage.reserve-space` in tiflash-learner.toml to `0MB` at the same time to temporarily bring TiFlash back into service. 
- If the disk usage is below `low-space-ratio`, it indicates normal disk space availability. Proceed to the next step. @@ -220,18 +220,17 @@ After deploying a TiFlash node and starting replication by executing `ALTER TABL 1. Check whether the replication is successful by running the `ALTER table set tiflash replica ` command and check the output. - - If there is output, go to the next step. - - If there is no output, run the `SELECT * FROM information_schema.tiflash_replica` command to check whether TiFlash replicas have been created. If not, run the `ALTER table ${tbl_name} set tiflash replica ${num}` command again + - If the query is blocked, run the `SELECT * FROM information_schema.tiflash_replica` command to check whether TiFlash replicas have been created. - Check whether the DDL statement is executed as expected through [ADMIN SHOW DDL](/sql-statements/sql-statement-admin-show-ddl.md). Or there are any other DDL statement that block altering TiFlash replica statement being executed. - Check whether any DML statement is executed on the same table through [SHOW PROCESSLIST](/sql-statements/sql-statement-show-processlist.md) that blocks altering TiFlash replica statement being executed. - - If nothing is blocking the `ALTER TABLE ... SET TIFLASH REPLICA ...` being executed, go to the next step. + - You can wait until those queries or DDL finish or cancel them. If nothing is blocking the `ALTER TABLE ... SET TIFLASH REPLICA ...` being executed, go to the next step. 2. Check whether TiFlash Region replication runs correctly. - Check whether there is any change in `progress`: + Check whether there is any change in `progress` of `information_schema.tiflash_replica`. Or you can check the `progress` field with keyword `Tiflash replica is not available` in TiDB logging: - If changes are detected, it indicates TiFlash replication is functioning normally (though potentially at a slower pace). Please refer to the "Data replication is slow" section for optimization configurations. - - If no, TiFlash replication is abnormal. In `tidb.log`, search the log saying `Tiflash replica is not available`. Check whether `progress` of the corresponding table is updated. If not, go to the next step. + - If no, TiFlash replication is abnormal, go to the next step. 3. Check whether TiDB has created any placement rule for the table. @@ -249,10 +248,10 @@ After deploying a TiFlash node and starting replication by executing `ALTER TABL 5. Check whether the PD schedules properly. - Search the `pd.log` file for the `table--r` keyword and scheduling behaviors like `add operator`. Or check whether there are `add-rule-peer` operator on the "Operator/Schedule operator create" of PD Dashboard on Grafana. + Search the `pd.log` file for the `table--r` keyword and scheduling behaviors like `add operator`. Or check whether there are `add-rule-peer` operator on the "Operator/Schedule operator create" of PD Dashboard on Grafana. You can also check the value "Scheduler/Patrol Region time" of PD Dashboard on Grafana. "Patrol Region time" reflects the duration for PD to scan all Regions and generate scheduling operations. A high value may cause delays in scheduling. - If the keyword is found, the PD schedules properly. - - If not, the PD does not schedule properly. + - If no scheduling operations is generated, or the "Patrol Region time" is more than 30 minutes, the PD does not schedule properly or is scheduling slowly. 
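The checks in steps 1 and 2 above can also be run from a SQL client. A sketch, with `test`.`t1` as a placeholder table:

```sql
-- Steps 1 and 2: confirm that the replica definition exists and watch whether PROGRESS changes over time.
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';

-- Step 1: list DDL jobs that might be blocking the ALTER TABLE ... SET TIFLASH REPLICA statement.
ADMIN SHOW DDL JOBS;
```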
If the above methods cannot resolve your issue, collect the TiDB, PD, TiFlash log files and [get support](/support.md) from PingCAP or the community. From 0518a1ba3e0c2e47de69b360c63a3348ebeee0b7 Mon Sep 17 00:00:00 2001 From: JaySon Date: Tue, 13 May 2025 13:28:10 +0800 Subject: [PATCH 06/11] Apply suggestions from code review Co-authored-by: xixirangrang --- tiflash/create-tiflash-replicas.md | 2 +- tiflash/tiflash-configuration.md | 6 ++-- tiflash/troubleshoot-tiflash.md | 52 +++++++++++++++--------------- 3 files changed, 30 insertions(+), 30 deletions(-) diff --git a/tiflash/create-tiflash-replicas.md b/tiflash/create-tiflash-replicas.md index 6ed24ed646163..31b31bc71ad26 100644 --- a/tiflash/create-tiflash-replicas.md +++ b/tiflash/create-tiflash-replicas.md @@ -148,7 +148,7 @@ When TiFlash replicas for a table are added, or the Regions' TiFlash replicas be 2. Use [PD Control](https://docs.pingcap.com/tidb/stable/pd-control) to progressively ease the replica scheduling speed limit. - The default new replica speed limit is 30, which means, approximately 30 Regions add or remove TiFlash replicas on 1 TiFlash store every minute. Executing the following command will adjust the limit to 60 for all TiFlash instances, which doubles the original speed: + The default new replica speed limit is 30, which means, approximately 30 Regions add or remove TiFlash replicas on one TiFlash instance every minute. Executing the following command will adjust the limit to 60 for all TiFlash instances, which doubles the original speed: ```shell tiup ctl:v pd -u http://:2379 store limit all engine tiflash 60 add-peer diff --git a/tiflash/tiflash-configuration.md b/tiflash/tiflash-configuration.md index 3100fdac893d4..456f35d4fb4a6 100644 --- a/tiflash/tiflash-configuration.md +++ b/tiflash/tiflash-configuration.md @@ -361,7 +361,7 @@ Note that the following parameters only take effect in TiFlash logs and TiFlash - The memory usage limit for the generated intermediate data in all queries. - When the value is an integer, the unit is byte. For example, `34359738368` means 32 GiB of memory limit, and `0` means no limit. -- Since v6.6.0, you can set the value to a floating-point number in the range of `[0.0, 1.0)`. This number represents the ratio of allowed memory usage to the total node memory. For example, `0.8` means 80% of the total memory, and `0.0` means no limit. +- Starting from v6.6.0, you can set the value to a floating-point number in the range of `[0.0, 1.0)`. This number represents the ratio of the allowed memory usage to the total node memory. For example, `0.8` means 80% of the total memory, and `0.0` means no limit. - When the queries attempt to consume memory that exceeds this limit, the queries are terminated and an error is reported. - Default value: `0.8`, which means 80% of the total memory. @@ -575,14 +575,14 @@ The parameters in `tiflash-learner.toml` are basically the same as those in TiKV ##### `labels` -- Specifies server attributes, such as `{ zone = "us-west-1", disk = "ssd" }`. You can checkout [Set available zones](/tiflash/create-tiflash-replicas.md#set-available-zones) to learn how to schedule replicas using labels. +- Specifies server attributes, such as `{ zone = "us-west-1", disk = "ssd" }`. For more information about how to schedule replicas using labels, see [Set available zones](/tiflash/create-tiflash-replicas.md#set-available-zones). - Default value: `{}` ### Multi-disk deployment TiFlash supports multi-disk deployment. 
If there are multiple disks in your TiFlash node, you can make full use of those disks by configuring the parameters described in the following sections. For TiFlash's configuration template to be used for TiUP, see [The complex template for the TiFlash topology](https://github.com/pingcap/docs/blob/master/config-templates/complex-tiflash.yaml). -For TiDB clusters with v4.0.9 or later versions, TiFlash supports storing the main data and the latest data of the storage engine on multiple disks. If you want to deploy a TiFlash node on multiple disks, it is recommended to specify your storage directories in the `[storage]` section to make full use of your node. +For TiDB clusters with v4.0.9 or later versions, TiFlash supports storing the main data and the latest data of the storage engine on multiple disks. If you want to deploy a TiFlash node on multiple disks, it is recommended to specify your storage directories in the `[storage]` section to make full use of the I/O performance of your node. If there are multiple disks with similar I/O metrics on your TiFlash node, it is recommended to specify corresponding directories in the `storage.main.dir` list and leave `storage.latest.dir` empty. TiFlash will distribute I/O pressure and data among all directories. diff --git a/tiflash/troubleshoot-tiflash.md b/tiflash/troubleshoot-tiflash.md index 19d269e3651f2..10fcf1cfd38ab 100644 --- a/tiflash/troubleshoot-tiflash.md +++ b/tiflash/troubleshoot-tiflash.md @@ -34,11 +34,11 @@ The issue might occur due to different reasons. It is recommended that you troub 4. Check whether the CPU supports SIMD instructions - Starting with v6.3, deploying TiFlash on Linux AMD64 architecture requires a CPU that supports the AVX2 instruction set. Verify this by ensuring `grep avx2 /proc/cpuinfo` produces output. For Linux ARM64 architecture, the CPU must support the ARMv8 instruction set architecture. Verify this by ensuring `grep 'crc32' /proc/cpuinfo | grep 'asimd'` produces output. + Starting from v6.3, deploying TiFlash on Linux AMD64 architecture requires a CPU that supports the AVX2 instruction set. Verify this by ensuring that `grep avx2 /proc/cpuinfo` produces output. For Linux ARM64 architecture, the CPU must support the ARMv8 instruction set architecture. Verify this by ensuring that `grep 'crc32' /proc/cpuinfo | grep 'asimd'` produces output. - If deploying on a virtual machine, change the virtual machine's CPU architecture to "Haswell". + If you encounter this issue when deploying on a virtual machine, try changing the VM's CPU architecture to "Haswell" and then redeploy TiFlash. -If the above methods cannot resolve your issue, collect the TiFlash log files and [get support](/support.md) from PingCAP or the community. +If the preceding methods cannot resolve your issue, collect the TiFlash log files and [get support](/support.md) from PingCAP or the community. ## Some queries return the `Region Unavailable` error @@ -163,18 +163,18 @@ If TiFlash replicas consistently fail to be created since the TiDB cluster is de The TiFlash's `store.labels` includes information such as `{"key": "engine", "value": "tiflash"}`. You can check this information to confirm a TiFlash instance. -4. Check whether the `count` of Placement Rule with id `default` is correct: +4. 
Check whether the `count` of Placement Rule with the `default` ID is correct: ```shell tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} config placement-rules show | grep -C 10 default ``` - If the value of `count` does not exceed the number of TiKV nodes in the cluster, go to the next step. - - If the value of `count` is greater than the number of TiKV nodes in the cluster. For example, if there are only 1 TiKV nodes in a testing cluster while the count is 3, then PD will not add any Region peer to the TiFlash node. To address this issue, change `count` to an integer fewer than or equal to the number of TiKV nodes in the cluster. + - If the value of `count` is greater than the number of TiKV nodes in the cluster. For example, if there is only one TiKV node in the testing cluster while the count is `3`, then PD will not add any Region peer to the TiFlash node. To address this issue, change `count` to an integer smaller than or equal to the number of TiKV nodes in the cluster. > **Note:** > - > `count` is defaulted to 3. In production environments, the value is usually fewer than the number of TiKV nodes. In test environments, when you allow there is only 1 Region replica, the value can be 1. + > The default value of `count` is `3`. In production environments, the value is usually smaller than the number of TiKV nodes. In test environments, if it is acceptable to have only one Region replica, you can set the value `1`. {{< copyable "shell-regular" >}} @@ -192,72 +192,72 @@ If TiFlash replicas consistently fail to be created since the TiDB cluster is de }' ``` -5. Check the remaining disk space percentage on the TiFlash nodes. The default value of [`low-space-ratio`](/pd-configuration-file.md#low-space-ratio) is 0.8, meaning when a node's used space exceeds 80% of its capacity, PD will avoid migrating Regions to that node to prevent disk space exhaustion. If all TiFlash nodes have insufficient remaining space, PD will stop scheduling new Region peers to TiFlash, causing replicas to remain in an unavailable state (progress < 1). +5. Check the remaining disk space percentage on the TiFlash nodes. The default value of [`low-space-ratio`](/pd-configuration-file.md#low-space-ratio) is `0.8`, meaning that when a node's used space exceeds 80% of its capacity, PD will avoid migrating Regions to that node to prevent disk space exhaustion. If all TiFlash nodes have insufficient remaining space, PD will stop scheduling new Region peers to TiFlash, causing replicas to remain in an unavailable state (that is, progress < 1). - - If the disk usage reaches or exceeds `low-space-ratio`, it indicates insufficient disk space. In this case, one or more of the following actions can be taken: + - If the disk usage reaches or exceeds `low-space-ratio`, it indicates insufficient disk space. In this case, take one or more of the following actions: - - Modify the value of `low-space-ratio` to allow the PD to resume scheduling Regions to the TiFlash node until it reach the new threshold. + - Modify the value of `low-space-ratio` to allow the PD to resume scheduling Regions to the TiFlash node until it reaches the new threshold. ``` tiup ctl:nightly pd -u http://${pd-ip}:${pd-port} config set low-space-ratio 0.9 ``` - - Scale-out new TiFlash nodes, PD will automatically balance Regions across TiFlash nodes and resumes scheduling Regions to TiFlash nodes with enough disk space. + - Scale out new TiFlash nodes. 
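      With TiUP, scaling out can be done roughly as in the sketch below; the host, cluster name, and topology file are placeholders rather than values from this document:

      ```shell
      # Describe the new TiFlash host in a scale-out topology file (the host is a placeholder).
      cat > scale-out-tiflash.yaml <<EOF
      tiflash_servers:
        - host: 10.0.1.14
      EOF

      # Add the node to the cluster (the cluster name is a placeholder).
      tiup cluster scale-out <cluster-name> scale-out-tiflash.yaml
      ```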
PD will automatically balance Regions across TiFlash nodes and resume scheduling Regions to TiFlash nodes with enough disk space. - - Remove unnecessary files from the TiFlash node disk, such as the logging files, the `space_placeholder_file` file in the `${data}/flash/` directory. If necessary, set `storage.reserve-space` in tiflash-learner.toml to `0MB` at the same time to temporarily bring TiFlash back into service. + - Remove unnecessary files from the TiFlash node disk, such as the logging files, and the `space_placeholder_file` file in the `${data}/flash/` directory. If necessary, set `storage.reserve-space` in `tiflash-learner.toml` to `0MB` at the same time to temporarily resume TiFlash service. - - If the disk usage is below `low-space-ratio`, it indicates normal disk space availability. Proceed to the next step. + - If the disk usage is less that the value of `low-space-ratio`, it indicates normal disk space availability. Proceed to the next step. -6. Check whether there is any `down peer`. Any `down peer` might cause the replication to get stuck. +6. Check whether there is any `down peer`. Remaining down peers might cause the replication to get stuck. Run the `pd-ctl region check-down-peer` command to check whether there is any `down peer`. If any, run the `pd-ctl operator add remove-peer ` command to remove it. -If none of the above configurations or TiFlash status show abnormalities, please follow the "Data is not replicated to TiFlash" guide below to identify which component or data syncing process is experiencing issues. +If all the preceding configurations or TiFlash status are normal, follow the instructions in [Data is not replicated to TiFlash](#data-is-not-replicated-to-tiflash) to identify which component or data replication process is experiencing issues. ## Data is not replicated to TiFlash -After deploying a TiFlash node and starting replication by executing `ALTER TABLE ... SET TIFLASH REPLICA ...`, no data is replicated to it. In this case, you can identify and address the problem by following the steps below: +After deploying a TiFlash node and starting replication by executing `ALTER TABLE ... SET TIFLASH REPLICA ...`, no data is replicated to it. In this case, you can identify and address the problem by performing the following steps: 1. Check whether the replication is successful by running the `ALTER table set tiflash replica ` command and check the output. - - If the query is blocked, run the `SELECT * FROM information_schema.tiflash_replica` command to check whether TiFlash replicas have been created. + - If the query is blocked, run the `SELECT * FROM information_schema.tiflash_replica` statement to check whether TiFlash replicas have been created. - Check whether the DDL statement is executed as expected through [ADMIN SHOW DDL](/sql-statements/sql-statement-admin-show-ddl.md). Or there are any other DDL statement that block altering TiFlash replica statement being executed. - Check whether any DML statement is executed on the same table through [SHOW PROCESSLIST](/sql-statements/sql-statement-show-processlist.md) that blocks altering TiFlash replica statement being executed. - You can wait until those queries or DDL finish or cancel them. If nothing is blocking the `ALTER TABLE ... SET TIFLASH REPLICA ...` being executed, go to the next step. 2. Check whether TiFlash Region replication runs correctly. - Check whether there is any change in `progress` of `information_schema.tiflash_replica`. 
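      For example, the progress can be sampled with the mysql client as in the following sketch; the host, port, and user are placeholders, so adjust them for your cluster. Run it a few times over several minutes and compare the `PROGRESS` values:

      ```shell
      # Connection parameters are placeholders; replace them with your TiDB address and credentials.
      mysql -h 127.0.0.1 -P 4000 -u root \
        -e "SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS FROM information_schema.tiflash_replica;"
      ```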
Or you can check the `progress` field with keyword `Tiflash replica is not available` in TiDB logging: + Check whether there is any change in `progress` of `information_schema.tiflash_replica`. Alternatively, you can check the `progress` field with the keyword `Tiflash replica is not available` in TiDB logging: - - If changes are detected, it indicates TiFlash replication is functioning normally (though potentially at a slower pace). Please refer to the "Data replication is slow" section for optimization configurations. - - If no, TiFlash replication is abnormal, go to the next step. + - If changes are detected, it indicates TiFlash replication is functioning normally (though potentially at a slower pace). Refer to [Data replication is slow](#data-replication-is-slow) for optimization configurations. + - If no, TiFlash replication is abnormal. Go to the next step. 3. Check whether TiDB has created any placement rule for the table. Search the logs of TiDB DDL Owner and check whether TiDB has notified PD to add placement rules. For non-partitioned tables, search `ConfigureTiFlashPDForTable`. For partitioned tables, search `ConfigureTiFlashPDForPartitions`. - If the keyword is found, go to the next step. - - If not, collect logs of the corresponding component for troubleshooting. + - If not found, collect logs of the corresponding component for troubleshooting. 4. Check whether PD has configured any placement rule for tables. Run the `curl http://:/pd/api/v1/config/rules/group/tiflash` command to view all TiFlash placement rules on the current PD. - - If a rule with the ID being `table--r` is found, the PD has configured a placement rule successfully, go to the next step. - - If not, collect logs of the corresponding component for troubleshooting. + - If a rule with the ID being `table--r` is found, the PD has configured a placement rule successfully. Go to the next step. + - If not found, collect logs of the corresponding component for troubleshooting. 5. Check whether the PD schedules properly. - Search the `pd.log` file for the `table--r` keyword and scheduling behaviors like `add operator`. Or check whether there are `add-rule-peer` operator on the "Operator/Schedule operator create" of PD Dashboard on Grafana. You can also check the value "Scheduler/Patrol Region time" of PD Dashboard on Grafana. "Patrol Region time" reflects the duration for PD to scan all Regions and generate scheduling operations. A high value may cause delays in scheduling. + Search the `pd.log` file for the `table--r` keyword and scheduling behaviors like `add operator`. Alternatively, check whether there are `add-rule-peer` operator on the "Operator/Schedule operator create" of PD Dashboard on Grafana. You can also check the value "Scheduler/Patrol Region time" on the PD Dashboard on Grafana. "Patrol Region time" reflects the duration for PD to scan all Regions and generate scheduling operations. A high value might cause delays in scheduling. - If the keyword is found, the PD schedules properly. - - If no scheduling operations is generated, or the "Patrol Region time" is more than 30 minutes, the PD does not schedule properly or is scheduling slowly. + - If no scheduling operations are running, or the **Patrol Region time** is more than 30 minutes, the PD does not schedule properly or is scheduling slowly. -If the above methods cannot resolve your issue, collect the TiDB, PD, TiFlash log files and [get support](/support.md) from PingCAP or the community. 
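   As a concrete illustration of the log search in the previous step, the command below greps the PD log for the placement rule of a given table. The table ID (`110`) and the log path are placeholders; you can look up the table ID in the `TIDB_TABLE_ID` column of `information_schema.tables`:

   ```shell
   # Replace 110 with the actual table ID and adjust the log path for your deployment.
   grep "table-110-r" /path/to/pd/log/pd.log | grep "add operator" | tail -n 20
   ```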
+If the preceding methods cannot resolve your issue, collect the TiDB, PD, TiFlash log files and [get support](/support.md) from PingCAP or the community. ## Data replication is slow -The causes may vary. You can address the problem by performing the following steps. +The causes vary. You can address the problem by performing the following steps. 1. Follow the [Speed up TiFlash replication](/tiflash/create-tiflash-replicas.md#speed-up-tiflash-replication) to accelerate replication. From b1e5ef19115a67aa0356bd60826f36928de5ec6b Mon Sep 17 00:00:00 2001 From: JaySon Date: Tue, 13 May 2025 15:12:29 +0800 Subject: [PATCH 07/11] Apply suggestions from code review Co-authored-by: xixirangrang --- tiflash/create-tiflash-replicas.md | 2 +- tiflash/tiflash-overview.md | 2 +- tiflash/troubleshoot-tiflash.md | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/tiflash/create-tiflash-replicas.md b/tiflash/create-tiflash-replicas.md index 31b31bc71ad26..37f4eeeb8f9e2 100644 --- a/tiflash/create-tiflash-replicas.md +++ b/tiflash/create-tiflash-replicas.md @@ -160,7 +160,7 @@ When TiFlash replicas for a table are added, or the Regions' TiFlash replicas be > tiup ctl:v8.5.0 pd -u http://192.168.1.4:2379 store limit all engine tiflash 60 add-peer > ``` - If a significant number of Regions already exist in the old TiFlash nodes and need rebalancing to the new nodes, adjust the `remove-peer` restriction accordingly. + If the cluster contains many Regions on the old TiFlash nodes, PD need to rebalance them to the new TiFlash nodes. You need to adjust the `remove-peer` limit accordingly. ```shell tiup ctl:v pd -u http://:2379 store limit all engine tiflash 60 remove-peer diff --git a/tiflash/tiflash-overview.md b/tiflash/tiflash-overview.md index 24b0b2342cb6f..5e0d145d7c9e3 100644 --- a/tiflash/tiflash-overview.md +++ b/tiflash/tiflash-overview.md @@ -26,7 +26,7 @@ TiFlash provides the columnar storage, with a layer of coprocessors efficiently TiFlash conducts real-time replication of data in the TiKV nodes at a low cost that does not block writes in TiKV. Meanwhile, it provides the same read consistency as in TiKV and ensures that the latest data is read. The Region replica in TiFlash is logically identical to those in TiKV, and is split and merged along with the Leader replica in TiKV at the same time. -Deploying TiFlash on Linux AMD64 architecture requires a CPU that supports the AVX2 instruction set. Verify this by ensuring `grep avx2 /proc/cpuinfo` produces output. For Linux ARM64 architecture, the CPU must support the ARMv8 instruction set architecture. Verify this by ensuring `grep 'crc32' /proc/cpuinfo | grep 'asimd'` produces output. By using the instruction set extensions, TiFlash's vectorization engine can deliver better performance. +Deploying TiFlash on Linux AMD64 architecture requires a CPU that supports the AVX2 instruction set. Verify this by ensuring that `grep avx2 /proc/cpuinfo` produces output. For Linux ARM64 architecture, the CPU must support the ARMv8 instruction set architecture. Verify this by ensuring that `grep 'crc32' /proc/cpuinfo | grep 'asimd'` produces output. By using the instruction set extensions, TiFlash's vectorization engine can deliver better performance. 
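For quick reference, the two checks mentioned above can be run as follows on the target host before deployment; empty output means the CPU does not report the required instruction set:

```shell
# On Linux AMD64: the avx2 flag must appear in the CPU flags.
grep avx2 /proc/cpuinfo

# On Linux ARM64: the crc32 and asimd features must both be present.
grep 'crc32' /proc/cpuinfo | grep 'asimd'
```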
diff --git a/tiflash/troubleshoot-tiflash.md b/tiflash/troubleshoot-tiflash.md index 10fcf1cfd38ab..0da1817ce3f4a 100644 --- a/tiflash/troubleshoot-tiflash.md +++ b/tiflash/troubleshoot-tiflash.md @@ -140,7 +140,7 @@ In this example, the warning message shows that TiDB does not select the MPP mod ## TiFlash replica is always unavailable -If TiFlash replicas consistently fail to be created since the TiDB cluster is deployed, or if the TiFlash replicas were initially created normally but then all or some tables fails to be created after a period of time, you can diagnose and resolve the issue by performing the following steps: +After the TiDB cluster is deployed, if the TiFlash replicas consistently fail to be created, or if the TiFlash replicas are initially created normally but all or some tables fail to be created after a period of time, you can do the following to troubleshoot the issue: 1. Check whether PD enables the `Placement Rules` feature. This feature is enabled by default since v5.0: From 4ddeacd7b858d310c2aace3b48e2c4866b450d5c Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Wed, 14 May 2025 09:17:30 +0800 Subject: [PATCH 08/11] Update tiflash/create-tiflash-replicas.md Co-authored-by: JaySon --- tiflash/create-tiflash-replicas.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/tiflash/create-tiflash-replicas.md b/tiflash/create-tiflash-replicas.md index 37f4eeeb8f9e2..01c88a7be9067 100644 --- a/tiflash/create-tiflash-replicas.md +++ b/tiflash/create-tiflash-replicas.md @@ -134,7 +134,12 @@ SELECT TABLE_NAME FROM information_schema.tables where TABLE_SCHEMA = " -When TiFlash replicas for a table are added, or the Regions' TiFlash replicas being move to another TiFlash instance, the TiKV instance performs a table scan and sends the scanned data to TiFlash as a "snapshot" to create replicas. By default, TiFlash replicas are added slowly with fewer resources usage in order to minimize the impact on the online service. If there are spare CPU and disk IO resources in your TiKV and TiFlash nodes, you can accelerate TiFlash replication by performing the following steps. +The TiDB cluster triggers the TiFlash replica replication process when the following operations are performed: + + * TiFlash replicas for a table are added. + * When a new TiFlash instance is added, PD moves the TiFlash replicas to the new TiFlash instance. + +The TiKV instance performs a table scan and sends the scanned data to TiFlash as a snapshot to create replicas. By default, TiFlash replicas are added slowly with fewer resources usage to minimize the impact on the online service. If there are spare CPU and disk IO resources in your TiKV and TiFlash nodes, you can accelerate TiFlash replication by performing the following steps. 1. 
Temporarily increase the snapshot write speed limit for each TiKV and TiFlash instance by using the [Dynamic Config SQL statement](https://docs.pingcap.com/tidb/stable/dynamic-config): From ede1f83b3994e3c62b595be2ea2f4cd299db5f0a Mon Sep 17 00:00:00 2001 From: houfaxin Date: Thu, 15 May 2025 10:25:54 +0800 Subject: [PATCH 09/11] rewording --- tiflash/troubleshoot-tiflash.md | 20 +++++--------------- 1 file changed, 5 insertions(+), 15 deletions(-) diff --git a/tiflash/troubleshoot-tiflash.md b/tiflash/troubleshoot-tiflash.md index 0da1817ce3f4a..f16164437c143 100644 --- a/tiflash/troubleshoot-tiflash.md +++ b/tiflash/troubleshoot-tiflash.md @@ -10,22 +10,18 @@ This section describes some commonly encountered issues when using TiFlash, the ## TiFlash fails to start -The issue might occur due to different reasons. It is recommended that you troubleshoot it following the steps below: +The issue might occur due to different reasons. It is recommended that you troubleshoot it as follows: 1. Check whether your system is RedHat Enterprise Linux 8. RedHat Enterprise Linux 8 does not have the `libnsl.so` system library. You can manually install it via the following command: - {{< copyable "shell-regular" >}} - ```shell dnf install libnsl ``` 2. Check your system's `ulimit` parameter setting. - {{< copyable "shell-regular" >}} - ```shell ulimit -n 1000000 ``` @@ -42,13 +38,13 @@ If the preceding methods cannot resolve your issue, collect the TiFlash log file ## Some queries return the `Region Unavailable` error -If the load pressure on TiFlash is too heavy and it causes that TiFlash data replication falls behind, some queries might return the `Region Unavailable` error. +If the workload on TiFlash is too heavy and it causes that TiFlash data replication falls behind, some queries might return the `Region Unavailable` error. -In this case, you can balance the load pressure by adding more TiFlash nodes. +In this case, you can balance the workload by [adding more TiFlash nodes](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster). ## Data file corruption -Take the following steps to handle the data file corruption: +To handle data file corruption, follow these steps: 1. Refer to [Take a TiFlash node down](/scale-tidb-using-tiup.md#scale-in-a-tiflash-cluster) to take the corresponding TiFlash node down. 2. Delete the related data of the TiFlash node. @@ -56,7 +52,7 @@ Take the following steps to handle the data file corruption: ## Removing TiFlash nodes is slow -Take the following steps to handle this issue: +To address this issue, follow these steps: 1. Check whether any table has more TiFlash replicas than the number of TiFlash nodes available after the cluster scale-in: @@ -117,8 +113,6 @@ Take the following steps to handle this issue: If a statement contains operators or functions not supported in the MPP mode, TiDB does not select the MPP mode. Therefore, the analysis of the statement is slow. In this case, you can execute the `EXPLAIN` statement to check for operators or functions not supported in the MPP mode. -{{< copyable "sql" >}} - ```sql create table t(a datetime); alter table t set tiflash replica 1; @@ -144,8 +138,6 @@ After the TiDB cluster is deployed, if the TiFlash replicas consistently fail to 1. Check whether PD enables the `Placement Rules` feature. 
This feature is enabled by default since v5.0: - {{< copyable "shell-regular" >}} - ```shell echo 'config show replication' | /path/to/pd-ctl -u http://${pd-ip}:${pd-port} ``` @@ -176,8 +168,6 @@ After the TiDB cluster is deployed, if the TiFlash replicas consistently fail to > > The default value of `count` is `3`. In production environments, the value is usually smaller than the number of TiKV nodes. In test environments, if it is acceptable to have only one Region replica, you can set the value `1`. - {{< copyable "shell-regular" >}} - ```shell curl -X POST -d '{ "group_id": "pd", From b2fdbc146a15df41faf381bf0ba2f5452bab1444 Mon Sep 17 00:00:00 2001 From: JaySon Date: Thu, 22 May 2025 18:36:54 +0800 Subject: [PATCH 10/11] Update tiflash/troubleshoot-tiflash.md Co-authored-by: xixirangrang --- tiflash/troubleshoot-tiflash.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tiflash/troubleshoot-tiflash.md b/tiflash/troubleshoot-tiflash.md index f16164437c143..304205a68604e 100644 --- a/tiflash/troubleshoot-tiflash.md +++ b/tiflash/troubleshoot-tiflash.md @@ -211,8 +211,8 @@ After deploying a TiFlash node and starting replication by executing `ALTER TABL 1. Check whether the replication is successful by running the `ALTER table set tiflash replica ` command and check the output. - If the query is blocked, run the `SELECT * FROM information_schema.tiflash_replica` statement to check whether TiFlash replicas have been created. - - Check whether the DDL statement is executed as expected through [ADMIN SHOW DDL](/sql-statements/sql-statement-admin-show-ddl.md). Or there are any other DDL statement that block altering TiFlash replica statement being executed. - - Check whether any DML statement is executed on the same table through [SHOW PROCESSLIST](/sql-statements/sql-statement-show-processlist.md) that blocks altering TiFlash replica statement being executed. + - Check whether the DDL statement is executed as expected through [`ADMIN SHOW DDL`](/sql-statements/sql-statement-admin-show-ddl.md). Or there are any other DDL statement that block altering TiFlash replica statement being executed. + - Check whether any DML statement is executed on the same table through [`SHOW PROCESSLIST`](/sql-statements/sql-statement-show-processlist.md) that blocks altering TiFlash replica statement being executed. - You can wait until those queries or DDL finish or cancel them. If nothing is blocking the `ALTER TABLE ... SET TIFLASH REPLICA ...` being executed, go to the next step. 2. Check whether TiFlash Region replication runs correctly. From 7d5fa27dd3f3049cc0038318ec51f72ce0c0f650 Mon Sep 17 00:00:00 2001 From: JaySon Date: Thu, 22 May 2025 18:39:10 +0800 Subject: [PATCH 11/11] fix ci --- tiflash/create-tiflash-replicas.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tiflash/create-tiflash-replicas.md b/tiflash/create-tiflash-replicas.md index 01c88a7be9067..aa09668f717bf 100644 --- a/tiflash/create-tiflash-replicas.md +++ b/tiflash/create-tiflash-replicas.md @@ -136,8 +136,8 @@ SELECT TABLE_NAME FROM information_schema.tables where TABLE_SCHEMA = " The TiDB cluster triggers the TiFlash replica replication process when the following operations are performed: - * TiFlash replicas for a table are added. - * When a new TiFlash instance is added, PD moves the TiFlash replicas to the new TiFlash instance. +* TiFlash replicas for a table are added. +* When a new TiFlash instance is added, PD moves the TiFlash replicas to the new TiFlash instance. 
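For example, adding a TiFlash replica for a table starts the process, as shown in the following sketch; the connection parameters and the table name are placeholders rather than values from this document:

```shell
# Host, port, user, and table are placeholders; replace them with your own values.
mysql -h 127.0.0.1 -P 4000 -u root -e "ALTER TABLE test.t SET TIFLASH REPLICA 1;"
```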
The TiKV instance performs a table scan and sends the scanned data to TiFlash as a snapshot to create replicas. By default, TiFlash replicas are added slowly with fewer resources usage to minimize the impact on the online service. If there are spare CPU and disk IO resources in your TiKV and TiFlash nodes, you can accelerate TiFlash replication by performing the following steps.
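Before easing the limits in the steps that follow, it can be worth confirming that the TiKV and TiFlash hosts really do have spare CPU and disk I/O capacity. A rough spot check with standard Linux tools (`iostat` comes from the `sysstat` package) might look like this:

```shell
# CPU: the idle percentage (id) should stay comfortably above zero under the current workload.
top -b -n 1 | head -n 5

# Disk I/O: watch %util of the data disks for a short period; values close to 100% mean little headroom.
iostat -x 1 5
```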