tikv-control.md: 20 additions & 0 deletions

> - The argument of the `-p` option specifies the PD endpoints without the `http` prefix. Specifying the PD endpoints allows the command to check whether the specified `region_id` is valid.
> - You need to run this command for all stores where the specified Regions' peers are located.

### Recover from ACID inconsistency data

To recover data from ACID inconsistency, such as the loss of most replicas or incomplete data synchronization, you can use the `reset-to-version` command. When using this command, you need to provide an old version number that guarantees ACID consistency. `tikv-ctl` then cleans up all data written after the specified version.

- The `-v` option specifies the version number to restore to. To get the value of the `-v` parameter, you can use the `pd-ctl min-resolved-ts` command (see the sketch after the following notes).

> - The preceding command only supports the online mode. Before executing the command, you need to stop processes that write data to TiKV, such as TiDB. After the command is executed successfully, it returns `success!`.
> - You need to execute the same command for all TiKV nodes in the cluster.
> - All PD scheduling tasks should be stopped before executing the command.
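
Below is a minimal sketch of how these commands might be combined. The PD and TiKV addresses, the TiUP-based `pd-ctl` invocation, and the sample timestamp are assumptions for illustration only; adjust them to your deployment.

```shell
# Query the minimum resolved timestamp from PD (the PD address is a placeholder).
tiup ctl:v6.1.0 pd -u http://127.0.0.1:2379 min-resolved-ts

# Suppose the returned min-resolved-ts is 430315739761082369. Run reset-to-version
# in the online mode against a TiKV node, and repeat for every TiKV address in the cluster.
tikv-ctl --host 127.0.0.1:20160 reset-to-version -v 430315739761082369
```
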
### Ldb Command
The `ldb` command line tool offers multiple data access and database administration commands. Some examples are listed below. For more information, refer to the help message displayed when running `tikv-ctl ldb` or check the documentation of RocksDB.

two-data-centers-in-one-city-deployment.md: 24 additions & 7 deletions

The replication mode is controlled by PD. You can configure the replication mode in either of the following ways:

- Method 1: If you have not yet deployed a cluster, specify the replication mode settings in the PD configuration file, for example:

    ```toml
    primary-replicas = 2
    dr-replicas = 1
    wait-store-timeout = "1m"
    ```

- Method 2: If you have deployed a cluster, use pd-ctl commands to modify the configurations of PD.
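
    The following is a minimal sketch of such a pd-ctl invocation. The PD endpoint and the TiUP version tag are placeholders, and switching the replication mode is shown only as one representative setting:

    ```shell
    # Change the replication mode of a running cluster to DR auto-sync via pd-ctl.
    tiup ctl:v6.1.0 pd -u http://127.0.0.1:2379 config set replication-mode dr-auto-sync
    ```
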
### Disaster recovery
This section introduces the disaster recovery solution of the two data centers in one city deployment. The disaster discussed in this section is the overall failure of the primary data center, or the failure of multiple TiKV nodes in the primary or secondary data center, which results in the loss of most replicas and leaves the cluster unable to provide services.

#### Overall failure of the primary data center

In this situation, all Regions in the primary data center have lost most of their replicas, so the cluster is unavailable. You need to use the secondary data center to recover the service. The replication status before the failure determines the recovery capability:

- If the status before the failure is the synchronous replication mode (the status code is `sync` or `async_wait`), you can use the secondary data center to recover with `RPO = 0`.
- If the status before the failure is the asynchronous replication mode (the status code is `async`), the data written to the primary data center in the asynchronous replication mode is lost after you use the secondary data center to recover. A typical scenario is that the primary data center disconnects from the secondary data center, switches to the asynchronous replication mode, and provides services for a while before the overall failure.
- If the status before the failure is switching from the asynchronous to the synchronous replication mode (the status code is `sync-recover`), part of the data written to the primary data center in the asynchronous replication mode is lost after you use the secondary data center to recover. This might cause ACID inconsistency, which you need to recover from additionally. A typical scenario is that the primary data center disconnects from the secondary data center, the connection is restored after the switch to the asynchronous mode, and data is written. However, during the data synchronization between the primary and secondary data centers, a failure occurs and causes the overall failure of the primary data center.

The process of disaster recovery is as follows (a command-level sketch of steps 2, 3, and 6 is provided after the list):

1. Stop all PD, TiKV, and TiDB services of the secondary data center.
2. Start the PD nodes of the secondary data center in the single replica mode with the [`--force-new-cluster`](/command-line-flags-for-pd-configuration.md#--force-new-cluster) flag.
3. Use [Online Unsafe Recovery](/online-unsafe-recovery.md) to process the TiKV data in the secondary data center, passing the list of all Store IDs in the primary data center as the parameters.
4. Write a new placement rule configuration using [PD Control](/pd-control.md) so that the Voter replica configuration of the Regions in the secondary data center is the same as that of the original cluster.
5. Start the PD and TiKV services of the primary data center.
6. To recover ACID consistency (when the `DR_STATE` status in the old PD is `sync-recover`), use [`reset-to-version`](/tikv-control.md#recover-from-acid-inconsistency-data) to process the TiKV data. The `version` parameter can be obtained from `pd-ctl min-resolved-ts`.
7. Start the TiDB service in the primary data center and check the data integrity and consistency.
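
The following is a rough, hedged command sketch for steps 2, 3, and 6 above. The binary paths, endpoint addresses, Store IDs, timestamp, and the TiUP-based `pd-ctl` invocation are all assumptions; adapt them to how your cluster is actually deployed and managed.

```shell
# Step 2 (sketch): start a PD node of the secondary data center with --force-new-cluster.
bin/pd-server --config=conf/pd.toml --force-new-cluster

# Step 3 (sketch): run Online Unsafe Recovery through pd-ctl, passing the Store IDs
# of all TiKV stores in the failed primary data center (1,4,5 are placeholders).
tiup ctl:v6.1.0 pd -u http://127.0.0.1:2379 unsafe remove-failed-stores 1,4,5

# Step 6 (sketch): obtain the minimum resolved timestamp from PD, then reset each
# TiKV node to that version to restore ACID consistency.
tiup ctl:v6.1.0 pd -u http://127.0.0.1:2379 min-resolved-ts
tikv-ctl --host 127.0.0.1:20160 reset-to-version -v <min-resolved-ts>
```
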
If you need support for disaster recovery, you can contact the TiDB team for a recovery solution.