HAWQ-1078. Implement hawqsync-falcon DR utility. #940
base: master
Conversation
@vVineet How can we get this prioritized for the next release? Also, anyone who can put eyes on it for a code review would be helpful.
@kdunn-pivotal: I propose a discussion including @ictmalili, as this ties in with the HAWQ Register feature. I'd love to see the contribution make it into HAWQ.
Quite a lot of work here. Could you provide step-by-step instructions that users can follow to perform DR? Thanks a lot!
```python
retVal, stderr = startHawq(masterHost=options.targetHawqMaster,
                           isTesting=options.testMode)
if options.verbose:
    print retVal, stderr
```
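For context, a minimal sketch of what a `startHawq` helper like the one above might look like. The function name, `isTesting` flag, and return shape are taken from the snippet; the command construction is an assumption, not the PR's actual implementation:

```python
import subprocess

def startHawq(masterHost, isTesting=False):
    """Start the HAWQ master on the given host.

    Returns an (exit-code, stderr) pair, mirroring the
    retVal/stderr tuple unpacked in the snippet above.
    """
    # Assumed command; the real utility may invoke hawq differently.
    cmd = ["ssh", masterHost, "hawq", "start", "master", "-a"]
    if isTesting:
        # Test mode: report what would run without touching the cluster.
        return 0, "DRY-RUN: %s" % " ".join(cmd)
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    _, stderr = proc.communicate()
    return proc.returncode, stderr.decode()
```

A test mode like this lets the sync be rehearsed end-to-end without actually restarting the DR master.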
@kdunn926, do I understand correctly that at this point all the files have been copied to the new HAWQ master data directory, but the HAWQ catalog information has not been changed? Can we leverage hawq register to register the files into HAWQ?
@ictmalili - At this point in the process several things have happened on the source cluster:

- the source HAWQ Master Data Directory (MDD) has been tarballed
- the tarball has been copied to HDFS in the `/hawq_default` directory on the source
- the `/hawq_default` directory has been recursively copied to the DR cluster via Apache Falcon (distcp)
- a checksum computed from a recursive listing of files+sizes has been generated for each cluster (source and DR) and successfully validated
Basically, by the time this point in the code is reached, the source HAWQ system (data & metadata) is completely archived & verified in the DR site.
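The checksum-validation step above can be sketched roughly as follows. The helper name and the local-filesystem walk are illustrative; the actual utility presumably derives the listing from HDFS (e.g. via `hdfs dfs -ls -R`) rather than `os.walk`:

```python
import hashlib
import os

def listing_checksum(root):
    """Checksum a recursive listing of (path, size) pairs.

    Two clusters whose listings produce the same digest hold the
    same relative file names with the same sizes, which is the
    property the sync validation relies on.
    """
    entries = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            entries.append("%s\t%d" % (rel, os.path.getsize(path)))
    # Sort so the digest is independent of traversal order.
    return hashlib.md5("\n".join(sorted(entries)).encode()).hexdigest()
```

Validation then reduces to comparing `listing_checksum(source_root)` against the same digest computed on the DR cluster.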
```python
print """
## Manual runbook during DR event
1. Copy MDD archive from HDFS to target master (CLI)
2. Restore archive in /data/hawq/ (CLI)
```
Which specific directory does this map to?
Is it HAWQ master's MASTER_DATA_DIRECTORY?
Yes, the MDD archive is the MASTER_DATA_DIRECTORY from the source HAWQ cluster.
HAWQSYNC partial-sync recovery runbook:
This process, overall, ensures the catalog EOF marker and the actual HDFS file size are properly aligned for every table. This is especially important when ETL needs to resume on tables that may previously have had "inconsistent bytes" appended, as would be the case after a partial sync.
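To illustrate the EOF-alignment check described above (the function, its inputs, and the table names are all hypothetical; the real recovery runbook compares the HAWQ catalog against files in HDFS):

```python
def find_partial_tables(catalog_eofs, hdfs_sizes):
    """Compare catalog EOF markers against actual HDFS file sizes.

    Both arguments map table name -> size in bytes. Returns the
    tables whose on-disk size exceeds the catalog EOF, i.e. those
    with "inconsistent bytes" appended by a partial sync. Such
    files would need to be truncated back to the catalog EOF
    before ETL can safely resume appending.
    """
    partial = {}
    for table, eof in catalog_eofs.items():
        actual = hdfs_sizes.get(table, 0)
        if actual > eof:
            partial[table] = (eof, actual)
    return partial
```

Tables where the two sizes already agree need no intervention; only the mismatched ones require truncation before ETL resumes.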
This is the initial commit for a Python utility to orchestrate a DR synchronization for HAWQ, based on Falcon HDFS replication and a cold backup of the active HAWQ master's MASTER_DATA_DIRECTORY.
A code review would be greatly appreciated, when someone has cycles. Active testing is currently underway in a production deployment.