The Data Transfer from Amazon S3 Glacier vaults to Amazon S3 is a serverless solution that automatically copies entire Amazon S3 Glacier vault archives to a defined destination Amazon Simple Storage Service (Amazon S3) bucket and S3 storage class.
The solution automates the optimized restore, copy, and transfer process and provides a prebuilt Amazon CloudWatch dashboard to visualize the copy operation progress. Deploying this solution allows you to seamlessly copy your S3 Glacier vault archives to more cost-effective storage locations such as the Amazon S3 Glacier Deep Archive storage class.
Copying your Amazon S3 Glacier vault contents to the S3 Glacier Deep Archive storage class combines the low cost and high durability benefits of S3 Glacier Deep Archive with the familiar Amazon S3 user and application experience that offers simple visibility and access to data. Once your archives are stored as objects in your Amazon S3 bucket, you can add tags to your data to enable capabilities such as attributing data costs at a granular level.
Note: The solution only copies archives from a source S3 Glacier vault to the destination S3 bucket; it does not delete archives from the source S3 Glacier vault. After the solution completes a successful archive copy to the destination S3 bucket, you must manually delete the archives from your S3 Glacier vault. For more information, refer to Deleting an Archive in Amazon S3 Glacier in the Amazon S3 Glacier Developer Guide.
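For example, once you have confirmed an archive was copied successfully, that cleanup could be scripted with boto3 along these lines. This is a minimal sketch, not part of the solution: the vault name and archive ID are placeholders, and real archive IDs come from the vault's inventory.

```python
# Illustrative only: delete a single archive from a source vault after
# confirming it was copied successfully to the destination S3 bucket.
import boto3

glacier = boto3.client("glacier")
glacier.delete_archive(
    vaultName="my-source-vault",     # placeholder vault name
    archiveId="example-archive-id",  # placeholder archive ID from the vault inventory
)
```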
- Architecture
- Deploying the solution
- Automated testing pipeline
- Project structure
- CDK documentation
- Anonymous metric collection
- Customers invoke a transfer workflow by using an AWS Systems Manager document (SSM document).
- The SSM document starts an AWS Step Functions Orchestrator execution.
- The Step Functions Orchestrator execution initiates a nested Step Functions Get Inventory workflow to retrieve the inventory file.
- Upon completion of the inventory retrieval, the solution invokes the Initiate Retrieval nested Step Functions workflow.
- When a job is ready, Amazon S3 Glacier sends a notification to an Amazon SNS topic indicating job completion.
- The solution stores all job completion notifications in the Amazon SQS Notifications queue.
- When an archive job is ready, the Amazon SQS Notifications queue invokes the AWS Lambda Notifications Processor function. This Lambda function prepares the initial steps for archive retrieval.
- The Lambda Notifications Processor function places chunk retrieval messages in the Amazon SQS Chunks Retrieval queue for chunk processing.
- The Amazon SQS Chunks Retrieval queue invokes the Lambda Chunk Retrieval function to process each chunk.
- The Lambda Chunk Retrieval function downloads the chunk from Amazon S3 Glacier.
- The Lambda Chunk Retrieval function uploads a multipart upload part to Amazon S3.
- After a new chunk is downloaded, the solution stores the chunk metadata (etag, checksum_sha_256, tree_checksum) in Amazon DynamoDB (see the sketch after this list).
- The Lambda Chunk Retrieval function verifies whether all chunks for that archive have been processed. If yes, it inserts an event into the Amazon SQS Validation queue to invoke the Lambda Validate function.
- The Lambda Validate function performs an integrity check and then closes the Amazon S3 multipart upload.
- A DynamoDB stream invokes the Lambda Metrics Processor to update the transfer process metrics in DynamoDB.
- The Step Functions Orchestrator execution enters an async wait, pausing until the archive retrieval workflow concludes before initiating the Step Functions Cleanup workflow.
- The DynamoDB stream invokes the Lambda Async Facilitator function, which unlocks asynchronous waits in Step Functions.
- Amazon EventBridge rules periodically initiate the Step Functions Extend Download Window and Update CloudWatch Dashboard workflows.
- Customers monitor the transfer progress by using the Amazon CloudWatch dashboard.
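The chunk copy path in the steps above (download a byte range from the Glacier retrieval job, upload it as an S3 multipart part with its checksum, record the part metadata, then close the upload) can be pictured with a minimal boto3 sketch. This is not the solution's Lambda code; the vault name, job ID, bucket, key, and upload ID below are placeholders that the deployed workflow supplies at runtime.

```python
# Illustrative sketch of the chunk copy path: Glacier job output -> S3 multipart part.
import base64
import hashlib

import boto3

glacier = boto3.client("glacier")
s3 = boto3.client("s3")

VAULT_NAME = "my-source-vault"    # placeholder
JOB_ID = "example-job-id"         # archive-retrieval job started by the solution
BUCKET = "my-destination-bucket"  # placeholder
KEY = "my-archive-object"         # placeholder
UPLOAD_ID = "example-upload-id"   # multipart upload created for this archive


def copy_chunk(part_number: int, start: int, end: int) -> dict:
    """Download one chunk from the Glacier job output and upload it as an S3 part."""
    # Fetch only this chunk's byte range from the completed archive-retrieval job.
    job_output = glacier.get_job_output(
        vaultName=VAULT_NAME,
        jobId=JOB_ID,
        range=f"bytes={start}-{end}",
    )
    body = job_output["body"].read()

    # Upload the chunk as a multipart part, asking S3 to verify its SHA-256 checksum.
    checksum = base64.b64encode(hashlib.sha256(body).digest()).decode()
    response = s3.upload_part(
        Bucket=BUCKET,
        Key=KEY,
        PartNumber=part_number,
        UploadId=UPLOAD_ID,
        Body=body,
        ChecksumSHA256=checksum,
    )
    # The solution records metadata like this (etag, checksum) in DynamoDB.
    return {"PartNumber": part_number, "ETag": response["ETag"], "ChecksumSHA256": checksum}


# Once every part has been uploaded and validated, the upload is closed, which is
# the role the Validate function plays in the architecture:
# s3.complete_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=UPLOAD_ID,
#                              MultipartUpload={"Parts": parts})
```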
Refer to the solution landing page to deploy the solution using our pre-packaged deployment assets.
The solution can be deployed to your AWS account directly from the source code using AWS Cloud Development Kit (CDK).
Install prerequisite software packages:
Note: The following instructions were tested with Node.js v20.10.0 and Python 3.11.6.
git clone https://github.com/aws-solutions/data-transfer-from-amazon-s3-glacier-vaults-to-amazon-s3
pyenv virtualenv 3.11.0 grf-venv
pyenv activate grf-venv
pip install ".[dev]"
Make sure the AWS CLI is operational:
aws s3 ls
Bootstrap CDK, if required
npx cdk bootstrap
Deploy the solution
npx cdk deploy solution -c skip_integration_tests=true
Note: Set the context parameter skip_integration_tests=false if you want to run integration tests against the solution stack. To run the integration tests, deploy the mock-glacier stack and then run tox:
npx cdk deploy mock-glacier
export MOCK_SNS_STACK_NAME=mock-glacier # use mock-glacier stack name
export STACK_NAME=solution # use solution stack name
tox -e integration
The Data Transfer from Amazon S3 Glacier vaults to Amazon S3 solution includes an optional automated testing pipeline that can be deployed to automatically test any changes you develop for the solution on your own development fork. Once set up, this pipeline will automatically download, build, and test any changes that you push to a specified branch of your development fork.
The pipeline can be configured to automatically watch and pull from repositories hosted on AWS CodeCommit.
- Create and connect to a CodeCommit repository
- Push this source code to the CodeCommit repository
- Create the pipeline:
npx cdk deploy pipeline -c repository_name=my-repo -c branch=dev
The pipeline will be triggered any time you push to the CodeCommit repository on the specified branch.
Note: Due to a known issue where resource names get truncated, we recommend using a branch name no longer than 3 characters while the fix is being worked on.
├── source
│ ├── solution [Source code]
│ │ ├── application [Lambda microservices code]
│ │ ├── infrastructure [CDK code to provision the solution infrastructure]
│ │ ├── mocking [CDK code to create mock glacier stack]
│ │ ├── pipeline [CDK code to deploy developer friendly pipeline]
│ └── tests [Unit and integration tests]
├── tox.ini [Tox configuration file]
├── pyproject.toml [Project configuration file]
Data Transfer from Amazon S3 Glacier vaults to Amazon S3 templates are generated using the AWS CDK. For further information on CDK, refer to the documentation.
This solution collects anonymous operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, please see the implementation guide.
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
Licensed under the Apache License Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at
http://www.apache.org/licenses/
or in the "LICENSE" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and limitations under the License.