forked from delta-io/delta
Commit
Add terraform deploying infrastructure for benchmarks
## Description

Currently, to run the performance benchmarks, one needs to create the infrastructure manually. This PR adds Terraform scripts that do so automatically for AWS and GCP. I tested this patch manually on AWS and GCP.

## Does this PR introduce _any_ user-facing changes?

No.

Closes delta-io#1179

Co-authored-by: Grzegorz Koakowski <[email protected]>
Signed-off-by: Scott Sandre <[email protected]>
GitOrigin-RevId: 9cb7769afc7889beb743f499f271d8eac1167c1f
1 parent ff6914b, commit 8158663
Showing 30 changed files with 926 additions and 1 deletion.
22 changes: 22 additions & 0 deletions
benchmarks/infrastructure/aws/terraform/.terraform.lock.hcl
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
# Create infrastructure with Terraform | ||
|
||
1. Install [Terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli?in=terraform/aws-get-started). | ||
2. Create an IAM user that will be used to create the benchmarks infrastructure. Ensure that your AWS CLI is configured. | ||
You should either have valid credentials in the shared credentials file (e.g. `~/.aws/credentials`): | ||
``` | ||
[default] | ||
aws_access_key_id = anaccesskey | ||
aws_secret_access_key = asecretkey | ||
``` | ||
or export keys as environment variables: | ||
```bash | ||
export AWS_ACCESS_KEY_ID="anaccesskey" | ||
export AWS_SECRET_ACCESS_KEY="asecretkey" | ||
``` | ||
3. Add permissions for the IAM user. You can either assign the `AdministratorAccess` AWS managed policy (discouraged) | ||
or assign AWS managed policies in a more granular way: | ||
* `IAMFullAccess` | ||
* `AmazonVPCFullAccess` | ||
* `AmazonEMRFullAccessPolicy_v2` | ||
* `AmazonElasticMapReduceFullAccess` | ||
* `AmazonRDSFullAccess` | ||
* `AmazonS3FullAccess` | ||
* a custom policy for EC2 key pair management: | ||
```json | ||
{ | ||
  "Version": "2012-10-17", | ||
  "Statement": [ | ||
    { | ||
      "Effect": "Allow", | ||
      "Action": [ | ||
        "ec2:ImportKeyPair", | ||
        "ec2:CreateKeyPair", | ||
        "ec2:DeleteKeyPair" | ||
      ], | ||
      "Resource": "arn:aws:ec2:*:*:key-pair/benchmarks_key_pair" | ||
    } | ||
  ] | ||
} | ||
``` | ||
|
||
4. Create the Terraform variable file `benchmarks/infrastructure/aws/terraform/terraform.tfvars` and fill in the variable values: | ||
```tf | ||
region = "<REGION>" | ||
availability_zone1 = "<AVAILABILITY_ZONE1>" | ||
availability_zone2 = "<AVAILABILITY_ZONE2>" | ||
benchmarks_bucket_name = "<BUCKET_NAME>" | ||
source_bucket_name = "<SOURCE_BUCKET_NAME>" | ||
mysql_user = "<MYSQL_USER>" | ||
mysql_password = "<MYSQL_PASSWORD>" | ||
emr_public_key_path = "<EMR_PUBLIC_KEY_PATH>" | ||
user_ip_address = "<MY_IP>" | ||
emr_workers = <WORKERS_COUNT> | ||
tags = { | ||
  key1 = "value1" | ||
  key2 = "value2" | ||
} | ||
``` | ||
Please check `variables.tf` to learn more about each parameter. | ||
|
||
5. Run: | ||
```bash | ||
terraform init | ||
terraform validate | ||
terraform apply | ||
``` | ||
As a result, a new VPC, an S3 bucket, a MySQL instance (metastore), and an EMR cluster will be created. | ||
The `apply` command outputs `master_node_address`, which is used when running the benchmarks: | ||
``` | ||
Apply complete! Resources: 16 added, 0 changed, 0 destroyed. | ||
Outputs: | ||
master_node_address = "35.165.163.250" | ||
``` | ||
|
||
6. Once the benchmarks are finished, destroy the resources. | ||
```bash | ||
terraform destroy | ||
``` | ||
If the S3 bucket contains any objects, `terraform destroy` will not delete it and fails with the error below. | ||
To avoid accidental data loss, the bucket must be emptied and deleted manually. | ||
``` | ||
Error: deleting S3 Bucket (my-bucket): BucketNotEmpty: The bucket you tried to delete is not empty | ||
status code: 409, request id: Q11TYZ5E0B23QGQ2, host id: WdeFY88km5IBhy+bi2hqXzgjBxjrn1+OPtCstsWDjkwGNCyEhXYjq330DZq1jbfNXojBEejH6Wg= | ||
``` |
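The `BucketNotEmpty` failure above can be avoided by emptying the bucket before running `terraform destroy`. A minimal sketch, assuming the AWS CLI and Terraform are on `PATH` and the bucket is not versioned (`aws s3 rm --recursive` does not remove old object versions). The function name `destroy_benchmarks` is hypothetical and not part of this commit. Note that the command permanently deletes every object in the bucket, so copy out any benchmark results you want to keep first.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): empty the benchmarks bucket,
# then destroy the Terraform-managed resources.
# Assumes the AWS CLI and Terraform are on PATH and the bucket is not versioned.
destroy_benchmarks() {
  local bucket="$1"
  if [ -z "$bucket" ]; then
    echo "usage: destroy_benchmarks <bucket-name>" >&2
    return 2
  fi
  # Permanently deletes every object in the bucket.
  aws s3 rm "s3://${bucket}" --recursive || return 1
  # Prompts for confirmation, exactly like a plain `terraform destroy`.
  terraform destroy
}

# Usage (run from benchmarks/infrastructure/aws/terraform):
#   destroy_benchmarks "<BUCKET_NAME>"
```

Keeping the deletion as an explicit, separate step preserves the safety property the README describes: Terraform itself never discards data in the bucket.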
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
module "networking" { | ||
  source = "./modules/networking" | ||
|
||
  availability_zone1 = var.availability_zone1 | ||
  availability_zone2 = var.availability_zone2 | ||
} | ||
|
||
module "storage" { | ||
  source = "./modules/storage" | ||
|
||
  benchmarks_bucket_name = var.benchmarks_bucket_name | ||
} | ||
|
||
module "processing" { | ||
  source = "./modules/processing" | ||
|
||
  vpc_id     = module.networking.vpc_id | ||
  subnet1_id = module.networking.subnet1_id | ||
  subnet2_id = module.networking.subnet2_id | ||
|
||
  availability_zone1     = var.availability_zone1 | ||
  benchmarks_bucket_name = var.benchmarks_bucket_name | ||
  source_bucket_name     = var.source_bucket_name | ||
  mysql_user             = var.mysql_user | ||
  mysql_password         = var.mysql_password | ||
  emr_public_key_path    = var.emr_public_key_path | ||
  emr_workers            = var.emr_workers | ||
  user_ip_address        = var.user_ip_address | ||
|
||
  depends_on = [module.networking, module.storage] | ||
} |
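The root module forwards `var.*` values to its child modules, so each of them needs a value before `terraform apply` can succeed without prompting. The sketch below is a hypothetical pre-flight check, not part of this commit. It assumes the variable names from the README's example `terraform.tfvars`; whether any of them has a default in `variables.tf` is not shown here, and `tags` is left out on the assumption that it is optional.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of this commit): print every
# variable that is expected in terraform.tfvars but has no assignment there.
# The names are taken from the README's example terraform.tfvars.
REQUIRED_VARS="region availability_zone1 availability_zone2 benchmarks_bucket_name source_bucket_name mysql_user mysql_password emr_public_key_path user_ip_address emr_workers"

missing_tfvars() {
  local file="$1" name
  for name in $REQUIRED_VARS; do
    # Match `name = ...` at the start of a line, allowing leading whitespace.
    if ! grep -Eqs "^[[:space:]]*${name}[[:space:]]*=" "$file"; then
      echo "$name"
    fi
  done
}

# Usage (run from benchmarks/infrastructure/aws/terraform):
#   missing_tfvars terraform.tfvars
```

An empty result means every listed name is assigned; it says nothing about whether the values themselves are valid, which `terraform validate` and `terraform plan` check.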
32 changes: 32 additions & 0 deletions
benchmarks/infrastructure/aws/terraform/modules/networking/main.tf
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
resource "aws_vpc" "this" { | ||
  cidr_block = "10.0.0.0/16" | ||
} | ||
|
||
resource "aws_subnet" "benchmarks_subnet1" { | ||
  vpc_id            = aws_vpc.this.id | ||
  availability_zone = var.availability_zone1 | ||
  cidr_block        = "10.0.0.0/17" | ||
} | ||
|
||
# An RDS subnet group requires at least two subnets, so this second one exists only to satisfy that requirement and is otherwise unused. | ||
# If the DB subnet group covers only one AZ, the following error is thrown: | ||
# The DB subnet group doesn't meet Availability Zone (AZ) coverage requirement. | ||
# Current AZ coverage: us-west-2a. Add subnets to cover at least 2 AZs. | ||
resource "aws_subnet" "benchmarks_subnet2" { | ||
  vpc_id            = aws_vpc.this.id | ||
  availability_zone = var.availability_zone2 | ||
  cidr_block        = "10.0.128.0/17" | ||
} | ||
|
||
resource "aws_internet_gateway" "this" { | ||
  vpc_id = aws_vpc.this.id | ||
} | ||
|
||
resource "aws_default_route_table" "public" { | ||
  default_route_table_id = aws_vpc.this.default_route_table_id | ||
|
||
  route { | ||
    cidr_block = "0.0.0.0/0" | ||
    gateway_id = aws_internet_gateway.this.id | ||
  } | ||
} |
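The two subnets split the VPC's `10.0.0.0/16` range into two equal `/17` halves. The short worked check below confirms that arithmetic; the function name `cidr_size` is illustrative only.

```bash
#!/usr/bin/env bash
# Number of IPv4 addresses in a block with the given prefix length.
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

vpc_size=$(cidr_size 16)      # 10.0.0.0/16
subnet_size=$(cidr_size 17)   # each /17 subnet

# Two /17 blocks exactly fill one /16.
echo "vpc=${vpc_size} subnet=${subnet_size} subnets_per_vpc=$(( vpc_size / subnet_size ))"

# The second /17 starts subnet_size addresses after 10.0.0.0. Every 256
# addresses advance the third octet by one, which gives 10.0.128.0.
echo "second subnet third octet: $(( subnet_size / 256 ))"
```

This prints `vpc=65536 subnet=32768 subnets_per_vpc=2` followed by `second subnet third octet: 128`, matching the `10.0.0.0/17` and `10.0.128.0/17` blocks used above.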
11 changes: 11 additions & 0 deletions
benchmarks/infrastructure/aws/terraform/modules/networking/outputs.tf
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
output "vpc_id" { | ||
  value = aws_vpc.this.id | ||
} | ||
|
||
output "subnet1_id" { | ||
  value = aws_subnet.benchmarks_subnet1.id | ||
} | ||
|
||
output "subnet2_id" { | ||
  value = aws_subnet.benchmarks_subnet2.id | ||
} |
7 changes: 7 additions & 0 deletions
benchmarks/infrastructure/aws/terraform/modules/networking/variables.tf
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
variable "availability_zone1" { | ||
  type = string | ||
} | ||
|
||
variable "availability_zone2" { | ||
  type = string | ||
} |
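Both variables are plain strings, so nothing here stops the same zone being passed twice, which would reproduce the AZ coverage error quoted in the networking module. The sketch below is a hypothetical check to run before `terraform apply`, not part of this commit. It assumes standard zone names of the form `<region><letter>` (for example `us-west-2a`); Local Zone names follow a different pattern and would need different handling.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of this commit): the two availability zones
# must differ, because the RDS subnet group needs subnets in at least two AZs.
# Assumes standard AZ names of the form <region><letter>, e.g. us-west-2a.
check_azs() {
  local az1="$1" az2="$2"
  if [ "$az1" = "$az2" ]; then
    echo "availability zones must differ: $az1" >&2
    return 1
  fi
  # Strip the trailing letter to get the region. Both AZs must share it,
  # because a VPC lives in a single region.
  if [ "${az1%?}" != "${az2%?}" ]; then
    echo "availability zones are in different regions: $az1, $az2" >&2
    return 1
  fi
}

# Usage:
#   check_azs "<AVAILABILITY_ZONE1>" "<AVAILABILITY_ZONE2>" && terraform apply
```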