Stores Metaflow state, acting as Metaflow's remote Datastore. The data stored includes, but is not limited to:
- for each flow
  - for each version
    - conda environments
    - dependencies
    - artifacts
      - input
      - output
No duplicate data is stored thanks to automatic deduplication built into Metaflow.
To read more, see the Metaflow docs.

Name | Description | Type | Default | Required |
---|---|---|---|---|
db_engine | n/a | string | "postgres" | no |
db_engine_version | n/a | string | "11" | no |
db_instance_type | RDS instance type to launch for the PostgreSQL database. | string | "db.t2.small" | no |
db_name | Name of the PostgreSQL database for the Metaflow service. | string | "metaflow" | no |
db_username | PostgreSQL username; defaults to 'metaflow'. | string | "metaflow" | no |
force_destroy_s3_bucket | Empty the S3 bucket before destroying it via `terraform destroy`. | bool | false | no |
metadata_service_security_group_id | The security group ID used by the MetaData service. We'll grant this access to our DB. | string | n/a | yes |
metaflow_vpc_id | ID of the Metaflow VPC these datastore resources are to be deployed in. | string | n/a | yes |
resource_prefix | Prefix given to all AWS resources to differentiate between applications. | string | n/a | yes |
resource_suffix | Suffix given to all AWS resources to differentiate between environment and workspace. | string | n/a | yes |
standard_tags | The standard tags to apply to every AWS resource. | map(string) | n/a | yes |
subnet1_id | First subnet used for availability zone redundancy. | string | n/a | yes |
subnet2_id | Second subnet used for availability zone redundancy. | string | n/a | yes |

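
The inputs above could be supplied from a root module roughly as follows. This is a minimal sketch, not a definitive configuration: the module label `metaflow_datastore`, the `source` path, and every ID and tag value are hypothetical placeholders. Only the argument names come from the table.

```hcl
# Hypothetical root-module configuration; all values are placeholders.
module "metaflow_datastore" {
  # Assumed local path to this module; adjust to wherever it actually lives.
  source = "./modules/datastore"

  # Required inputs (no default)
  resource_prefix                    = "metaflow"
  resource_suffix                    = "dev"
  metaflow_vpc_id                    = "vpc-0123456789abcdef0"    # placeholder
  subnet1_id                         = "subnet-0123456789abcdef0" # placeholder
  subnet2_id                         = "subnet-0fedcba9876543210" # placeholder
  metadata_service_security_group_id = "sg-0123456789abcdef0"     # placeholder

  standard_tags = {
    application = "metaflow" # example tag only
  }

  # Optional inputs, shown here with their documented defaults
  db_instance_type        = "db.t2.small"
  force_destroy_s3_bucket = false
}
```

Leaving `force_destroy_s3_bucket` at `false` is the safer choice wherever the stored artifacts matter, since setting it to `true` empties the bucket when the module is destroyed.
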
Name | Description |
---|---|
METAFLOW_DATASTORE_SYSROOT_S3 | Amazon S3 URL for Metaflow DataStore |
METAFLOW_DATATOOLS_S3ROOT | Amazon S3 URL for Metaflow DataTools |
database_name | The database name |
database_password | The database password |
database_username | The database username |
datastore_s3_bucket_kms_key_arn | The ARN of the KMS key used to encrypt the Metaflow datastore S3 bucket |
rds_master_instance_endpoint | The database connection endpoint in address:port format |
s3_bucket_arn | The ARN of the bucket we'll be using as blob storage |
s3_bucket_name | The name of the bucket we'll be using as blob storage |
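
A caller might re-export some of these outputs so that other configuration can consume them. The sketch below assumes the module is instantiated under the hypothetical label `metaflow_datastore`; the names of the root-level `output` blocks are likewise illustrative.

```hcl
# Hypothetical root-module outputs; "metaflow_datastore" is an assumed module label.
output "metaflow_datastore_sysroot_s3" {
  description = "Amazon S3 URL for the Metaflow DataStore"
  value       = module.metaflow_datastore.METAFLOW_DATASTORE_SYSROOT_S3
}

output "metaflow_database_endpoint" {
  description = "Database connection endpoint in address:port format"
  value       = module.metaflow_datastore.rds_master_instance_endpoint
}

output "metaflow_database_password" {
  description = "The database password"
  value       = module.metaflow_datastore.database_password
  # Marked sensitive so Terraform redacts the value in plan and apply output.
  sensitive   = true
}
```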