Store AWS region in AwsStorageConfigurationInfo #455

Open: wants to merge 12 commits into main

eric-maynard
Contributor

Description

This adds support for a new property, region, for AWS storage configurations.

Fixes #342

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

I'm able to create catalogs and add the region property to the StorageConfigInfo:

LD4RTJ0HY9:polaris emaynard$ curl -X POST http://localhost:8181/api/management/v1/catalogs \
> -H "Authorization: Bearer principal:root;realm:default-realm" \
> -H "Content-Type: application/json" \
> -d '{
>   "catalog": {
>     "type": "INTERNAL",
>     "name": "example_catalog",
>     "properties": {
>       "default-base-location": "s3://your-bucket/catalog-location/"
>     },
>     "storageConfigInfo": {
>       "storageType": "S3",
>       "roleArn": "arn:aws:iam::012345678901:role/jdoe",
>       "region": "us-east-2"
>     }
>   }
> }'
LD4RTJ0HY9:polaris emaynard$ curl -X GET http://localhost:8181/api/management/v1/catalogs/example_catalog \
> -H "Authorization: Bearer principal:root;realm:default-realm" \
> -H "Content-Type: application/json" | jq
{
  "type": "INTERNAL",
  "name": "example_catalog",
  "properties": {
    "default-base-location": "s3://your-bucket/catalog-location/"
  },
  "createTimestamp": 1731818113312,
  "lastUpdateTimestamp": 1731818113312,
  "entityVersion": 1,
  "storageConfigInfo": {
    "storageType": "S3",
    "roleArn": "arn:aws:iam::012345678901:role/jdoe",
    "externalId": null,
    "userArn": null,
    "region": "us-east-2",
    "allowedLocations": [
      "s3://your-bucket/catalog-location/"
    ]
  }
}

@eric-maynard eric-maynard changed the title Issue 342 Storage AWS region in AwsStorageConfigurationInfo Nov 17, 2024
@eric-maynard eric-maynard changed the title Storage AWS region in AwsStorageConfigurationInfo Store AWS region in AwsStorageConfigurationInfo Nov 17, 2024
@eric-maynard eric-maynard marked this pull request as draft November 18, 2024 17:51

@singhpk234 singhpk234 left a comment

[doubt] Will the region specified here be the region of the Polaris catalog? Do we plan to support federation of catalogs in the future? If so, should this be the region of the Polaris catalog or the region of the federated catalog?

For example, could this be a possible case in the future?
Polaris -> (Federated to) -> Glue
(region A) ------ > (region B)

@eric-maynard
Contributor Author

eric-maynard commented Nov 18, 2024

Hey @singhpk234, the idea is that you can associate a region with a storage configuration so that the region can be used by any client that leverages credentials/files associated with that storage configuration.

As you pointed out, it is not clear how this will work with catalog federation (cc @dennishuo). But I think it is also unclear how storage configurations will work with federated catalogs more generally -- for example, a single role ARN may not be valid for the entire federated catalog. So this is something our design for federation must address.

At the very least, we have discussed allowing storage configurations to be defined on a level more granular than the catalog (e.g. at the table or namespace level).

Maybe @munendrasn, the filer of #342, can also help provide some additional context here. For my part I am curious if there's a particular test case we can add here to make sure the issue reported in #342 is fully addressed.

@munendrasn

@eric-maynard
It is similar to the case @singhpk234 mentioned.
We have a custom catalog that tracks native Iceberg tables and federated Iceberg tables from different catalogs. One such catalog is the Polaris catalog.
Our setup is in one AWS region, but the federated table's data is stored in another region, so accessing the table fails.

On the testing, are you looking to test it via the Iceberg APIs or directly via the S3 client API? If AWS_REGION is set to one region but the table's storage is in another region, any read or list operation would fail unless the region is explicitly specified when the S3 client is created.
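
The region mismatch described above can be sketched in plain Java. This is only an illustration, not code from this PR: RegionResolver and resolve are hypothetical names, and the assumption is that a client should prefer the region recorded on the table's storage configuration and fall back to the process-wide AWS_REGION only when none is recorded.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Polaris, Iceberg, or the AWS SDK. */
final class RegionResolver {
  private RegionResolver() {}

  /**
   * Prefers the region attached to the storage configuration. Falls back to the
   * AWS_REGION entry of the supplied environment only when the storage
   * configuration carries no region.
   */
  static Optional<String> resolve(String storageConfigRegion, Map<String, String> env) {
    if (storageConfigRegion != null && !storageConfigRegion.isBlank()) {
      return Optional.of(storageConfigRegion);
    }
    return Optional.ofNullable(env.get("AWS_REGION")).filter(r -> !r.isBlank());
  }

  public static void main(String[] args) {
    // The process runs in us-west-2, but the table's storage is in us-east-2.
    System.out.println(resolve("us-east-2", Map.of("AWS_REGION", "us-west-2")));
    // No region on the storage configuration: use whatever the environment provides.
    System.out.println(resolve(null, System.getenv()));
  }
}
```

The resolved value is what a caller would then pass explicitly when building its S3 client, instead of relying on the ambient AWS_REGION.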

@eric-maynard
Contributor Author

Hi @munendrasn I see -- do the current changes here work for your use case then? client.region should be specified in the credentials map so long as it's set for the table's storage configuration.

@eric-maynard eric-maynard marked this pull request as ready for review November 20, 2024 18:42
@munendrasn

@eric-maynard
As long as client.region is set, it would resolve the reported issue. Skimming through the code, I couldn't find client.region being returned as part of credentialMap. Would it be possible to include a test case for this?

@eric-maynard
Contributor Author

eric-maynard commented Nov 25, 2024

Hi @munendrasn, please see the newly-added test. client.region should be present in the credentials map when a region is attached to the storage config.
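
The behavior described here (client.region appears in the credentials map only when the storage configuration has a region) might look roughly like the sketch below. The class CredentialMapSketch and its build method are hypothetical and are not the code or the test added in this PR. The string keys are the Iceberg client property names for S3 credentials and region.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; the real logic lives in Polaris and may differ. */
final class CredentialMapSketch {
  private CredentialMapSketch() {}

  /**
   * Builds the property map handed back to a client. The region argument may be
   * null, in which case no client.region entry is emitted.
   */
  static Map<String, String> build(
      String accessKeyId, String secretAccessKey, String sessionToken, String region) {
    Map<String, String> credentials = new LinkedHashMap<>();
    credentials.put("s3.access-key-id", accessKeyId);
    credentials.put("s3.secret-access-key", secretAccessKey);
    credentials.put("s3.session-token", sessionToken);
    if (region != null) {
      // Only set when the storage configuration has a region attached.
      credentials.put("client.region", region);
    }
    return credentials;
  }

  public static void main(String[] args) {
    // Placeholder values, not real credentials.
    System.out.println(build("example-key-id", "example-secret", "example-token", "us-east-2"));
    System.out.println(build("example-key-id", "example-secret", "example-token", null));
  }
}
```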


@munendrasn munendrasn left a comment

lgtm

@JsonProperty(value = "roleARN", required = true) @NotNull String roleARN) {
this(storageType, allowedLocations, roleARN, null);
@JsonProperty(value = "roleARN", required = true) @NotNull String roleARN,
@JsonProperty(value = "region", required = false) @NotNull String region) {
Contributor

Why @NotNull here while the property is nullable?
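
One way to read this question: the parameter is declared with required = false, so callers may omit it, and a non-null constraint on the same parameter contradicts that. A minimal sketch of the consistent alternative follows, in plain Java without Jackson or validation annotations. AwsStorageConfigSketch is a hypothetical stand-in, not the class from the diff.

```java
import java.util.Objects;
import java.util.Optional;

/** Hypothetical stand-in for illustration; not the class changed in this PR. */
final class AwsStorageConfigSketch {
  private final String roleArn; // required
  private final String region;  // optional, may be null

  AwsStorageConfigSketch(String roleArn, String region) {
    // Enforce non-null only for the property that is actually required.
    this.roleArn = Objects.requireNonNull(roleArn, "roleArn is required");
    this.region = region;
  }

  String roleArn() {
    return roleArn;
  }

  /** Exposes the optional region without leaking null to callers. */
  Optional<String> region() {
    return Optional.ofNullable(region);
  }

  public static void main(String[] args) {
    AwsStorageConfigSketch config =
        new AwsStorageConfigSketch("arn:aws:iam::012345678901:role/jdoe", null);
    System.out.println(config.region().isPresent());
  }
}
```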

Successfully merging this pull request may close these issues.

[FEATURE REQUEST] support for returning client region in the loadTable response for S3
4 participants