Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix matching file prefixes for FILE based catalogs #505

Merged
merged 5 commits into from
Dec 5, 2024

Conversation

adithyashanker22
Copy link
Contributor

@adithyashanker22 adithyashanker22 commented Dec 3, 2024

Ensure matching file prefixes when using polaris instance with FILE based storage.

Description

We encountered a bug when attempting to register tables to a file-based instance of Polaris for testing. Through logging, we discovered an inconsistent file naming convention when comparing file:/// to /. This inconsistency caused the system to fail to recognize paths that were actually children as child paths, resulting in the isChildOf method failing every time.

This change ensures that when comparing file paths in the isChildOf method, we use a consistent prefix. Additionally, this change enhances the robustness of the method by ensuring that comparisons are made between paths with similar prefixes, whether they start with / or file:///.

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • This test has been manually verified by registering tables and creating namespaces using the new method. The method ensures that namespaces and tables are only created and registered if they are part of the child directory, while also ensuring that comparisons are made between similar prefixes.

Test Configuration:

  • Hardware: Macbook M3 Pro
  • Toolchain:
  • SDK: Java 21

Checklist:

Please delete options that are not relevant.

  • I have performed a self-review of my code
  • My changes generate no new warnings

Ensure matching File prefixes when using polaris instance with FILE based storage.
@adithyashanker22 adithyashanker22 changed the title Update StorageLocation.java Fix matching file prefixes for FILE based catalogs Dec 3, 2024
Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adithyashanker22, thanks for the fix! Left a comment.

@@ -93,6 +93,9 @@ public boolean isChildOf(StorageLocation potentialParent) {
} else {
String slashTerminatedLocation = ensureTrailingSlash(this.location);
String slashTerminatedParentLocation = ensureTrailingSlash(potentialParent.location);
if (slashTerminatedLocation.startsWith("/") && slashTerminatedParentLocation.startsWith("file:///")) {
slashTerminatedLocation = "file://" + slashTerminatedLocation;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend standardizing the URI schema to file:/// instead of file://. The file:/// format is the correct URI schema, whereas file://path is not. To make this consistent and maintainable, we could define a constant like this:

public static final String FILE_SCHEME = "file:///";

Additionally, could we add a test to ensure this change is verified and behaves as expected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, it should definitely be file:/// not file://.

However the :/// is not part of the scheme so that specific constant might be confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. We could give a different constant name, like FILE_SCHEMA_PATH_PRFIX, or LOCAL_PATH_PREFIX.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments updated for prefix file:/// and testing

Copy link
Contributor

@eric-maynard eric-maynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes sense to me, though I'm not sure how meaningful the isChildOf check really is for a local filesystem where there is no prefix-based credential vending.

Can you add a test?

@flyrain
Copy link
Contributor

flyrain commented Dec 4, 2024

Side note: Should we standardize the location string when creating a catalog? For instance, Polaris converts user input like /path into the standardized format file:///path. This ensures consistency and aligns with proper URI conventions. We could do this as a follow-up if needed.

Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Left minor comments about the unit test.

Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 actually I think the current unit tests are good. Thanks for the fix!

@flyrain flyrain merged commit b5685ef into apache:main Dec 5, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants