-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-19057. S3A: Landsat bucket used in tests no longer accessible #6515
HADOOP-19057. S3A: Landsat bucket used in tests no longer accessible #6515
Conversation
💔 -1 overall
This message was automatically generated. |
although we can and should fix bucket-info to handle spaces, that doesn't work for old releases. so we need a new data URL without spaces there. managed to find an NPE too!
|
going to use s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
source: https://registry.opendata.aws/noaa-ncn/ this is us-east-1 region, fwiw |
with this we are back to the known failures)
Once I've got through the check i'm going to make sure that older builds can work with the new url; if it is good then we can just document how to patch your auth-keys |
plus
|
this is because the file < 16 MB and the prefetch test requires that. Not worried about that, which we can fix by making prefetch block size smaller in the test |
💔 -1 overall
This message was automatically generated. |
new settings can be added to anyone's auth-keys.xml file to get almost all tests to work. Test <property>
<name>fs.s3a.scale.test.csvfile</name>
<value>s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz</value>
<description>file used in scale tests</description>
</property>
<property>
<name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
<value>us-east-1</value>
</property>
<property>
<name>fs.s3a.bucket.noaa-isd-pds.multipart.purge</name>
<value>false</value>
<description>Don't try to purge uploads in the read-only bucket, as
it will only create log noise.</description>
</property>
<property>
<name>fs.s3a.bucket.noaa-isd-pds.probe</name>
<value>0</value>
<description>Let's postpone existence checks to the first IO operation </description>
</property>
<property>
<name>fs.s3a.bucket.noaa-isd-pds.audit.add.referrer.header</name>
<value>false</value>
<description>Do not add the referrer header</description>
</property>
<property>
<name>fs.s3a.bucket.noaa-isd-pds.prefetch.block.size</name>
<value>128k</value>
<description>Use a small prefetch size so tests fetch multiple blocks</description>
</property>
|
Using the xml settings above against branch 3.4
These are just going to have to be expectable -some are from hard coded s3a://landsat-pds/ refs; others from file length. for now though: we have a fix for trunk/3.4 and workaround with some failures for the test |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
5c26636
to
018e034
Compare
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
file s3a://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz is longer; should stop eof regressions on the older branches. |
Moves to new test file/bucket Adopts test path s3a://noaa-cors-pds/raw/2023/001/akse/AKSE001a.23_.gz this is actually quite an interesting path as it has a space in and breaks s3guard tool uri parsing. fix: those tests just take the root schema/host and not the rest Rename all methods about ExternalFile rather than CSV file, as we no longer expect it to be CSV. Leaves the test key name alone: fs.s3a.scale.test.csvfile This is a .gz file (needed for coded testing) on a store with anonymous access supported. All references to "landsat" in the code and docs have been stripped. * "external file" used instead of "csv file" * "external bucket" used instead of "landsat bucket" * All examples updated. * Unit tests which used it as an arbitrary s3 bucket now use the constant UNIT_TEST_EXAMPLE_PATH = "s3a://example/data/" * references inc variable names where it was a "csv file" now say "external file" ITestS3APrefetchingCacheFiles fixes: * don't remove bucket overrides * use a smaller block size * use an isolated buffer dir * make teardown resilient to startup failures. This stuff isn't going to be backportable to older releases with ITestS3APrefetchingCacheFiles; we will just have to expect failures there as the new test file is too small for the seek logic. Change-Id: Ifcdfa20d753b0ab2b35577291bed1db8aea41f54
Change-Id: I5b5e9cf41657288941865eeb5ee64d029207e54d
* use file s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz which is large enough for existing tests to work. * move new path definition and helper methods to PublicDatasetTestUtils * improve error reporting in ITestS3AInputStreamPerformance if the file is too short * remove javadoc changes from CompressionCodecFactory to isolate build Change-Id: I572e86f3d9b46179a02bc19e87626d92629cdb8c
35edd7c
to
a82c9ff
Compare
🎊 +1 overall
This message was automatically generated. |
need urgent reviews/tests of this from anyone who can, just to fix the widespread test failures @ahmarsuhail @mukund-thakur @HarshitGupta11 @virajjasani @sunchao |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Just a minor nit.
What's the status after these changes?
Is it just two failing now?
[ERROR] ITestS3ACommitterFactory.testEverything:112->testImplicitFileBinding:127->assertFactoryCreatesExpectedCommitter:187->Assert.assertEquals:120->Assert.failNotEquals:835->Assert.fail:89 Wrong Committer from factory expected:<class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter> but was:<class org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitter>
[ERROR] Errors:
[ERROR] ITestS3AConfiguration.testS3SpecificSignerOverride:577 » SdkClient Unable to l...```
@@ -289,9 +289,8 @@ for buckets in the central and EU/Ireland endpoints. | |||
|
|||
```xml | |||
<property> | |||
<name>fs.s3a.bucket.landsat-pds.endpoint.region</name> | |||
<name>fs.s3a.bucket.us2w-dataset.endpoint.region</name> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo, usw2-dataset (or let's just be clearer with us-west-2-dataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
Shall we not use requester pay public bucket for all landsat usages? |
@ahmarsuhail @steveloughran are you aware of any criteria used by Amazon to recycle or restrict access to public buckets? |
Reran the tests by pull the latest code and it went fine. |
Applied the patch locally and ran the test suite, looks good. |
@virajjasani if this goes away we can deal with it again...there's more resilience this time. |
@ahmarsuhail yes, those two failures are outstanding and covered in separate JIRAs.
|
Change-Id: I7a01f4a4dcb2c6dbb50412bb489e1fc70edbf108
@virajjasani i want something with anonymous access; this is the only place we can validate this. |
💔 -1 overall
This message was automatically generated. |
…6515) The AWS landsat data previously used in some S3A tests is no longer accessible This PR moves to the new external file s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz * Large enough file for scale tests * Bucket supports anonymous access * Ends in .gz to keep codec tests happy * No spaces in path to keep bucket-info happy Test Code Changes * Leaves the test key name alone: fs.s3a.scale.test.csvfile * Rename all methods and fields move remove "csv" from their names and move to "external file" we no longer require it to be CSV. * Path definition and helper methods have been moved to PublicDatasetTestUtils * Improve error reporting in ITestS3AInputStreamPerformance if the file is too short With S3 Select removed, there is no need for the file to be a CSV file; there is a test which tries to unzip it; other tests have a minimum file size. Consult the JIRA for the settings to add to auth-keys.xml to switch earlier builds to this same file. Contributed by Steve Loughran
…pache#6515) The AWS landsat data previously used in some S3A tests is no longer accessible This PR moves to the new external file s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz * Large enough file for scale tests * Bucket supports anonymous access * Ends in .gz to keep codec tests happy * No spaces in path to keep bucket-info happy Test Code Changes * Leaves the test key name alone: fs.s3a.scale.test.csvfile * Rename all methods and fields move remove "csv" from their names and move to "external file" we no longer require it to be CSV. * Path definition and helper methods have been moved to PublicDatasetTestUtils * Improve error reporting in ITestS3AInputStreamPerformance if the file is too short This is the V1 SDK version of the patch; it has deleted ITestAWSStatisticCollection as part of the changes. With S3 Select removed, there is no need for the file to be a CSV file; there is a test which tries to unzip it; other tests have a minimum file size. Consult the JIRA for the settings to add to auth-keys.xml to switch earlier builds to this same file. Contributed by Steve Loughran Change-Id: I0a2222f25b783e3b8f4935a60cdff788227c376f
…6515) The AWS landsat data previously used in some S3A tests is no longer accessible This PR moves to the new external file s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz * Large enough file for scale tests * Bucket supports anonymous access * Ends in .gz to keep codec tests happy * No spaces in path to keep bucket-info happy Test Code Changes * Leaves the test key name alone: fs.s3a.scale.test.csvfile * Rename all methods and fields move remove "csv" from their names and move to "external file" we no longer require it to be CSV. * Path definition and helper methods have been moved to PublicDatasetTestUtils * Improve error reporting in ITestS3AInputStreamPerformance if the file is too short This is the V1 SDK version of the patch; it has deleted ITestAWSStatisticCollection as part of the changes. With S3 Select removed, there is no need for the file to be a CSV file; there is a test which tries to unzip it; other tests have a minimum file size. Consult the JIRA for the settings to add to auth-keys.xml to switch earlier builds to this same file. Contributed by Steve Loughran
Moves to new test file/bucket
s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
Leaves the test key name alone: fs.s3a.scale.test.csvfile
How to switch old releases
To retrofit the move to existing branches, here are the XML settings.
Some of the delegation token stuff will still fail but that is something
we will just have to live with.
Some tests will still fail; these have hard-coded references to the old bucket
How was this patch tested?
s3 london with/without prefetch
prefetch test failures in TestS3AOpenCost covered in #6465
Also tested branch-3.4 with the XML settings; all good apart the failures noted above. Once this PR is in trunk I will cherrypick there and branch-3.3
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?