diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/auditing.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/auditing.md
index 7a95907217789..8d00714b85075 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/auditing.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/auditing.md
@@ -111,9 +111,9 @@ Specific buckets can have auditing disabled, even when it is enabled globally.
 ```xml
 <property>
-  <name>fs.s3a.bucket.landsat-pds.audit.enabled</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.audit.enabled</name>
   <value>false</value>
-  <description>Do not audit landsat bucket operations</description>
+  <description>Do not audit bucket operations</description>
 </property>
 ```
@@ -318,9 +318,9 @@ either globally or for specific buckets:
 <property>
-  <name>fs.s3a.bucket.landsat-pds.audit.referrer.enabled</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.audit.referrer.enabled</name>
   <value>false</value>
-  <description>Do not add the referrer header to landsat operations</description>
+  <description>Do not add the referrer header to operations</description>
 </property>
 ```
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
index 4c14921c4b4aa..fb42d507b2d60 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
@@ -747,7 +747,7 @@ For example, for any job executed through Hadoop MapReduce, the Job ID can be us
 ### `Filesystem does not have support for 'magic' committer`
 
 ```
-org.apache.hadoop.fs.s3a.commit.PathCommitException: `s3a://landsat-pds': Filesystem does not have support for 'magic' committer enabled
+org.apache.hadoop.fs.s3a.commit.PathCommitException: `s3a://noaa-isd-pds': Filesystem does not have support for 'magic' committer enabled
 in configuration option fs.s3a.committer.magic.enabled
 ```
@@ -760,42 +760,15 @@ Remove all global/per-bucket declarations of `fs.s3a.bucket.magic.enabled` or se
 ```xml
 <property>
-  <name>fs.s3a.bucket.landsat-pds.committer.magic.enabled</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.committer.magic.enabled</name>
   <value>true</value>
 </property>
 ```
 
 Tip: you can verify that a bucket supports the magic committer through the
-`hadoop s3guard bucket-info` command:
+`hadoop s3guard bucket-info` command.
-```
-> hadoop s3guard bucket-info -magic s3a://landsat-pds/
-Location: us-west-2
-
-S3A Client
- Signing Algorithm: fs.s3a.signing-algorithm=(unset)
- Endpoint: fs.s3a.endpoint=s3.amazonaws.com
- Encryption: fs.s3a.encryption.algorithm=none
- Input seek policy: fs.s3a.experimental.input.fadvise=normal
- Change Detection Source: fs.s3a.change.detection.source=etag
- Change Detection Mode: fs.s3a.change.detection.mode=server
-
-S3A Committers
- The "magic" committer is supported in the filesystem
- S3A Committer factory class: mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
- S3A Committer name: fs.s3a.committer.name=magic
- Store magic committer integration: fs.s3a.committer.magic.enabled=true
-
-Security
- Delegation token support is disabled
-
-Directory Markers
- The directory marker policy is "keep"
- Available Policies: delete, keep, authoritative
- Authoritative paths: fs.s3a.authoritative.path=```
-```
-
 ### Error message: "File being created has a magic path, but the filesystem has magic file support disabled"
 
 A file is being written to a path which is used for "magic" files,
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/connecting.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/connecting.md
index a31b1c3e39a05..f1839a0b20369 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/connecting.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/connecting.md
@@ -248,14 +248,13 @@ a bucket.
 The up to date list of regions is [Available online](https://docs.aws.amazon.com/general/latest/gr/s3.html).
 
 This list can be used to specify the endpoint of individual buckets, for example
-for buckets in the central and EU/Ireland endpoints.
+for buckets in the us-west-2 and EU/Ireland endpoints.
 
 ```xml
 <property>
-  <name>fs.s3a.bucket.landsat-pds.endpoint.region</name>
+  <name>fs.s3a.bucket.us-west-2-dataset.endpoint.region</name>
   <value>us-west-2</value>
-  <description>The region for s3a://landsat-pds URLs</description>
 </property>
@@ -318,9 +317,9 @@ The boolean option `fs.s3a.endpoint.fips` (default `false`) switches the S3A con
 For a single bucket:
 ```xml
 <property>
-  <name>fs.s3a.bucket.landsat-pds.endpoint.fips</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.endpoint.fips</name>
   <value>true</value>
-  <description>Use the FIPS endpoint for the landsat dataset</description>
+  <description>Use the FIPS endpoint for the NOAA dataset</description>
 </property>
 ```
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_token_architecture.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_token_architecture.md
index 0ba516313f42d..caa93c46c5ee1 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_token_architecture.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_token_architecture.md
@@ -188,7 +188,7 @@ If it was deployed unbonded, the DT Binding is asked to create a new DT.
 It is up to the binding what it includes in the token identifier, and how it obtains them.
 This new token identifier is included in a token which has a "canonical service name" of
-the URI of the filesystem (e.g "s3a://landsat-pds").
+the URI of the filesystem (e.g "s3a://noaa-isd-pds").
 
 The issued/reissued token identifier can be marshalled and reused.
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_tokens.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_tokens.md
index 7aaa1b8b5ce79..cdba4e3d2c9bd 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_tokens.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_tokens.md
@@ -481,8 +481,8 @@ This will fetch the token and save it to the named file (here, `tokens.bin`),
 even if Kerberos is disabled.
 
 ```bash
-# Fetch a token for the AWS landsat-pds bucket and save it to tokens.bin
-$ hdfs fetchdt --webservice s3a://landsat-pds/ tokens.bin
+# Fetch a token for the AWS noaa-isd-pds bucket and save it to tokens.bin
+$ hdfs fetchdt --webservice s3a://noaa-isd-pds/ tokens.bin
 ```
 
 If the command fails with `ERROR: Failed to fetch token` it means the
@@ -498,11 +498,11 @@ host on which it was created.
 
 ```bash
 $ bin/hdfs fetchdt --print tokens.bin
-Token (S3ATokenIdentifier{S3ADelegationToken/Session; uri=s3a://landsat-pds;
+Token (S3ATokenIdentifier{S3ADelegationToken/Session; uri=s3a://noaa-isd-pds;
 timestamp=1541683947569; encryption=EncryptionSecrets{encryptionMethod=SSE_S3};
 Created on vm1.local/192.168.99.1 at time 2018-11-08T13:32:26.381Z.};
 Session credentials for user AAABWL expires Thu Nov 08 14:02:27 GMT 2018; (valid))
-for s3a://landsat-pds
+for s3a://noaa-isd-pds
 ```
 
 The "(valid)" annotation means that the AWS credentials are considered "valid": there is both a username and a secret.
@@ -513,11 +513,11 @@ If delegation support is enabled, it also prints the current
 hadoop security level.
 
 ```bash
-$ hadoop s3guard bucket-info s3a://landsat-pds/
+$ hadoop s3guard bucket-info s3a://noaa-isd-pds/
 
-Filesystem s3a://landsat-pds
+Filesystem s3a://noaa-isd-pds
 Location: us-west-2
-Filesystem s3a://landsat-pds is not using S3Guard
+Filesystem s3a://noaa-isd-pds is not using S3Guard
 The "magic" committer is not supported
 
 S3A Client
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md
index a375b0bdb96ea..36e96317a162c 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md
@@ -313,9 +313,8 @@ All releases of Hadoop which have been updated to be marker aware will support t
 Example: `s3guard bucket-info -markers aware` on a compatible release.
 
 ```
-> hadoop s3guard bucket-info -markers aware s3a://landsat-pds/
-Filesystem s3a://landsat-pds
-Location: us-west-2
+> hadoop s3guard bucket-info -markers aware s3a://noaa-isd-pds/
+Filesystem s3a://noaa-isd-pds
 
 ...
 
@@ -325,13 +324,14 @@ Directory Markers
 Authoritative paths: fs.s3a.authoritative.path=
 
 The S3A connector is compatible with buckets where directory markers are not deleted
+...
``` The same command will fail on older releases, because the `-markers` option is unknown ``` -> hadoop s3guard bucket-info -markers aware s3a://landsat-pds/ +> hadoop s3guard bucket-info -markers aware s3a://noaa-isd-pds/ Illegal option -markers Usage: hadoop bucket-info [OPTIONS] s3a://BUCKET provide/check information about a specific bucket @@ -353,9 +353,8 @@ Generic options supported are: A specific policy check verifies that the connector is configured as desired ``` -> hadoop s3guard bucket-info -markers keep s3a://landsat-pds/ -Filesystem s3a://landsat-pds -Location: us-west-2 +> hadoop s3guard bucket-info -markers keep s3a://noaa-isd-pds/ +Filesystem s3a://noaa-isd-pds ... @@ -370,9 +369,8 @@ When probing for a specific policy, the error code "46" is returned if the activ does not match that requested: ``` -> hadoop s3guard bucket-info -markers delete s3a://landsat-pds/ -Filesystem s3a://landsat-pds -Location: us-west-2 +> hadoop s3guard bucket-info -markers delete s3a://noaa-isd-pds/ +Filesystem s3a://noaa-isd-pds S3A Client Signing Algorithm: fs.s3a.signing-algorithm=(unset) @@ -397,7 +395,7 @@ Directory Markers Authoritative paths: fs.s3a.authoritative.path= 2021-11-22 16:03:59,175 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - -Exiting with status 46: 46: Bucket s3a://landsat-pds: required marker polic is + -Exiting with status 46: 46: Bucket s3a://noaa-isd-pds: required marker polic is "keep" but actual policy is "delete" ``` @@ -449,10 +447,10 @@ Audit the path and fail if any markers were found. ``` -> hadoop s3guard markers -limit 8000 -audit s3a://landsat-pds/ +> hadoop s3guard markers -limit 8000 -audit s3a://noaa-isd-pds/ -The directory marker policy of s3a://landsat-pds is "Keep" -2020-08-05 13:42:56,079 [main] INFO tools.MarkerTool (DurationInfo.java:(77)) - Starting: marker scan s3a://landsat-pds/ +The directory marker policy of s3a://noaa-isd-pds is "Keep" +2020-08-05 13:42:56,079 [main] INFO tools.MarkerTool (DurationInfo.java:(77)) - Starting: marker scan s3a://noaa-isd-pds/ Scanned 1,000 objects Scanned 2,000 objects Scanned 3,000 objects @@ -462,8 +460,8 @@ Scanned 6,000 objects Scanned 7,000 objects Scanned 8,000 objects Limit of scan reached - 8,000 objects -2020-08-05 13:43:01,184 [main] INFO tools.MarkerTool (DurationInfo.java:close(98)) - marker scan s3a://landsat-pds/: duration 0:05.107s -No surplus directory markers were found under s3a://landsat-pds/ +2020-08-05 13:43:01,184 [main] INFO tools.MarkerTool (DurationInfo.java:close(98)) - marker scan s3a://noaa-isd-pds/: duration 0:05.107s +No surplus directory markers were found under s3a://noaa-isd-pds/ Listing limit reached before completing the scan 2020-08-05 13:43:01,187 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 3: ``` diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/encryption.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/encryption.md index 9049440313dd4..a65fc1ecbcedf 100644 --- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/encryption.md +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/encryption.md @@ -536,15 +536,14 @@ header.x-amz-version-id="KcDOVmznIagWx3gP1HlDqcZvm1mFWZ2a" A file with no-encryption (on a bucket without versioning but with intelligent tiering): ``` -bin/hadoop fs -getfattr -d s3a://landsat-pds/scene_list.gz + bin/hadoop fs -getfattr -d s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz -# file: s3a://landsat-pds/scene_list.gz 
-header.Content-Length="45603307" -header.Content-Type="application/octet-stream" -header.ETag="39c34d489777a595b36d0af5726007db" -header.Last-Modified="Wed Aug 29 01:45:15 BST 2018" -header.x-amz-storage-class="INTELLIGENT_TIERING" -header.x-amz-version-id="null" +# file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz +header.Content-Length="524671" +header.Content-Type="binary/octet-stream" +header.ETag=""3e39531220fbd3747d32cf93a79a7a0c"" +header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024" +header.x-amz-server-side-encryption="AES256" ``` ### Use `rename()` to encrypt files with new keys diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md index 0c787de46768f..868ee6ab37e5e 100644 --- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md @@ -492,7 +492,7 @@ explicitly opened up for broader access. ```bash hadoop fs -ls \ -D fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider \ - s3a://landsat-pds/ + s3a://noaa-isd-pds/ ``` 1. Allowing anonymous access to an S3 bucket compromises @@ -1446,11 +1446,11 @@ a session key: ``` -Finally, the public `s3a://landsat-pds/` bucket can be accessed anonymously: +Finally, the public `s3a://noaa-isd-pds/` bucket can be accessed anonymously: ```xml - fs.s3a.bucket.landsat-pds.aws.credentials.provider + fs.s3a.bucket.noaa-isd-pds.aws.credentials.provider org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider ``` diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md index 45244d9c7814e..28b02470bac1c 100644 --- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md @@ -405,7 +405,8 @@ An example of this is covered in [HADOOP-13871](https://issues.apache.org/jira/b 1. For public data, use `curl`: - curl -O https://landsat-pds.s3.amazonaws.com/scene_list.gz + curl -O https://noaa-cors-pds.s3.amazonaws.com/raw/2023/001/akse/AKSE001a.23_.gz + 1. Use `nettop` to monitor a processes connections. @@ -654,7 +655,7 @@ via `FileSystem.get()` or `Path.getFileSystem()`. The cache, `FileSystem.CACHE` will, for each user, cachec one instance of a filesystem for a given URI. All calls to `FileSystem.get` for a cached FS for a URI such -as `s3a://landsat-pds/` will return that singe single instance. +as `s3a://noaa-isd-pds/` will return that singe single instance. FileSystem instances are created on-demand for the cache, and will be done in each thread which requests an instance. @@ -678,7 +679,7 @@ can be created simultaneously for different object stores/distributed filesystems. For example, a value of four would put an upper limit on the number -of wasted instantiations of a connector for the `s3a://landsat-pds/` +of wasted instantiations of a connector for the `s3a://noaa-isd-pds/` bucket. ```xml diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md index 53a11404cded3..8840445d2560c 100644 --- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md +++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md @@ -260,22 +260,20 @@ define the target region in `auth-keys.xml`. 
 ### CSV Data Tests
 
 The `TestS3AInputStreamPerformance` tests require read access to a multi-MB
-text file. The default file for these tests is one published by amazon,
-[s3a://landsat-pds.s3.amazonaws.com/scene_list.gz](http://landsat-pds.s3.amazonaws.com/scene_list.gz).
-This is a gzipped CSV index of other files which amazon serves for open use.
+text file. The default file for these tests is a public one.
+`s3a://noaa-cors-pds/raw/2023/001/akse/AKSE001a.23_.gz`
+from the [NOAA Continuously Operating Reference Stations (CORS) Network (NCN)](https://registry.opendata.aws/noaa-ncn/)
 
 Historically it was required to be a `csv.gz` file to validate S3 Select support.
 Now that S3 Select support has been removed, other large files may be used
 instead.
-However, future versions may want to read a CSV file again, so testers
-should still reference one.
 
 The path to this object is set in the option `fs.s3a.scale.test.csvfile`,
 
 ```xml
 <property>
   <name>fs.s3a.scale.test.csvfile</name>
-  <value>s3a://landsat-pds/scene_list.gz</value>
+  <value>s3a://noaa-cors-pds/raw/2023/001/akse/AKSE001a.23_.gz</value>
 </property>
 ```
@@ -285,6 +283,7 @@ is hosted in Amazon's US-east datacenter.
 1. If the data cannot be read for any reason then the test will fail.
 1. If the property is set to a different path, then that data must be readable
 and "sufficiently" large.
+1. If a `.gz` file, expect decompression-related test failures.
 
 (the reason the space or newline is needed is to add "an empty entry";
 an empty `<value/>` would be considered undefined and pick up the default)
@@ -292,14 +291,13 @@
 If using a test file in a different AWS S3 region then a bucket-specific
 region must be defined.
-For the default test dataset, hosted in the `landsat-pds` bucket, this is:
+For the default test dataset, hosted in the `noaa-cors-pds` bucket, this is:
 
 ```xml
-<property>
-  <name>fs.s3a.bucket.landsat-pds.endpoint.region</name>
-  <value>us-west-2</value>
-  <description>The region for s3a://landsat-pds</description>
-</property>
+<property>
+  <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
+  <value>us-east-1</value>
+</property>
 ```
 
 ### Testing Access Point Integration
@@ -825,7 +823,7 @@ the tests become skipped, rather than fail with a trace which is really a false
 
 The ordered test case mechanism of `AbstractSTestS3AHugeFiles` is probably
 the most elegant way of chaining test setup/teardown.
-Regarding reusing existing data, we tend to use the landsat archive of
+Regarding reusing existing data, we tend to use the noaa-cors-pds archive of
 AWS US-East for our testing of input stream operations. This doesn't work
 against other regions, or with third party S3 implementations. Thus the URL
 can be overridden for testing elsewhere.
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAWSCredentialsProvider.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAWSCredentialsProvider.java index c13c3f48b8466..9a880db25eedc 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAWSCredentialsProvider.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAWSCredentialsProvider.java @@ -39,10 +39,10 @@ import org.slf4j.LoggerFactory; import static org.apache.hadoop.fs.s3a.Constants.*; -import static org.apache.hadoop.fs.s3a.S3ATestUtils.getCSVTestPath; import static org.apache.hadoop.fs.s3a.S3ATestUtils.removeBaseAndBucketOverrides; import static org.apache.hadoop.fs.s3a.S3AUtils.*; import static org.apache.hadoop.fs.s3a.auth.delegation.DelegationConstants.DELEGATION_TOKEN_BINDING; +import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getExternalData; import static org.junit.Assert.*; /** @@ -162,7 +162,7 @@ public void testAnonymousProvider() throws Exception { Configuration conf = new Configuration(); conf.set(AWS_CREDENTIALS_PROVIDER, AnonymousAWSCredentialsProvider.class.getName()); - Path testFile = getCSVTestPath(conf); + Path testFile = getExternalData(conf); try (FileSystem fs = FileSystem.newInstance(testFile.toUri(), conf)) { assertNotNull("S3AFileSystem instance must not be null", fs); assertTrue("FileSystem must be the instance of S3AFileSystem", fs instanceof S3AFileSystem); diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java index c0f6a4b23226b..9e40534c82bd5 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java @@ -21,7 +21,6 @@ import com.amazonaws.services.s3.model.DeleteObjectsRequest; import org.assertj.core.api.Assertions; -import org.junit.Assume; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.LocatedFileStatus; @@ -42,6 +41,7 @@ import static org.apache.hadoop.fs.contract.ContractTestUtils.*; import static org.apache.hadoop.fs.s3a.S3ATestUtils.createFiles; import static org.apache.hadoop.fs.s3a.test.ExtraAssertions.failIf; +import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireDefaultExternalData; import static org.apache.hadoop.test.LambdaTestUtils.*; import static org.apache.hadoop.util.functional.RemoteIterators.mappingRemoteIterator; import static org.apache.hadoop.util.functional.RemoteIterators.toList; @@ -135,22 +135,13 @@ public void testMultiObjectDeleteSomeFiles() throws Throwable { timer.end("removeKeys"); } - - private Path maybeGetCsvPath() { - Configuration conf = getConfiguration(); - String csvFile = conf.getTrimmed(KEY_CSVTEST_FILE, DEFAULT_CSVTEST_FILE); - Assume.assumeTrue("CSV test file is not the default", - DEFAULT_CSVTEST_FILE.equals(csvFile)); - return new Path(csvFile); - } - /** * Test low-level failure handling with low level delete request. 
*/ @Test public void testMultiObjectDeleteNoPermissions() throws Throwable { - describe("Delete the landsat CSV file and expect it to fail"); - Path csvPath = maybeGetCsvPath(); + describe("Delete the external file and expect it to fail"); + Path csvPath = requireDefaultExternalData(getConfiguration()); S3AFileSystem fs = (S3AFileSystem) csvPath.getFileSystem( getConfiguration()); // create a span, expect it to be activated. @@ -170,8 +161,8 @@ public void testMultiObjectDeleteNoPermissions() throws Throwable { */ @Test public void testSingleObjectDeleteNoPermissionsTranslated() throws Throwable { - describe("Delete the landsat CSV file and expect it to fail"); - Path csvPath = maybeGetCsvPath(); + describe("Delete the external file and expect it to fail"); + Path csvPath = requireDefaultExternalData(getConfiguration()); S3AFileSystem fs = (S3AFileSystem) csvPath.getFileSystem( getConfiguration()); AccessDeniedException aex = intercept(AccessDeniedException.class, diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingCacheFiles.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingCacheFiles.java index 57f7686b62082..274eab44b71f1 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingCacheFiles.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingCacheFiles.java @@ -19,8 +19,9 @@ package org.apache.hadoop.fs.s3a; import java.io.File; -import java.net.URI; +import java.util.UUID; +import org.assertj.core.api.Assertions; import org.junit.Before; import org.junit.Test; import org.slf4j.Logger; @@ -30,15 +31,16 @@ import org.apache.hadoop.fs.FSDataInputStream; import org.apache.hadoop.fs.FileStatus; import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.LocalFileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.fs.contract.ContractTestUtils; import org.apache.hadoop.fs.permission.FsAction; import org.apache.hadoop.fs.s3a.performance.AbstractS3ACostTest; import static org.apache.hadoop.fs.s3a.Constants.BUFFER_DIR; -import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_DEFAULT_SIZE; import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_SIZE_KEY; import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_ENABLED_KEY; +import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getExternalData; import static org.apache.hadoop.io.IOUtils.cleanupWithLogger; /** @@ -49,11 +51,21 @@ public class ITestS3APrefetchingCacheFiles extends AbstractS3ACostTest { private static final Logger LOG = LoggerFactory.getLogger(ITestS3APrefetchingCacheFiles.class); + /** use a small file size so small source files will still work. */ + public static final int BLOCK_SIZE = 128 * 1024; + + public static final int PREFETCH_OFFSET = 10240; + private Path testFile; + + /** The FS with the external file. 
*/ private FileSystem fs; + private int prefetchBlockSize; private Configuration conf; + private String bufferDir; + public ITestS3APrefetchingCacheFiles() { super(true); } @@ -63,35 +75,31 @@ public void setUp() throws Exception { super.setup(); // Sets BUFFER_DIR by calling S3ATestUtils#prepareTestConfiguration conf = createConfiguration(); - String testFileUri = S3ATestUtils.getCSVTestFile(conf); - testFile = new Path(testFileUri); - prefetchBlockSize = conf.getInt(PREFETCH_BLOCK_SIZE_KEY, PREFETCH_BLOCK_DEFAULT_SIZE); - fs = getFileSystem(); - fs.initialize(new URI(testFileUri), conf); + testFile = getExternalData(conf); + prefetchBlockSize = conf.getInt(PREFETCH_BLOCK_SIZE_KEY, BLOCK_SIZE); + fs = FileSystem.get(testFile.toUri(), conf); } @Override public Configuration createConfiguration() { Configuration configuration = super.createConfiguration(); S3ATestUtils.removeBaseAndBucketOverrides(configuration, PREFETCH_ENABLED_KEY); - S3ATestUtils.removeBaseAndBucketOverrides(configuration, PREFETCH_BLOCK_SIZE_KEY); configuration.setBoolean(PREFETCH_ENABLED_KEY, true); + // use a small block size unless explicitly set in the test config. + configuration.setInt(PREFETCH_BLOCK_SIZE_KEY, BLOCK_SIZE); + // patch buffer dir with a unique path for test isolation. + final String bufferDirBase = configuration.get(BUFFER_DIR); + bufferDir = bufferDirBase + "/" + UUID.randomUUID(); + configuration.set(BUFFER_DIR, bufferDir); return configuration; } @Override public synchronized void teardown() throws Exception { super.teardown(); - File tmpFileDir = new File(conf.get(BUFFER_DIR)); - File[] tmpFiles = tmpFileDir.listFiles(); - if (tmpFiles != null) { - for (File filePath : tmpFiles) { - String path = filePath.getPath(); - if (path.endsWith(".bin") && path.contains("fs-cache-")) { - filePath.delete(); - } - } + if (bufferDir != null) { + new File(bufferDir).delete(); } cleanupWithLogger(LOG, fs); fs = null; @@ -110,34 +118,35 @@ public void testCacheFileExistence() throws Throwable { try (FSDataInputStream in = fs.open(testFile)) { byte[] buffer = new byte[prefetchBlockSize]; - in.read(buffer, 0, prefetchBlockSize - 10240); - in.seek(prefetchBlockSize * 2); - in.read(buffer, 0, prefetchBlockSize); + // read a bit less than a block + in.readFully(0, buffer, 0, prefetchBlockSize - PREFETCH_OFFSET); + // read at least some of a second block + in.read(prefetchBlockSize * 2, buffer, 0, prefetchBlockSize); + File tmpFileDir = new File(conf.get(BUFFER_DIR)); - assertTrue("The dir to keep cache files must exist", tmpFileDir.exists()); + final LocalFileSystem localFs = FileSystem.getLocal(conf); + Path bufferDirPath = new Path(tmpFileDir.toURI()); + ContractTestUtils.assertIsDirectory(localFs, bufferDirPath); File[] tmpFiles = tmpFileDir .listFiles((dir, name) -> name.endsWith(".bin") && name.contains("fs-cache-")); - boolean isCacheFileForBlockFound = tmpFiles != null && tmpFiles.length > 0; - if (!isCacheFileForBlockFound) { - LOG.warn("No cache files found under " + tmpFileDir); - } - assertTrue("File to cache block data must exist", isCacheFileForBlockFound); + Assertions.assertThat(tmpFiles) + .describedAs("Cache files not found under %s", tmpFileDir) + .isNotEmpty(); + for (File tmpFile : tmpFiles) { Path path = new Path(tmpFile.getAbsolutePath()); - try (FileSystem localFs = FileSystem.getLocal(conf)) { - FileStatus stat = localFs.getFileStatus(path); - ContractTestUtils.assertIsFile(path, stat); - assertEquals("File length not matching with prefetchBlockSize", prefetchBlockSize, - stat.getLen()); - 
assertEquals("User permissions should be RW", FsAction.READ_WRITE, - stat.getPermission().getUserAction()); - assertEquals("Group permissions should be NONE", FsAction.NONE, - stat.getPermission().getGroupAction()); - assertEquals("Other permissions should be NONE", FsAction.NONE, - stat.getPermission().getOtherAction()); - } + FileStatus stat = localFs.getFileStatus(path); + ContractTestUtils.assertIsFile(path, stat); + assertEquals("File length not matching with prefetchBlockSize", prefetchBlockSize, + stat.getLen()); + assertEquals("User permissions should be RW", FsAction.READ_WRITE, + stat.getPermission().getUserAction()); + assertEquals("Group permissions should be NONE", FsAction.NONE, + stat.getPermission().getGroupAction()); + assertEquals("Other permissions should be NONE", FsAction.NONE, + stat.getPermission().getOtherAction()); } } } diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestConstants.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestConstants.java index a6269c437665a..50f58c248acf0 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestConstants.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestConstants.java @@ -96,14 +96,16 @@ public interface S3ATestConstants { String KEY_CSVTEST_FILE = S3A_SCALE_TEST + "csvfile"; /** - * The landsat bucket: {@value}. + * Default path for the multi MB test file: {@value}. + * @deprecated retrieve via {@link PublicDatasetTestUtils}. */ - String LANDSAT_BUCKET = "s3a://landsat-pds/"; + @Deprecated + String DEFAULT_CSVTEST_FILE = PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE; /** - * Default path for the multi MB test file: {@value}. + * Example path for unit tests; this is never accessed: {@value}. */ - String DEFAULT_CSVTEST_FILE = LANDSAT_BUCKET + "scene_list.gz"; + String UNIT_TEST_EXAMPLE_PATH = "s3a://example/data/"; /** * Configuration key for an existing object in a requester pays bucket: {@value}. diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java index 469562f9b33b9..9d2a6829f9d66 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java @@ -88,6 +88,8 @@ import static org.apache.hadoop.fs.contract.ContractTestUtils.createFile; import static org.apache.hadoop.fs.s3a.impl.CallableSupplier.submit; import static org.apache.hadoop.fs.s3a.impl.CallableSupplier.waitForCompletion; +import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getExternalData; +import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireDefaultExternalDataFile; import static org.apache.hadoop.test.GenericTestUtils.buildPaths; import static org.apache.hadoop.util.Preconditions.checkNotNull; import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_SECURITY_CREDENTIAL_PROVIDER_PATH; @@ -386,22 +388,22 @@ public static String getTestProperty(Configuration conf, * Get the test CSV file; assume() that it is not empty. * @param conf test configuration * @return test file. 
+ * @deprecated Retained only to assist cherrypicking patches */ + @Deprecated public static String getCSVTestFile(Configuration conf) { - String csvFile = conf - .getTrimmed(KEY_CSVTEST_FILE, DEFAULT_CSVTEST_FILE); - Assume.assumeTrue("CSV test file is not the default", - isNotEmpty(csvFile)); - return csvFile; + return getExternalData(conf).toUri().toString(); } /** * Get the test CSV path; assume() that it is not empty. * @param conf test configuration * @return test file as a path. + * @deprecated Retained only to assist cherrypicking patches */ + @Deprecated public static Path getCSVTestPath(Configuration conf) { - return new Path(getCSVTestFile(conf)); + return getExternalData(conf); } /** @@ -410,12 +412,11 @@ public static Path getCSVTestPath(Configuration conf) { * read only). * @return test file. * @param conf test configuration + * @deprecated Retained only to assist cherrypicking patches */ + @Deprecated public static String getLandsatCSVFile(Configuration conf) { - String csvFile = getCSVTestFile(conf); - Assume.assumeTrue("CSV test file is not the default", - DEFAULT_CSVTEST_FILE.equals(csvFile)); - return csvFile; + return requireDefaultExternalDataFile(conf); } /** * Get the test CSV file; assume() that it is not modified (i.e. we haven't @@ -423,9 +424,11 @@ public static String getLandsatCSVFile(Configuration conf) { * read only). * @param conf test configuration * @return test file as a path. + * @deprecated Retained only to assist cherrypicking patches */ + @Deprecated public static Path getLandsatCSVPath(Configuration conf) { - return new Path(getLandsatCSVFile(conf)); + return getExternalData(conf); } /** diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AAWSCredentialsProvider.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AAWSCredentialsProvider.java index 730bae0aeb101..9312b3a552144 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AAWSCredentialsProvider.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AAWSCredentialsProvider.java @@ -46,26 +46,27 @@ import org.apache.hadoop.fs.s3a.auth.AbstractSessionCredentialsProvider; import org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider; import org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException; +import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils; import org.apache.hadoop.io.retry.RetryPolicy; +import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getExternalData; import static org.apache.hadoop.fs.s3a.Constants.*; -import static org.apache.hadoop.fs.s3a.S3ATestConstants.*; import static org.apache.hadoop.fs.s3a.S3ATestUtils.*; import static org.apache.hadoop.fs.s3a.S3AUtils.*; import static org.apache.hadoop.test.LambdaTestUtils.intercept; import static org.apache.hadoop.test.LambdaTestUtils.interceptFuture; -import static org.junit.Assert.*; /** * Unit tests for {@link Constants#AWS_CREDENTIALS_PROVIDER} logic. */ -public class TestS3AAWSCredentialsProvider { +public class TestS3AAWSCredentialsProvider extends AbstractS3ATestBase { /** - * URI of the landsat images. + * URI of the test file: this must be anonymously accessible. + * As these are unit tests no actual connection to the store is made. 
*/ private static final URI TESTFILE_URI = new Path( - DEFAULT_CSVTEST_FILE).toUri(); + PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE).toUri(); @Rule public ExpectedException exception = ExpectedException.none(); @@ -110,7 +111,7 @@ public void testInstantiationChain() throws Throwable { TemporaryAWSCredentialsProvider.NAME + ", \t" + SimpleAWSCredentialsProvider.NAME + " ,\n " + AnonymousAWSCredentialsProvider.NAME); - Path testFile = getCSVTestPath(conf); + Path testFile = getExternalData(conf); AWSCredentialProviderList list = createAWSCredentialProviderSet( testFile.toUri(), conf); @@ -522,7 +523,7 @@ protected AWSCredentials createCredentials(Configuration config) throws IOExcept @Test public void testConcurrentAuthentication() throws Throwable { Configuration conf = createProviderConfiguration(SlowProvider.class.getName()); - Path testFile = getCSVTestPath(conf); + Path testFile = getExternalData(conf); AWSCredentialProviderList list = createAWSCredentialProviderSet(testFile.toUri(), conf); @@ -592,7 +593,7 @@ protected AWSCredentials createCredentials(Configuration config) throws IOExcept @Test public void testConcurrentAuthenticationError() throws Throwable { Configuration conf = createProviderConfiguration(ErrorProvider.class.getName()); - Path testFile = getCSVTestPath(conf); + Path testFile = getExternalData(conf); AWSCredentialProviderList list = createAWSCredentialProviderSet(testFile.toUri(), conf); ErrorProvider provider = (ErrorProvider) list.getProviders().get(0); diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java index 9fb09b4cede52..20f595543255e 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java @@ -44,7 +44,6 @@ import org.apache.hadoop.fs.s3a.AbstractS3ATestBase; import org.apache.hadoop.fs.s3a.MultipartUtils; import org.apache.hadoop.fs.s3a.S3AFileSystem; -import org.apache.hadoop.fs.s3a.S3ATestConstants; import org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider; import org.apache.hadoop.fs.s3a.commit.CommitConstants; import org.apache.hadoop.fs.s3a.commit.files.PendingSet; @@ -64,6 +63,7 @@ import static org.apache.hadoop.fs.s3a.auth.RoleTestUtils.forbidden; import static org.apache.hadoop.fs.s3a.auth.RoleTestUtils.newAssumedRoleConfig; import static org.apache.hadoop.fs.s3a.s3guard.S3GuardToolTestHelper.exec; +import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireAnonymousDataPath; import static org.apache.hadoop.fs.statistics.IOStatisticsLogging.ioStatisticsSourceToString; import static org.apache.hadoop.io.IOUtils.cleanupWithLogger; import static org.apache.hadoop.test.GenericTestUtils.assertExceptionContains; @@ -104,7 +104,7 @@ public class ITestAssumeRole extends AbstractS3ATestBase { public void setup() throws Exception { super.setup(); assumeRoleTests(); - uri = new URI(S3ATestConstants.DEFAULT_CSVTEST_FILE); + uri = requireAnonymousDataPath(getConfiguration()).toUri(); } @Override diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java index d5d62f2cae92c..ba9746358c575 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java +++ 
b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java @@ -58,6 +58,8 @@ import static org.apache.hadoop.fs.s3a.auth.delegation.DelegationConstants.*; import static org.apache.hadoop.fs.s3a.auth.delegation.MiniKerberizedHadoopCluster.assertSecurityEnabled; import static org.apache.hadoop.fs.s3a.auth.delegation.MiniKerberizedHadoopCluster.closeUserFileSystems; +import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getOrcData; +import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireAnonymousDataPath; /** * Submit a job with S3 delegation tokens. @@ -106,10 +108,17 @@ public class ITestDelegatedMRJob extends AbstractDelegationIT { private Path destPath; - private static final Path EXTRA_JOB_RESOURCE_PATH - = new Path("s3a://osm-pds/planet/planet-latest.orc"); + /** + * Path of the extra job resource; set up in + * {@link #createConfiguration()}. + */ + private Path extraJobResourcePath; - public static final URI jobResource = EXTRA_JOB_RESOURCE_PATH.toUri(); + /** + * URI of the extra job resource; set up in + * {@link #createConfiguration()}. + */ + private URI jobResourceUri; /** * Test array for parameterized test runs. @@ -161,7 +170,9 @@ protected YarnConfiguration createConfiguration() { conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_MS, 10_000); - String host = jobResource.getHost(); + extraJobResourcePath = getOrcData(conf); + jobResourceUri = extraJobResourcePath.toUri(); + String host = jobResourceUri.getHost(); // and fix to the main endpoint if the caller has moved conf.set( String.format("fs.s3a.bucket.%s.endpoint", host), ""); @@ -229,9 +240,9 @@ protected int getTestTimeoutMillis() { @Test public void testCommonCrawlLookup() throws Throwable { - FileSystem resourceFS = EXTRA_JOB_RESOURCE_PATH.getFileSystem( + FileSystem resourceFS = extraJobResourcePath.getFileSystem( getConfiguration()); - FileStatus status = resourceFS.getFileStatus(EXTRA_JOB_RESOURCE_PATH); + FileStatus status = resourceFS.getFileStatus(extraJobResourcePath); LOG.info("Extra job resource is {}", status); assertTrue("Not encrypted: " + status, status.isEncrypted()); } @@ -241,9 +252,9 @@ public void testJobSubmissionCollectsTokens() throws Exception { describe("Mock Job test"); JobConf conf = new JobConf(getConfiguration()); - // the input here is the landsat file; which lets + // the input here is the external file; which lets // us differentiate source URI from dest URI - Path input = new Path(DEFAULT_CSVTEST_FILE); + Path input = requireAnonymousDataPath(getConfiguration()); final FileSystem sourceFS = input.getFileSystem(conf); @@ -272,7 +283,7 @@ public void testJobSubmissionCollectsTokens() throws Exception { // This is to actually stress the terasort code for which // the yarn ResourceLocalizationService was having problems with // fetching resources from. 
- URI partitionUri = new URI(EXTRA_JOB_RESOURCE_PATH.toString() + + URI partitionUri = new URI(extraJobResourcePath.toString() + "#_partition.lst"); job.addCacheFile(partitionUri); @@ -302,7 +313,7 @@ public void testJobSubmissionCollectsTokens() throws Exception { // look up the destination token lookupToken(submittedCredentials, fs.getUri(), tokenKind); lookupToken(submittedCredentials, - EXTRA_JOB_RESOURCE_PATH.getFileSystem(conf).getUri(), tokenKind); + extraJobResourcePath.getFileSystem(conf).getUri(), tokenKind); } } diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestRoleDelegationInFilesystem.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestRoleDelegationInFilesystem.java index 511b813475954..08dba4b798214 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestRoleDelegationInFilesystem.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestRoleDelegationInFilesystem.java @@ -53,8 +53,7 @@ public Text getTokenKind() { /** * This verifies that the granted credentials only access the target bucket - * by using the credentials in a new S3 client to query the AWS-owned landsat - * bucket. + * by using the credentials in a new S3 client to query the public data bucket. * @param delegatedFS delegated FS with role-restricted access. * @throws Exception failure */ @@ -62,7 +61,7 @@ public Text getTokenKind() { protected void verifyRestrictedPermissions(final S3AFileSystem delegatedFS) throws Exception { intercept(AccessDeniedException.class, - () -> readLandsatMetadata(delegatedFS)); + () -> readExternalDatasetMetadata(delegatedFS)); } } diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java index 295125169a00c..35300980959ee 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java @@ -76,6 +76,7 @@ import static org.apache.hadoop.fs.s3a.auth.delegation.MiniKerberizedHadoopCluster.ALICE; import static org.apache.hadoop.fs.s3a.auth.delegation.MiniKerberizedHadoopCluster.assertSecurityEnabled; import static org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.lookupS3ADelegationToken; +import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireAnonymousDataPath; import static org.apache.hadoop.test.LambdaTestUtils.doAs; import static org.apache.hadoop.test.LambdaTestUtils.intercept; import static org.hamcrest.Matchers.containsString; @@ -330,8 +331,10 @@ public void testDelegatedFileSystem() throws Throwable { + " if role restricted, permissions are tightened."); S3AFileSystem fs = getFileSystem(); // force a probe of the remote FS to make sure its endpoint is valid - fs.getObjectMetadata(new Path("/")); - readLandsatMetadata(fs); + // TODO: Check what should happen here. Calling headObject() on the root path fails in V2, + // with the error that key cannot be empty. + // fs.getObjectMetadata(new Path("/")); + readExternalDatasetMetadata(fs); URI uri = fs.getUri(); // create delegation tokens from the test suites FS. 
@@ -450,13 +453,13 @@ protected void executeDelegatedFSOperations(final S3AFileSystem delegatedFS, } /** - * Session tokens can read the landsat bucket without problems. + * Session tokens can read the external bucket without problems. * @param delegatedFS delegated FS * @throws Exception failure */ protected void verifyRestrictedPermissions(final S3AFileSystem delegatedFS) throws Exception { - readLandsatMetadata(delegatedFS); + readExternalDatasetMetadata(delegatedFS); } @Test @@ -569,7 +572,7 @@ public void testDelegationBindingMismatch2() throws Throwable { /** * This verifies that the granted credentials only access the target bucket - * by using the credentials in a new S3 client to query the AWS-owned landsat + * by using the credentials in a new S3 client to query the external * bucket. * @param delegatedFS delegated FS with role-restricted access. * @throws AccessDeniedException if the delegated FS's credentials can't @@ -578,16 +581,17 @@ public void testDelegationBindingMismatch2() throws Throwable { * @throws Exception failure */ @SuppressWarnings("deprecation") - protected ObjectMetadata readLandsatMetadata(final S3AFileSystem delegatedFS) + protected ObjectMetadata readExternalDatasetMetadata(final S3AFileSystem delegatedFS) throws Exception { AWSCredentialProviderList testingCreds = delegatedFS.shareCredentials("testing"); - URI landsat = new URI(DEFAULT_CSVTEST_FILE); + URI external = requireAnonymousDataPath(getConfiguration()).toUri(); DefaultS3ClientFactory factory = new DefaultS3ClientFactory(); - factory.setConf(new Configuration(delegatedFS.getConf())); - String host = landsat.getHost(); + Configuration conf = delegatedFS.getConf(); + factory.setConf(conf); + String host = external.getHost(); S3ClientFactory.S3ClientCreationParameters parameters = null; parameters = new S3ClientFactory.S3ClientCreationParameters() .withCredentialSet(testingCreds) @@ -596,10 +600,10 @@ protected ObjectMetadata readLandsatMetadata(final S3AFileSystem delegatedFS) .withMetrics(new EmptyS3AStatisticsContext() .newStatisticsFromAwsSdk()) .withUserAgentSuffix("ITestSessionDelegationInFilesystem"); - AmazonS3 s3 = factory.createS3Client(landsat, parameters); + AmazonS3 s3 = factory.createS3Client(external, parameters); return Invoker.once("HEAD", host, - () -> s3.getObjectMetadata(host, landsat.getPath().substring(1))); + () -> s3.getObjectMetadata(host, external.getPath().substring(1))); } /** diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/TestS3ADelegationTokenSupport.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/TestS3ADelegationTokenSupport.java index 88d9ebfcdfdc3..ea4e2e54ff6ee 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/TestS3ADelegationTokenSupport.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/TestS3ADelegationTokenSupport.java @@ -24,10 +24,10 @@ import org.junit.Test; import org.apache.hadoop.fs.s3a.S3AEncryptionMethods; -import org.apache.hadoop.fs.s3a.S3ATestConstants; import org.apache.hadoop.fs.s3a.S3ATestUtils; import org.apache.hadoop.fs.s3a.auth.MarshalledCredentialBinding; import org.apache.hadoop.fs.s3a.auth.MarshalledCredentials; +import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils; import org.apache.hadoop.io.Text; import org.apache.hadoop.security.UserGroupInformation; import org.apache.hadoop.security.token.SecretManager; @@ -45,11 +45,11 @@ */ public class TestS3ADelegationTokenSupport 
{ - private static URI landsatUri; + private static URI externalUri; @BeforeClass public static void classSetup() throws Exception { - landsatUri = new URI(S3ATestConstants.DEFAULT_CSVTEST_FILE); + externalUri = new URI(PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE); } @Test @@ -75,7 +75,7 @@ public void testSessionTokenDecode() throws Throwable { = new SessionTokenIdentifier(SESSION_TOKEN_KIND, alice, renewer, - new URI("s3a://landsat-pds/"), + new URI("s3a://anything/"), new MarshalledCredentials("a", "b", ""), new EncryptionSecrets(S3AEncryptionMethods.SSE_S3, ""), "origin"); @@ -117,7 +117,7 @@ public void testSessionTokenIdentifierRoundTrip() throws Throwable { SESSION_TOKEN_KIND, new Text(), renewer, - landsatUri, + externalUri, new MarshalledCredentials("a", "b", "c"), new EncryptionSecrets(), ""); @@ -136,7 +136,7 @@ public void testSessionTokenIdentifierRoundTripNoRenewer() throws Throwable { SESSION_TOKEN_KIND, new Text(), null, - landsatUri, + externalUri, new MarshalledCredentials("a", "b", "c"), new EncryptionSecrets(), ""); @@ -152,7 +152,7 @@ public void testSessionTokenIdentifierRoundTripNoRenewer() throws Throwable { @Test public void testRoleTokenIdentifierRoundTrip() throws Throwable { RoleTokenIdentifier id = new RoleTokenIdentifier( - landsatUri, + externalUri, new Text(), new Text(), new MarshalledCredentials("a", "b", "c"), @@ -171,7 +171,7 @@ public void testRoleTokenIdentifierRoundTrip() throws Throwable { public void testFullTokenIdentifierRoundTrip() throws Throwable { Text renewer = new Text("renewerName"); FullCredentialsTokenIdentifier id = new FullCredentialsTokenIdentifier( - landsatUri, + externalUri, new Text(), renewer, new MarshalledCredentials("a", "b", ""), diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/staging/TestPaths.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/staging/TestPaths.java index ee6480a36af37..e2582cd0a2ee8 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/staging/TestPaths.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/staging/TestPaths.java @@ -26,6 +26,7 @@ import org.apache.hadoop.fs.Path; import org.apache.hadoop.test.HadoopTestBase; +import static org.apache.hadoop.fs.s3a.S3ATestConstants.UNIT_TEST_EXAMPLE_PATH; import static org.apache.hadoop.fs.s3a.commit.staging.Paths.*; import static org.apache.hadoop.test.LambdaTestUtils.intercept; @@ -81,7 +82,7 @@ private void assertUUIDAdded(String path, String expected) { assertEquals("from " + path, expected, addUUID(path, "UUID")); } - private static final String DATA = "s3a://landsat-pds/data/"; + private static final String DATA = UNIT_TEST_EXAMPLE_PATH; private static final Path BASE = new Path(DATA); @Test diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java index 23b14fd3792ca..c7bd892872f62 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java @@ -22,14 +22,17 @@ import java.io.ByteArrayInputStream; import java.io.ByteArrayOutputStream; import java.io.InputStreamReader; +import java.net.URI; import java.util.ArrayList; import java.util.Arrays; import java.util.List; import org.junit.Test; +import org.apache.hadoop.conf.Configuration; import 
org.apache.hadoop.fs.Path; import org.apache.hadoop.fs.s3a.S3AFileSystem; +import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils; import org.apache.hadoop.test.LambdaTestUtils; import org.apache.hadoop.util.StringUtils; @@ -37,7 +40,6 @@ import static org.apache.hadoop.fs.s3a.MultipartTestUtils.clearAnyUploads; import static org.apache.hadoop.fs.s3a.MultipartTestUtils.countUploadsAt; import static org.apache.hadoop.fs.s3a.MultipartTestUtils.createPartUpload; -import static org.apache.hadoop.fs.s3a.S3ATestUtils.getLandsatCSVFile; import static org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.BucketInfo; import static org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.E_BAD_STATE; import static org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.Uploads; @@ -53,35 +55,31 @@ public class ITestS3GuardTool extends AbstractS3GuardToolTestBase { "-force", "-verbose"}; @Test - public void testLandsatBucketUnguarded() throws Throwable { + public void testExternalBucketRequireUnencrypted() throws Throwable { run(BucketInfo.NAME, - "-" + BucketInfo.UNGUARDED_FLAG, - getLandsatCSVFile(getConfiguration())); - } - - @Test - public void testLandsatBucketRequireGuarded() throws Throwable { - runToFailure(E_BAD_STATE, - BucketInfo.NAME, - "-" + BucketInfo.GUARDED_FLAG, - getLandsatCSVFile( - ITestS3GuardTool.this.getConfiguration())); + "-" + BucketInfo.ENCRYPTION_FLAG, "none", + externalBucket()); } - @Test - public void testLandsatBucketRequireUnencrypted() throws Throwable { - run(BucketInfo.NAME, - "-" + BucketInfo.ENCRYPTION_FLAG, "none", - getLandsatCSVFile(getConfiguration())); + /** + * Get the external bucket; this is of the default external file. + * If not set to the default value, the test will be skipped. + * @return the bucket of the default external file. + */ + private String externalBucket() { + Configuration conf = getConfiguration(); + Path result = PublicDatasetTestUtils.requireDefaultExternalData(conf); + final URI uri = result.toUri(); + final String bucket = uri.getScheme() + "://" + uri.getHost(); + return bucket; } @Test - public void testLandsatBucketRequireEncrypted() throws Throwable { + public void testExternalBucketRequireEncrypted() throws Throwable { runToFailure(E_BAD_STATE, BucketInfo.NAME, "-" + BucketInfo.ENCRYPTION_FLAG, - "AES256", getLandsatCSVFile( - ITestS3GuardTool.this.getConfiguration())); + "AES256", externalBucket()); } @Test diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestAuthoritativePath.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestAuthoritativePath.java index c8e56f753bd50..95bb5a567f719 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestAuthoritativePath.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestAuthoritativePath.java @@ -33,6 +33,7 @@ import org.apache.hadoop.test.AbstractHadoopTestBase; import static org.apache.hadoop.fs.s3a.Constants.AUTHORITATIVE_PATH; +import static org.apache.hadoop.fs.s3a.S3ATestConstants.UNIT_TEST_EXAMPLE_PATH; import static org.assertj.core.api.Assertions.assertThat; /** @@ -71,7 +72,7 @@ public void testResolutionWithFQP() throws Throwable { @Test public void testOtherBucket() throws Throwable { assertAuthPaths(l("/one/", - "s3a://landsat-pds/", + UNIT_TEST_EXAMPLE_PATH, BASE + "/two/"), "/one/", "/two/"); } @@ -79,7 +80,7 @@ public void testOtherBucket() throws Throwable { @Test public void testOtherScheme() throws Throwable { assertAuthPaths(l("/one/", - "s3a://landsat-pds/", + 
UNIT_TEST_EXAMPLE_PATH, "http://bucket/two/"), "/one/"); } diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java index fb9988b29a5c4..ae09452372316 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java @@ -30,6 +30,7 @@ import org.apache.hadoop.fs.s3a.S3AInputStream; import org.apache.hadoop.fs.s3a.S3ATestUtils; import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics; +import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils; import org.apache.hadoop.fs.statistics.IOStatistics; import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot; import org.apache.hadoop.fs.statistics.MeanStatistic; @@ -112,7 +113,9 @@ public void openFS() throws IOException { Configuration conf = getConf(); conf.setInt(SOCKET_SEND_BUFFER, 16 * 1024); conf.setInt(SOCKET_RECV_BUFFER, 16 * 1024); - String testFile = conf.getTrimmed(KEY_CSVTEST_FILE, DEFAULT_CSVTEST_FILE); + // look up the test file, no requirement to be set. + String testFile = conf.getTrimmed(KEY_CSVTEST_FILE, + PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE); if (testFile.isEmpty()) { assumptionMessage = "Empty test property: " + KEY_CSVTEST_FILE; LOG.warn(assumptionMessage); @@ -394,6 +397,9 @@ private void executeDecompression(long readahead, CompressionCodecFactory factory = new CompressionCodecFactory(getConf()); CompressionCodec codec = factory.getCodec(testData); + Assertions.assertThat(codec) + .describedAs("No codec found for %s", testData) + .isNotNull(); long bytesRead = 0; int lines = 0; @@ -525,12 +531,18 @@ private ContractTestUtils.NanoTimer executeRandomIO(S3AInputPolicy policy, describe("Random IO with policy \"%s\"", policy); byte[] buffer = new byte[_1MB]; long totalBytesRead = 0; - + final long len = testDataStatus.getLen(); in = openTestFile(policy, 0); ContractTestUtils.NanoTimer timer = new ContractTestUtils.NanoTimer(); for (int[] action : RANDOM_IO_SEQUENCE) { - int position = action[0]; + long position = action[0]; int range = action[1]; + // if a read goes past EOF, fail with details + // this will happen if the test datafile is too small. + Assertions.assertThat(position + range) + .describedAs("readFully(pos=%d range=%d) of %s", + position, range, testDataStatus) + .isLessThanOrEqualTo(len); in.readFully(position, buffer, 0, range); totalBytesRead += range; } diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/statistics/ITestAWSStatisticCollection.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/statistics/ITestAWSStatisticCollection.java deleted file mode 100644 index e7696996dbd1a..0000000000000 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/statistics/ITestAWSStatisticCollection.java +++ /dev/null @@ -1,82 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. 
You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.hadoop.fs.s3a.statistics; - -import org.junit.Test; - -import org.apache.hadoop.conf.Configuration; -import org.apache.hadoop.fs.Path; -import org.apache.hadoop.fs.s3a.AbstractS3ATestBase; -import org.apache.hadoop.fs.s3a.S3AFileSystem; -import org.apache.hadoop.fs.statistics.IOStatistics; - -import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_ENDPOINT; -import static org.apache.hadoop.fs.s3a.Constants.ENDPOINT; -import static org.apache.hadoop.fs.s3a.S3ATestUtils.getLandsatCSVPath; -import static org.apache.hadoop.fs.s3a.Statistic.STORE_IO_REQUEST; -import static org.apache.hadoop.fs.statistics.IOStatisticAssertions.assertThatStatisticCounter; - -/** - * Verify that AWS SDK statistics are wired up. - * This test tries to read data from US-east-1 and us-west-2 buckets - * so as to be confident that the nuances of region mapping - * are handed correctly (HADOOP-13551). - * The statistics are probed to verify that the wiring up is complete. - */ -public class ITestAWSStatisticCollection extends AbstractS3ATestBase { - - private static final Path COMMON_CRAWL_PATH - = new Path("s3a://osm-pds/planet/planet-latest.orc"); - - @Test - public void testLandsatStatistics() throws Throwable { - final Configuration conf = getConfiguration(); - // skips the tests if the landsat path isn't the default. - Path path = getLandsatCSVPath(conf); - conf.set(ENDPOINT, DEFAULT_ENDPOINT); - conf.unset("fs.s3a.bucket.landsat-pds.endpoint"); - - try (S3AFileSystem fs = (S3AFileSystem) path.getFileSystem(conf)) { - fs.getObjectMetadata(path); - IOStatistics iostats = fs.getIOStatistics(); - assertThatStatisticCounter(iostats, - STORE_IO_REQUEST.getSymbol()) - .isGreaterThanOrEqualTo(1); - } - } - - @Test - public void testCommonCrawlStatistics() throws Throwable { - final Configuration conf = getConfiguration(); - // skips the tests if the landsat path isn't the default. 
- getLandsatCSVPath(conf); - - Path path = COMMON_CRAWL_PATH; - conf.set(ENDPOINT, DEFAULT_ENDPOINT); - - try (S3AFileSystem fs = (S3AFileSystem) path.getFileSystem(conf)) { - fs.getObjectMetadata(path); - IOStatistics iostats = fs.getIOStatistics(); - assertThatStatisticCounter(iostats, - STORE_IO_REQUEST.getSymbol()) - .isGreaterThanOrEqualTo(1); - } - } - -} diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/test/PublicDatasetTestUtils.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/test/PublicDatasetTestUtils.java index 669acd8b8bd56..d4981c3933b18 100644 --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/test/PublicDatasetTestUtils.java +++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/test/PublicDatasetTestUtils.java @@ -18,9 +18,13 @@ package org.apache.hadoop.fs.s3a.test; +import org.junit.Assume; + import org.apache.hadoop.classification.InterfaceAudience; import org.apache.hadoop.classification.InterfaceStability; import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.s3a.S3ATestConstants; import org.apache.hadoop.fs.s3a.S3ATestUtils; import static org.apache.hadoop.fs.s3a.S3ATestConstants.KEY_BUCKET_WITH_MANY_OBJECTS; @@ -62,6 +66,77 @@ private PublicDatasetTestUtils() {} private static final String DEFAULT_BUCKET_WITH_MANY_OBJECTS = "s3a://usgs-landsat/collection02/level-1/"; + /** + * ORC dataset: {@value}. + */ + private static final Path ORC_DATA = new Path("s3a://osm-pds/planet/planet-latest.orc"); + + /** + * Provide a Path for some ORC data. + * + * @param conf Hadoop configuration + * @return path to the default ORC dataset + */ + public static Path getOrcData(Configuration conf) { + return ORC_DATA; + } + + /** + * Default path for the external test file: {@value}. + * This must be: gzipped, large enough for the performance + * tests and in a read-only bucket with anonymous access. + */ + public static final String DEFAULT_EXTERNAL_FILE = + "s3a://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz"; + + /** + * Get the external test file. + *

+ * This must be: gzipped, large enough for the performance + * tests and in a read-only bucket with anonymous access. + * @param conf configuration + * @return a dataset which meets the requirements. + */ + public static Path getExternalData(Configuration conf) { + return new Path(fetchFromConfig(conf, + S3ATestConstants.KEY_CSVTEST_FILE, DEFAULT_EXTERNAL_FILE)); + } + + /** + * Get the anonymous dataset. + * @param conf configuration + * @return a dataset which supports anonymous access. + */ + public static Path requireAnonymousDataPath(Configuration conf) { + return requireDefaultExternalData(conf); + } + + + /** + * Get the external test file; assume() that it is not modified (i.e. we haven't + * switched to a new storage infrastructure where the bucket is no longer + * read only). + * @param conf test configuration + * @return test file. + */ + public static String requireDefaultExternalDataFile(Configuration conf) { + String filename = getExternalData(conf).toUri().toString(); + Assume.assumeTrue("External test file is not the default", + DEFAULT_EXTERNAL_FILE.equals(filename)); + return filename; + } + + /** + * Get the external test file; assume() that it is not modified (i.e. we haven't + * switched to a new storage infrastructure where the bucket is no longer + * read only). + * @param conf test configuration + * @return test file as a path. + */ + public static Path requireDefaultExternalData(Configuration conf) { + return new Path(requireDefaultExternalDataFile(conf)); + } + /** * Provide a URI for a directory containing many objects. * @@ -90,6 +165,13 @@ public static String getRequesterPaysObject(Configuration conf) { KEY_REQUESTER_PAYS_FILE, DEFAULT_REQUESTER_PAYS_FILE); } + /** + * Fetch a trimmed configuration value, require it to be non-empty. + * @param conf configuration file + * @param key key + * @param defaultValue default value. + * @return the resolved value. + */ private static String fetchFromConfig(Configuration conf, String key, String defaultValue) { String value = conf.getTrimmed(key, defaultValue); diff --git a/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml b/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml index b17d1555ac7a1..0cba67bd68676 100644 --- a/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml +++ b/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml @@ -30,37 +30,57 @@ false - + + + + fs.s3a.bucket.noaa-cors-pds.endpoint.region + us-east-1 - fs.s3a.bucket.landsat-pds.multipart.purge + fs.s3a.bucket.noaa-isd-pds.multipart.purge false Don't try to purge uploads in the read-only bucket, as it will only create log noise. - fs.s3a.bucket.landsat-pds.probe + fs.s3a.bucket.noaa-isd-pds.probe 0 Let's postpone existence checks to the first IO operation - fs.s3a.bucket.landsat-pds.audit.add.referrer.header + fs.s3a.bucket.noaa-isd-pds.audit.add.referrer.header false - Do not add the referrer header to landsat operations + Do not add the referrer header + + + + fs.s3a.bucket.noaa-isd-pds.prefetch.block.size + 128k + Use a small prefetch size so tests fetch multiple blocks