diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/auditing.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/auditing.md
index 7a95907217789..8d00714b85075 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/auditing.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/auditing.md
@@ -111,9 +111,9 @@ Specific buckets can have auditing disabled, even when it is enabled globally.
```xml
<property>
-  <name>fs.s3a.bucket.landsat-pds.audit.enabled</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.audit.enabled</name>
  <value>false</value>
-  <description>Do not audit landsat bucket operations</description>
+  <description>Do not audit bucket operations</description>
</property>
```
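
For reference, the per-bucket option above overrides the global auditing switch; a minimal sketch of turning auditing off everywhere, assuming the standard `fs.s3a.audit.enabled` option described elsewhere in this document:

```xml
<property>
  <name>fs.s3a.audit.enabled</name>
  <value>false</value>
  <description>Disable S3A auditing for all buckets</description>
</property>
```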
@@ -318,9 +318,9 @@ either globally or for specific buckets:
<property>
-  <name>fs.s3a.bucket.landsat-pds.audit.referrer.enabled</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.audit.referrer.enabled</name>
  <value>false</value>
-  <description>Do not add the referrer header to landsat operations</description>
+  <description>Do not add the referrer header to operations</description>
</property>
```
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
index 4c14921c4b4aa..fb42d507b2d60 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
@@ -747,7 +747,7 @@ For example, for any job executed through Hadoop MapReduce, the Job ID can be us
### `Filesystem does not have support for 'magic' committer`
```
-org.apache.hadoop.fs.s3a.commit.PathCommitException: `s3a://landsat-pds': Filesystem does not have support for 'magic' committer enabled
+org.apache.hadoop.fs.s3a.commit.PathCommitException: `s3a://noaa-isd-pds': Filesystem does not have support for 'magic' committer enabled
in configuration option fs.s3a.committer.magic.enabled
```
@@ -760,42 +760,15 @@ Remove all global/per-bucket declarations of `fs.s3a.bucket.magic.enabled` or se
```xml
<property>
-  <name>fs.s3a.bucket.landsat-pds.committer.magic.enabled</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.committer.magic.enabled</name>
  <value>true</value>
</property>
```
Tip: you can verify that a bucket supports the magic committer through the
-`hadoop s3guard bucket-info` command:
+`hadoop s3guard bucket-info` command.
-```
-> hadoop s3guard bucket-info -magic s3a://landsat-pds/
-Location: us-west-2
-
-S3A Client
- Signing Algorithm: fs.s3a.signing-algorithm=(unset)
- Endpoint: fs.s3a.endpoint=s3.amazonaws.com
- Encryption: fs.s3a.encryption.algorithm=none
- Input seek policy: fs.s3a.experimental.input.fadvise=normal
- Change Detection Source: fs.s3a.change.detection.source=etag
- Change Detection Mode: fs.s3a.change.detection.mode=server
-
-S3A Committers
- The "magic" committer is supported in the filesystem
- S3A Committer factory class: mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
- S3A Committer name: fs.s3a.committer.name=magic
- Store magic committer integration: fs.s3a.committer.magic.enabled=true
-
-Security
- Delegation token support is disabled
-
-Directory Markers
- The directory marker policy is "keep"
- Available Policies: delete, keep, authoritative
- Authoritative paths: fs.s3a.authoritative.path=```
-```
-
### Error message: "File being created has a magic path, but the filesystem has magic file support disabled"
A file is being written to a path which is used for "magic" files,
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/connecting.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/connecting.md
index a31b1c3e39a05..f1839a0b20369 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/connecting.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/connecting.md
@@ -248,14 +248,13 @@ a bucket.
The up to date list of regions is [Available online](https://docs.aws.amazon.com/general/latest/gr/s3.html).
This list can be used to specify the endpoint of individual buckets, for example
-for buckets in the central and EU/Ireland endpoints.
+for buckets in the us-west-2 and EU/Ireland endpoints.
```xml
-  <name>fs.s3a.bucket.landsat-pds.endpoint.region</name>
+  <name>fs.s3a.bucket.us-west-2-dataset.endpoint.region</name>
  <value>us-west-2</value>
-  <description>The region for s3a://landsat-pds URLs</description>
@@ -318,9 +317,9 @@ The boolean option `fs.s3a.endpoint.fips` (default `false`) switches the S3A con
For a single bucket:
```xml
<property>
-  <name>fs.s3a.bucket.landsat-pds.endpoint.fips</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.endpoint.fips</name>
  <value>true</value>
-  <description>Use the FIPS endpoint for the landsat dataset</description>
+  <description>Use the FIPS endpoint for the NOAA dataset</description>
</property>
```
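
The bucket-specific option above mirrors the global `fs.s3a.endpoint.fips` switch named in the surrounding text; a minimal sketch of enabling FIPS endpoints for every bucket:

```xml
<property>
  <name>fs.s3a.endpoint.fips</name>
  <value>true</value>
  <description>Use FIPS-compliant S3 endpoints for all buckets</description>
</property>
```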
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_token_architecture.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_token_architecture.md
index 0ba516313f42d..caa93c46c5ee1 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_token_architecture.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_token_architecture.md
@@ -188,7 +188,7 @@ If it was deployed unbonded, the DT Binding is asked to create a new DT.
It is up to the binding what it includes in the token identifier, and how it obtains them.
This new token identifier is included in a token which has a "canonical service name" of
-the URI of the filesystem (e.g "s3a://landsat-pds").
+the URI of the filesystem (e.g. "s3a://noaa-isd-pds").
The issued/reissued token identifier can be marshalled and reused.
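
A small sketch of how that canonical service name is visible through the public `FileSystem` API (the URI and renewer below are placeholders; the name is only non-null when delegation tokens are enabled):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.token.Token;

public class CanonicalServiceNameProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("s3a://noaa-isd-pds/"), conf);
    // With delegation tokens enabled this is the filesystem URI,
    // e.g. "s3a://noaa-isd-pds"; without them it is null.
    String service = fs.getCanonicalServiceName();
    // The issued token carries the same service name, which is how
    // job submission later looks the token up.
    Token<?> token = fs.getDelegationToken("renewer");
    System.out.println(service + " -> " + token);
  }
}
```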
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_tokens.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_tokens.md
index 7aaa1b8b5ce79..cdba4e3d2c9bd 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_tokens.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_tokens.md
@@ -481,8 +481,8 @@ This will fetch the token and save it to the named file (here, `tokens.bin`),
even if Kerberos is disabled.
```bash
-# Fetch a token for the AWS landsat-pds bucket and save it to tokens.bin
-$ hdfs fetchdt --webservice s3a://landsat-pds/ tokens.bin
+# Fetch a token for the AWS noaa-isd-pds bucket and save it to tokens.bin
+$ hdfs fetchdt --webservice s3a://noaa-isd-pds/ tokens.bin
```
If the command fails with `ERROR: Failed to fetch token` it means the
@@ -498,11 +498,11 @@ host on which it was created.
```bash
$ bin/hdfs fetchdt --print tokens.bin
-Token (S3ATokenIdentifier{S3ADelegationToken/Session; uri=s3a://landsat-pds;
+Token (S3ATokenIdentifier{S3ADelegationToken/Session; uri=s3a://noaa-isd-pds;
timestamp=1541683947569; encryption=EncryptionSecrets{encryptionMethod=SSE_S3};
Created on vm1.local/192.168.99.1 at time 2018-11-08T13:32:26.381Z.};
Session credentials for user AAABWL expires Thu Nov 08 14:02:27 GMT 2018; (valid))
-for s3a://landsat-pds
+for s3a://noaa-isd-pds
```
The "(valid)" annotation means that the AWS credentials are considered "valid":
there is both a username and a secret.
@@ -513,11 +513,11 @@ If delegation support is enabled, it also prints the current
hadoop security level.
```bash
-$ hadoop s3guard bucket-info s3a://landsat-pds/
+$ hadoop s3guard bucket-info s3a://noaa-isd-pds/
-Filesystem s3a://landsat-pds
+Filesystem s3a://noaa-isd-pds
Location: us-west-2
-Filesystem s3a://landsat-pds is not using S3Guard
+Filesystem s3a://noaa-isd-pds is not using S3Guard
The "magic" committer is not supported
S3A Client
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md
index a375b0bdb96ea..36e96317a162c 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/directory_markers.md
@@ -313,9 +313,8 @@ All releases of Hadoop which have been updated to be marker aware will support t
Example: `s3guard bucket-info -markers aware` on a compatible release.
```
-> hadoop s3guard bucket-info -markers aware s3a://landsat-pds/
-Filesystem s3a://landsat-pds
-Location: us-west-2
+> hadoop s3guard bucket-info -markers aware s3a://noaa-isd-pds/
+Filesystem s3a://noaa-isd-pds
...
@@ -325,13 +324,14 @@ Directory Markers
Authoritative paths: fs.s3a.authoritative.path=
The S3A connector is compatible with buckets where directory markers are not deleted
+...
```
The same command will fail on older releases, because the `-markers` option
is unknown
```
-> hadoop s3guard bucket-info -markers aware s3a://landsat-pds/
+> hadoop s3guard bucket-info -markers aware s3a://noaa-isd-pds/
Illegal option -markers
Usage: hadoop bucket-info [OPTIONS] s3a://BUCKET
provide/check information about a specific bucket
@@ -353,9 +353,8 @@ Generic options supported are:
A specific policy check verifies that the connector is configured as desired
```
-> hadoop s3guard bucket-info -markers keep s3a://landsat-pds/
-Filesystem s3a://landsat-pds
-Location: us-west-2
+> hadoop s3guard bucket-info -markers keep s3a://noaa-isd-pds/
+Filesystem s3a://noaa-isd-pds
...
@@ -370,9 +369,8 @@ When probing for a specific policy, the error code "46" is returned if the activ
does not match that requested:
```
-> hadoop s3guard bucket-info -markers delete s3a://landsat-pds/
-Filesystem s3a://landsat-pds
-Location: us-west-2
+> hadoop s3guard bucket-info -markers delete s3a://noaa-isd-pds/
+Filesystem s3a://noaa-isd-pds
S3A Client
Signing Algorithm: fs.s3a.signing-algorithm=(unset)
@@ -397,7 +395,7 @@ Directory Markers
Authoritative paths: fs.s3a.authoritative.path=
2021-11-22 16:03:59,175 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210))
- -Exiting with status 46: 46: Bucket s3a://landsat-pds: required marker polic is
+ -Exiting with status 46: 46: Bucket s3a://noaa-isd-pds: required marker policy is
"keep" but actual policy is "delete"
```
@@ -449,10 +447,10 @@ Audit the path and fail if any markers were found.
```
-> hadoop s3guard markers -limit 8000 -audit s3a://landsat-pds/
+> hadoop s3guard markers -limit 8000 -audit s3a://noaa-isd-pds/
-The directory marker policy of s3a://landsat-pds is "Keep"
-2020-08-05 13:42:56,079 [main] INFO tools.MarkerTool (DurationInfo.java:(77)) - Starting: marker scan s3a://landsat-pds/
+The directory marker policy of s3a://noaa-isd-pds is "Keep"
+2020-08-05 13:42:56,079 [main] INFO tools.MarkerTool (DurationInfo.java:(77)) - Starting: marker scan s3a://noaa-isd-pds/
Scanned 1,000 objects
Scanned 2,000 objects
Scanned 3,000 objects
@@ -462,8 +460,8 @@ Scanned 6,000 objects
Scanned 7,000 objects
Scanned 8,000 objects
Limit of scan reached - 8,000 objects
-2020-08-05 13:43:01,184 [main] INFO tools.MarkerTool (DurationInfo.java:close(98)) - marker scan s3a://landsat-pds/: duration 0:05.107s
-No surplus directory markers were found under s3a://landsat-pds/
+2020-08-05 13:43:01,184 [main] INFO tools.MarkerTool (DurationInfo.java:close(98)) - marker scan s3a://noaa-isd-pds/: duration 0:05.107s
+No surplus directory markers were found under s3a://noaa-isd-pds/
Listing limit reached before completing the scan
2020-08-05 13:43:01,187 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 3:
```
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/encryption.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/encryption.md
index 9049440313dd4..a65fc1ecbcedf 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/encryption.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/encryption.md
@@ -536,15 +536,14 @@ header.x-amz-version-id="KcDOVmznIagWx3gP1HlDqcZvm1mFWZ2a"
A file with no-encryption (on a bucket without versioning but with intelligent tiering):
```
-bin/hadoop fs -getfattr -d s3a://landsat-pds/scene_list.gz
+ bin/hadoop fs -getfattr -d s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
-# file: s3a://landsat-pds/scene_list.gz
-header.Content-Length="45603307"
-header.Content-Type="application/octet-stream"
-header.ETag="39c34d489777a595b36d0af5726007db"
-header.Last-Modified="Wed Aug 29 01:45:15 BST 2018"
-header.x-amz-storage-class="INTELLIGENT_TIERING"
-header.x-amz-version-id="null"
+# file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
+header.Content-Length="524671"
+header.Content-Type="binary/octet-stream"
+header.ETag=""3e39531220fbd3747d32cf93a79a7a0c""
+header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024"
+header.x-amz-server-side-encryption="AES256"
```
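
To probe a single header rather than dumping them all, the shell's `-getfattr -n` form can be used; a sketch against the same public object (the output will vary with the store's settings):

```
bin/hadoop fs -getfattr -n header.x-amz-server-side-encryption \
  s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
```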
### Use `rename()` to encrypt files with new keys
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
index 0c787de46768f..868ee6ab37e5e 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
@@ -492,7 +492,7 @@ explicitly opened up for broader access.
```bash
hadoop fs -ls \
-D fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider \
- s3a://landsat-pds/
+ s3a://noaa-isd-pds/
```
1. Allowing anonymous access to an S3 bucket compromises
@@ -1446,11 +1446,11 @@ a session key:
```
-Finally, the public `s3a://landsat-pds/` bucket can be accessed anonymously:
+Finally, the public `s3a://noaa-isd-pds/` bucket can be accessed anonymously:
```xml
<property>
-  <name>fs.s3a.bucket.landsat-pds.aws.credentials.provider</name>
+  <name>fs.s3a.bucket.noaa-isd-pds.aws.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider</value>
</property>
```
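
With that per-bucket provider set, the earlier `-D` override on the command line is no longer needed; a sketch:

```bash
hadoop fs -ls s3a://noaa-isd-pds/
```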
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md
index 45244d9c7814e..28b02470bac1c 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md
@@ -405,7 +405,8 @@ An example of this is covered in [HADOOP-13871](https://issues.apache.org/jira/b
1. For public data, use `curl`:
- curl -O https://landsat-pds.s3.amazonaws.com/scene_list.gz
+ curl -O https://noaa-cors-pds.s3.amazonaws.com/raw/2023/001/akse/AKSE001a.23_.gz
+
1. Use `nettop` to monitor a process's connections.
@@ -654,7 +655,7 @@ via `FileSystem.get()` or `Path.getFileSystem()`.
The cache, `FileSystem.CACHE`, will cache one instance of a filesystem per user
for a given URI.
All calls to `FileSystem.get` for a cached FS for a URI such
-as `s3a://landsat-pds/` will return that singe single instance.
+as `s3a://noaa-isd-pds/` will return that single instance.
FileSystem instances are created on-demand for the cache,
and will be done in each thread which requests an instance.
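
A short sketch of the difference between the cached and uncached lookups (the URI here is only illustrative):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsCacheExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    URI uri = new URI("s3a://noaa-isd-pds/");
    // Returns the shared, per-user cached instance; do not close it.
    FileSystem cached = FileSystem.get(uri, conf);
    // Always creates a new instance; the caller owns and must close it.
    try (FileSystem privateFs = FileSystem.newInstance(uri, conf)) {
      System.out.println(privateFs == cached);  // false
    }
  }
}
```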
@@ -678,7 +679,7 @@ can be created simultaneously for different object stores/distributed
filesystems.
For example, a value of four would put an upper limit on the number
-of wasted instantiations of a connector for the `s3a://landsat-pds/`
+of wasted instantiations of a connector for the `s3a://noaa-isd-pds/`
bucket.
```xml
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
index 53a11404cded3..8840445d2560c 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
@@ -260,22 +260,20 @@ define the target region in `auth-keys.xml`.
### CSV Data Tests
The `TestS3AInputStreamPerformance` tests require read access to a multi-MB
-text file. The default file for these tests is one published by amazon,
-[s3a://landsat-pds.s3.amazonaws.com/scene_list.gz](http://landsat-pds.s3.amazonaws.com/scene_list.gz).
-This is a gzipped CSV index of other files which amazon serves for open use.
+text file. The default file for these tests is a public one:
+`s3a://noaa-cors-pds/raw/2023/001/akse/AKSE001a.23_.gz`,
+from the [NOAA Continuously Operating Reference Stations (CORS) Network (NCN)](https://registry.opendata.aws/noaa-ncn/).
Historically it was required to be a `csv.gz` file to validate S3 Select
support. Now that S3 Select support has been removed, other large files
may be used instead.
-However, future versions may want to read a CSV file again, so testers
-should still reference one.
The path to this object is set in the option `fs.s3a.scale.test.csvfile`,
```xml
<property>
  <name>fs.s3a.scale.test.csvfile</name>
-  <value>s3a://landsat-pds/scene_list.gz</value>
+  <value>s3a://noaa-cors-pds/raw/2023/001/akse/AKSE001a.23_.gz</value>
</property>
```
@@ -285,6 +283,7 @@ is hosted in Amazon's US-east datacenter.
1. If the data cannot be read for any reason then the test will fail.
1. If the property is set to a different path, then that data must be readable
and "sufficiently" large.
+1. If a `.gz` file, expect decompression-related test failures.
(the reason the space or newline is needed is to add "an empty entry"; an empty
`<value/>` would be considered undefined and pick up the default)
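
A sketch of such an "empty entry", which effectively disables the tests needing the file:

```xml
<property>
  <name>fs.s3a.scale.test.csvfile</name>
  <value> </value>
</property>
```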
@@ -292,14 +291,13 @@ and "sufficiently" large.
If using a test file in a different AWS S3 region then
a bucket-specific region must be defined.
-For the default test dataset, hosted in the `landsat-pds` bucket, this is:
+For the default test dataset, hosted in the `noaa-cors-pds` bucket, this is:
```xml
-<property>
-  <name>fs.s3a.bucket.landsat-pds.endpoint.region</name>
-  <value>us-west-2</value>
-  <description>The region for s3a://landsat-pds</description>
-</property>
+<property>
+  <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
+  <value>us-east-1</value>
+</property>
```
### Testing Access Point Integration
@@ -825,7 +823,7 @@ the tests become skipped, rather than fail with a trace which is really a false
The ordered test case mechanism of `AbstractSTestS3AHugeFiles` is probably
the most elegant way of chaining test setup/teardown.
-Regarding reusing existing data, we tend to use the landsat archive of
+Regarding reusing existing data, we tend to use the noaa-cors-pds archive of
AWS US-East for our testing of input stream operations. This doesn't work
against other regions, or with third party S3 implementations. Thus the
URL can be overridden for testing elsewhere.
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAWSCredentialsProvider.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAWSCredentialsProvider.java
index c13c3f48b8466..9a880db25eedc 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAWSCredentialsProvider.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAWSCredentialsProvider.java
@@ -39,10 +39,10 @@
import org.slf4j.LoggerFactory;
import static org.apache.hadoop.fs.s3a.Constants.*;
-import static org.apache.hadoop.fs.s3a.S3ATestUtils.getCSVTestPath;
import static org.apache.hadoop.fs.s3a.S3ATestUtils.removeBaseAndBucketOverrides;
import static org.apache.hadoop.fs.s3a.S3AUtils.*;
import static org.apache.hadoop.fs.s3a.auth.delegation.DelegationConstants.DELEGATION_TOKEN_BINDING;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getExternalData;
import static org.junit.Assert.*;
/**
@@ -162,7 +162,7 @@ public void testAnonymousProvider() throws Exception {
Configuration conf = new Configuration();
conf.set(AWS_CREDENTIALS_PROVIDER,
AnonymousAWSCredentialsProvider.class.getName());
- Path testFile = getCSVTestPath(conf);
+ Path testFile = getExternalData(conf);
try (FileSystem fs = FileSystem.newInstance(testFile.toUri(), conf)) {
assertNotNull("S3AFileSystem instance must not be null", fs);
assertTrue("FileSystem must be the instance of S3AFileSystem", fs instanceof S3AFileSystem);
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java
index c0f6a4b23226b..9e40534c82bd5 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFailureHandling.java
@@ -21,7 +21,6 @@
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import org.assertj.core.api.Assertions;
-import org.junit.Assume;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocatedFileStatus;
@@ -42,6 +41,7 @@
import static org.apache.hadoop.fs.contract.ContractTestUtils.*;
import static org.apache.hadoop.fs.s3a.S3ATestUtils.createFiles;
import static org.apache.hadoop.fs.s3a.test.ExtraAssertions.failIf;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireDefaultExternalData;
import static org.apache.hadoop.test.LambdaTestUtils.*;
import static org.apache.hadoop.util.functional.RemoteIterators.mappingRemoteIterator;
import static org.apache.hadoop.util.functional.RemoteIterators.toList;
@@ -135,22 +135,13 @@ public void testMultiObjectDeleteSomeFiles() throws Throwable {
timer.end("removeKeys");
}
-
- private Path maybeGetCsvPath() {
- Configuration conf = getConfiguration();
- String csvFile = conf.getTrimmed(KEY_CSVTEST_FILE, DEFAULT_CSVTEST_FILE);
- Assume.assumeTrue("CSV test file is not the default",
- DEFAULT_CSVTEST_FILE.equals(csvFile));
- return new Path(csvFile);
- }
-
/**
* Test low-level failure handling with low level delete request.
*/
@Test
public void testMultiObjectDeleteNoPermissions() throws Throwable {
- describe("Delete the landsat CSV file and expect it to fail");
- Path csvPath = maybeGetCsvPath();
+ describe("Delete the external file and expect it to fail");
+ Path csvPath = requireDefaultExternalData(getConfiguration());
S3AFileSystem fs = (S3AFileSystem) csvPath.getFileSystem(
getConfiguration());
// create a span, expect it to be activated.
@@ -170,8 +161,8 @@ public void testMultiObjectDeleteNoPermissions() throws Throwable {
*/
@Test
public void testSingleObjectDeleteNoPermissionsTranslated() throws Throwable {
- describe("Delete the landsat CSV file and expect it to fail");
- Path csvPath = maybeGetCsvPath();
+ describe("Delete the external file and expect it to fail");
+ Path csvPath = requireDefaultExternalData(getConfiguration());
S3AFileSystem fs = (S3AFileSystem) csvPath.getFileSystem(
getConfiguration());
AccessDeniedException aex = intercept(AccessDeniedException.class,
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingCacheFiles.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingCacheFiles.java
index 57f7686b62082..274eab44b71f1 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingCacheFiles.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingCacheFiles.java
@@ -19,8 +19,9 @@
package org.apache.hadoop.fs.s3a;
import java.io.File;
-import java.net.URI;
+import java.util.UUID;
+import org.assertj.core.api.Assertions;
import org.junit.Before;
import org.junit.Test;
import org.slf4j.Logger;
@@ -30,15 +31,16 @@
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.contract.ContractTestUtils;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.s3a.performance.AbstractS3ACostTest;
import static org.apache.hadoop.fs.s3a.Constants.BUFFER_DIR;
-import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_DEFAULT_SIZE;
import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_SIZE_KEY;
import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_ENABLED_KEY;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getExternalData;
import static org.apache.hadoop.io.IOUtils.cleanupWithLogger;
/**
@@ -49,11 +51,21 @@ public class ITestS3APrefetchingCacheFiles extends AbstractS3ACostTest {
private static final Logger LOG =
LoggerFactory.getLogger(ITestS3APrefetchingCacheFiles.class);
+ /** use a small file size so small source files will still work. */
+ public static final int BLOCK_SIZE = 128 * 1024;
+
+ public static final int PREFETCH_OFFSET = 10240;
+
private Path testFile;
+
+ /** The FS with the external file. */
private FileSystem fs;
+
private int prefetchBlockSize;
private Configuration conf;
+ private String bufferDir;
+
public ITestS3APrefetchingCacheFiles() {
super(true);
}
@@ -63,35 +75,31 @@ public void setUp() throws Exception {
super.setup();
// Sets BUFFER_DIR by calling S3ATestUtils#prepareTestConfiguration
conf = createConfiguration();
- String testFileUri = S3ATestUtils.getCSVTestFile(conf);
- testFile = new Path(testFileUri);
- prefetchBlockSize = conf.getInt(PREFETCH_BLOCK_SIZE_KEY, PREFETCH_BLOCK_DEFAULT_SIZE);
- fs = getFileSystem();
- fs.initialize(new URI(testFileUri), conf);
+ testFile = getExternalData(conf);
+ prefetchBlockSize = conf.getInt(PREFETCH_BLOCK_SIZE_KEY, BLOCK_SIZE);
+ fs = FileSystem.get(testFile.toUri(), conf);
}
@Override
public Configuration createConfiguration() {
Configuration configuration = super.createConfiguration();
S3ATestUtils.removeBaseAndBucketOverrides(configuration, PREFETCH_ENABLED_KEY);
- S3ATestUtils.removeBaseAndBucketOverrides(configuration, PREFETCH_BLOCK_SIZE_KEY);
configuration.setBoolean(PREFETCH_ENABLED_KEY, true);
+ // use a small block size unless explicitly set in the test config.
+ configuration.setInt(PREFETCH_BLOCK_SIZE_KEY, BLOCK_SIZE);
+ // patch buffer dir with a unique path for test isolation.
+ final String bufferDirBase = configuration.get(BUFFER_DIR);
+ bufferDir = bufferDirBase + "/" + UUID.randomUUID();
+ configuration.set(BUFFER_DIR, bufferDir);
return configuration;
}
@Override
public synchronized void teardown() throws Exception {
super.teardown();
- File tmpFileDir = new File(conf.get(BUFFER_DIR));
- File[] tmpFiles = tmpFileDir.listFiles();
- if (tmpFiles != null) {
- for (File filePath : tmpFiles) {
- String path = filePath.getPath();
- if (path.endsWith(".bin") && path.contains("fs-cache-")) {
- filePath.delete();
- }
- }
+ if (bufferDir != null) {
+ new File(bufferDir).delete();
}
cleanupWithLogger(LOG, fs);
fs = null;
@@ -110,34 +118,35 @@ public void testCacheFileExistence() throws Throwable {
try (FSDataInputStream in = fs.open(testFile)) {
byte[] buffer = new byte[prefetchBlockSize];
- in.read(buffer, 0, prefetchBlockSize - 10240);
- in.seek(prefetchBlockSize * 2);
- in.read(buffer, 0, prefetchBlockSize);
+ // read a bit less than a block
+ in.readFully(0, buffer, 0, prefetchBlockSize - PREFETCH_OFFSET);
+ // read at least some of a second block
+ in.read(prefetchBlockSize * 2, buffer, 0, prefetchBlockSize);
+
File tmpFileDir = new File(conf.get(BUFFER_DIR));
- assertTrue("The dir to keep cache files must exist", tmpFileDir.exists());
+ final LocalFileSystem localFs = FileSystem.getLocal(conf);
+ Path bufferDirPath = new Path(tmpFileDir.toURI());
+ ContractTestUtils.assertIsDirectory(localFs, bufferDirPath);
File[] tmpFiles = tmpFileDir
.listFiles((dir, name) -> name.endsWith(".bin") && name.contains("fs-cache-"));
- boolean isCacheFileForBlockFound = tmpFiles != null && tmpFiles.length > 0;
- if (!isCacheFileForBlockFound) {
- LOG.warn("No cache files found under " + tmpFileDir);
- }
- assertTrue("File to cache block data must exist", isCacheFileForBlockFound);
+ Assertions.assertThat(tmpFiles)
+ .describedAs("Cache files not found under %s", tmpFileDir)
+ .isNotEmpty();
+
for (File tmpFile : tmpFiles) {
Path path = new Path(tmpFile.getAbsolutePath());
- try (FileSystem localFs = FileSystem.getLocal(conf)) {
- FileStatus stat = localFs.getFileStatus(path);
- ContractTestUtils.assertIsFile(path, stat);
- assertEquals("File length not matching with prefetchBlockSize", prefetchBlockSize,
- stat.getLen());
- assertEquals("User permissions should be RW", FsAction.READ_WRITE,
- stat.getPermission().getUserAction());
- assertEquals("Group permissions should be NONE", FsAction.NONE,
- stat.getPermission().getGroupAction());
- assertEquals("Other permissions should be NONE", FsAction.NONE,
- stat.getPermission().getOtherAction());
- }
+ FileStatus stat = localFs.getFileStatus(path);
+ ContractTestUtils.assertIsFile(path, stat);
+ assertEquals("File length not matching with prefetchBlockSize", prefetchBlockSize,
+ stat.getLen());
+ assertEquals("User permissions should be RW", FsAction.READ_WRITE,
+ stat.getPermission().getUserAction());
+ assertEquals("Group permissions should be NONE", FsAction.NONE,
+ stat.getPermission().getGroupAction());
+ assertEquals("Other permissions should be NONE", FsAction.NONE,
+ stat.getPermission().getOtherAction());
}
}
}
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestConstants.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestConstants.java
index a6269c437665a..50f58c248acf0 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestConstants.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestConstants.java
@@ -96,14 +96,16 @@ public interface S3ATestConstants {
String KEY_CSVTEST_FILE = S3A_SCALE_TEST + "csvfile";
/**
- * The landsat bucket: {@value}.
+ * Default path for the multi MB test file: {@value}.
+ * @deprecated retrieve via {@link PublicDatasetTestUtils}.
*/
- String LANDSAT_BUCKET = "s3a://landsat-pds/";
+ @Deprecated
+ String DEFAULT_CSVTEST_FILE = PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE;
/**
- * Default path for the multi MB test file: {@value}.
+ * Example path for unit tests; this is never accessed: {@value}.
*/
- String DEFAULT_CSVTEST_FILE = LANDSAT_BUCKET + "scene_list.gz";
+ String UNIT_TEST_EXAMPLE_PATH = "s3a://example/data/";
/**
* Configuration key for an existing object in a requester pays bucket: {@value}.
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java
index 469562f9b33b9..9d2a6829f9d66 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java
@@ -88,6 +88,8 @@
import static org.apache.hadoop.fs.contract.ContractTestUtils.createFile;
import static org.apache.hadoop.fs.s3a.impl.CallableSupplier.submit;
import static org.apache.hadoop.fs.s3a.impl.CallableSupplier.waitForCompletion;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getExternalData;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireDefaultExternalDataFile;
import static org.apache.hadoop.test.GenericTestUtils.buildPaths;
import static org.apache.hadoop.util.Preconditions.checkNotNull;
import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_SECURITY_CREDENTIAL_PROVIDER_PATH;
@@ -386,22 +388,22 @@ public static String getTestProperty(Configuration conf,
* Get the test CSV file; assume() that it is not empty.
* @param conf test configuration
* @return test file.
+ * @deprecated Retained only to assist cherrypicking patches
*/
+ @Deprecated
public static String getCSVTestFile(Configuration conf) {
- String csvFile = conf
- .getTrimmed(KEY_CSVTEST_FILE, DEFAULT_CSVTEST_FILE);
- Assume.assumeTrue("CSV test file is not the default",
- isNotEmpty(csvFile));
- return csvFile;
+ return getExternalData(conf).toUri().toString();
}
/**
* Get the test CSV path; assume() that it is not empty.
* @param conf test configuration
* @return test file as a path.
+ * @deprecated Retained only to assist cherrypicking patches
*/
+ @Deprecated
public static Path getCSVTestPath(Configuration conf) {
- return new Path(getCSVTestFile(conf));
+ return getExternalData(conf);
}
/**
@@ -410,12 +412,11 @@ public static Path getCSVTestPath(Configuration conf) {
* read only).
* @return test file.
* @param conf test configuration
+ * @deprecated Retained only to assist cherrypicking patches
*/
+ @Deprecated
public static String getLandsatCSVFile(Configuration conf) {
- String csvFile = getCSVTestFile(conf);
- Assume.assumeTrue("CSV test file is not the default",
- DEFAULT_CSVTEST_FILE.equals(csvFile));
- return csvFile;
+ return requireDefaultExternalDataFile(conf);
}
/**
* Get the test CSV file; assume() that it is not modified (i.e. we haven't
@@ -423,9 +424,11 @@ public static String getLandsatCSVFile(Configuration conf) {
* read only).
* @param conf test configuration
* @return test file as a path.
+ * @deprecated Retained only to assist cherrypicking patches
*/
+ @Deprecated
public static Path getLandsatCSVPath(Configuration conf) {
- return new Path(getLandsatCSVFile(conf));
+ return getExternalData(conf);
}
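
For tests being cherrypicked across branches, the migration from the deprecated helpers is mechanical; a hypothetical sketch using the method names introduced in this patch:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils;

final class ExternalDataMigrationSketch {
  private ExternalDataMigrationSketch() { }

  static Path resolveTestData(Configuration conf) {
    // before (deprecated): S3ATestUtils.getLandsatCSVPath(conf)
    // after: resolve whatever external dataset the configuration points at
    return PublicDatasetTestUtils.getExternalData(conf);
  }

  static Path resolveDefaultOnly(Configuration conf) {
    // skips (assume()) when fs.s3a.scale.test.csvfile has been overridden
    return PublicDatasetTestUtils.requireDefaultExternalData(conf);
  }
}
```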
/**
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AAWSCredentialsProvider.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AAWSCredentialsProvider.java
index 730bae0aeb101..9312b3a552144 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AAWSCredentialsProvider.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AAWSCredentialsProvider.java
@@ -46,26 +46,27 @@
import org.apache.hadoop.fs.s3a.auth.AbstractSessionCredentialsProvider;
import org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider;
import org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException;
+import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils;
import org.apache.hadoop.io.retry.RetryPolicy;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getExternalData;
import static org.apache.hadoop.fs.s3a.Constants.*;
-import static org.apache.hadoop.fs.s3a.S3ATestConstants.*;
import static org.apache.hadoop.fs.s3a.S3ATestUtils.*;
import static org.apache.hadoop.fs.s3a.S3AUtils.*;
import static org.apache.hadoop.test.LambdaTestUtils.intercept;
import static org.apache.hadoop.test.LambdaTestUtils.interceptFuture;
-import static org.junit.Assert.*;
/**
* Unit tests for {@link Constants#AWS_CREDENTIALS_PROVIDER} logic.
*/
-public class TestS3AAWSCredentialsProvider {
+public class TestS3AAWSCredentialsProvider extends AbstractS3ATestBase {
/**
- * URI of the landsat images.
+ * URI of the test file: this must be anonymously accessible.
+ * As these are unit tests, no actual connection to the store is made.
*/
private static final URI TESTFILE_URI = new Path(
- DEFAULT_CSVTEST_FILE).toUri();
+ PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE).toUri();
@Rule
public ExpectedException exception = ExpectedException.none();
@@ -110,7 +111,7 @@ public void testInstantiationChain() throws Throwable {
TemporaryAWSCredentialsProvider.NAME
+ ", \t" + SimpleAWSCredentialsProvider.NAME
+ " ,\n " + AnonymousAWSCredentialsProvider.NAME);
- Path testFile = getCSVTestPath(conf);
+ Path testFile = getExternalData(conf);
AWSCredentialProviderList list = createAWSCredentialProviderSet(
testFile.toUri(), conf);
@@ -522,7 +523,7 @@ protected AWSCredentials createCredentials(Configuration config) throws IOExcept
@Test
public void testConcurrentAuthentication() throws Throwable {
Configuration conf = createProviderConfiguration(SlowProvider.class.getName());
- Path testFile = getCSVTestPath(conf);
+ Path testFile = getExternalData(conf);
AWSCredentialProviderList list = createAWSCredentialProviderSet(testFile.toUri(), conf);
@@ -592,7 +593,7 @@ protected AWSCredentials createCredentials(Configuration config) throws IOExcept
@Test
public void testConcurrentAuthenticationError() throws Throwable {
Configuration conf = createProviderConfiguration(ErrorProvider.class.getName());
- Path testFile = getCSVTestPath(conf);
+ Path testFile = getExternalData(conf);
AWSCredentialProviderList list = createAWSCredentialProviderSet(testFile.toUri(), conf);
ErrorProvider provider = (ErrorProvider) list.getProviders().get(0);
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java
index 9fb09b4cede52..20f595543255e 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestAssumeRole.java
@@ -44,7 +44,6 @@
import org.apache.hadoop.fs.s3a.AbstractS3ATestBase;
import org.apache.hadoop.fs.s3a.MultipartUtils;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
-import org.apache.hadoop.fs.s3a.S3ATestConstants;
import org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider;
import org.apache.hadoop.fs.s3a.commit.CommitConstants;
import org.apache.hadoop.fs.s3a.commit.files.PendingSet;
@@ -64,6 +63,7 @@
import static org.apache.hadoop.fs.s3a.auth.RoleTestUtils.forbidden;
import static org.apache.hadoop.fs.s3a.auth.RoleTestUtils.newAssumedRoleConfig;
import static org.apache.hadoop.fs.s3a.s3guard.S3GuardToolTestHelper.exec;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireAnonymousDataPath;
import static org.apache.hadoop.fs.statistics.IOStatisticsLogging.ioStatisticsSourceToString;
import static org.apache.hadoop.io.IOUtils.cleanupWithLogger;
import static org.apache.hadoop.test.GenericTestUtils.assertExceptionContains;
@@ -104,7 +104,7 @@ public class ITestAssumeRole extends AbstractS3ATestBase {
public void setup() throws Exception {
super.setup();
assumeRoleTests();
- uri = new URI(S3ATestConstants.DEFAULT_CSVTEST_FILE);
+ uri = requireAnonymousDataPath(getConfiguration()).toUri();
}
@Override
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java
index d5d62f2cae92c..ba9746358c575 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestDelegatedMRJob.java
@@ -58,6 +58,8 @@
import static org.apache.hadoop.fs.s3a.auth.delegation.DelegationConstants.*;
import static org.apache.hadoop.fs.s3a.auth.delegation.MiniKerberizedHadoopCluster.assertSecurityEnabled;
import static org.apache.hadoop.fs.s3a.auth.delegation.MiniKerberizedHadoopCluster.closeUserFileSystems;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getOrcData;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireAnonymousDataPath;
/**
* Submit a job with S3 delegation tokens.
@@ -106,10 +108,17 @@ public class ITestDelegatedMRJob extends AbstractDelegationIT {
private Path destPath;
- private static final Path EXTRA_JOB_RESOURCE_PATH
- = new Path("s3a://osm-pds/planet/planet-latest.orc");
+ /**
+ * Path of the extra job resource; set up in
+ * {@link #createConfiguration()}.
+ */
+ private Path extraJobResourcePath;
- public static final URI jobResource = EXTRA_JOB_RESOURCE_PATH.toUri();
+ /**
+ * URI of the extra job resource; set up in
+ * {@link #createConfiguration()}.
+ */
+ private URI jobResourceUri;
/**
* Test array for parameterized test runs.
@@ -161,7 +170,9 @@ protected YarnConfiguration createConfiguration() {
conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_MS,
10_000);
- String host = jobResource.getHost();
+ extraJobResourcePath = getOrcData(conf);
+ jobResourceUri = extraJobResourcePath.toUri();
+ String host = jobResourceUri.getHost();
// and fix to the main endpoint if the caller has moved
conf.set(
String.format("fs.s3a.bucket.%s.endpoint", host), "");
@@ -229,9 +240,9 @@ protected int getTestTimeoutMillis() {
@Test
public void testCommonCrawlLookup() throws Throwable {
- FileSystem resourceFS = EXTRA_JOB_RESOURCE_PATH.getFileSystem(
+ FileSystem resourceFS = extraJobResourcePath.getFileSystem(
getConfiguration());
- FileStatus status = resourceFS.getFileStatus(EXTRA_JOB_RESOURCE_PATH);
+ FileStatus status = resourceFS.getFileStatus(extraJobResourcePath);
LOG.info("Extra job resource is {}", status);
assertTrue("Not encrypted: " + status, status.isEncrypted());
}
@@ -241,9 +252,9 @@ public void testJobSubmissionCollectsTokens() throws Exception {
describe("Mock Job test");
JobConf conf = new JobConf(getConfiguration());
- // the input here is the landsat file; which lets
+ // the input here is the external file, which lets
// us differentiate source URI from dest URI
- Path input = new Path(DEFAULT_CSVTEST_FILE);
+ Path input = requireAnonymousDataPath(getConfiguration());
final FileSystem sourceFS = input.getFileSystem(conf);
@@ -272,7 +283,7 @@ public void testJobSubmissionCollectsTokens() throws Exception {
// This is to actually stress the terasort code for which
// the yarn ResourceLocalizationService was having problems with
// fetching resources from.
- URI partitionUri = new URI(EXTRA_JOB_RESOURCE_PATH.toString() +
+ URI partitionUri = new URI(extraJobResourcePath.toString() +
"#_partition.lst");
job.addCacheFile(partitionUri);
@@ -302,7 +313,7 @@ public void testJobSubmissionCollectsTokens() throws Exception {
// look up the destination token
lookupToken(submittedCredentials, fs.getUri(), tokenKind);
lookupToken(submittedCredentials,
- EXTRA_JOB_RESOURCE_PATH.getFileSystem(conf).getUri(), tokenKind);
+ extraJobResourcePath.getFileSystem(conf).getUri(), tokenKind);
}
}
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestRoleDelegationInFilesystem.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestRoleDelegationInFilesystem.java
index 511b813475954..08dba4b798214 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestRoleDelegationInFilesystem.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestRoleDelegationInFilesystem.java
@@ -53,8 +53,7 @@ public Text getTokenKind() {
/**
* This verifies that the granted credentials only access the target bucket
- * by using the credentials in a new S3 client to query the AWS-owned landsat
- * bucket.
+ * by using the credentials in a new S3 client to query the public data bucket.
* @param delegatedFS delegated FS with role-restricted access.
* @throws Exception failure
*/
@@ -62,7 +61,7 @@ public Text getTokenKind() {
protected void verifyRestrictedPermissions(final S3AFileSystem delegatedFS)
throws Exception {
intercept(AccessDeniedException.class,
- () -> readLandsatMetadata(delegatedFS));
+ () -> readExternalDatasetMetadata(delegatedFS));
}
}
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java
index 295125169a00c..35300980959ee 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/ITestSessionDelegationInFilesystem.java
@@ -76,6 +76,7 @@
import static org.apache.hadoop.fs.s3a.auth.delegation.MiniKerberizedHadoopCluster.ALICE;
import static org.apache.hadoop.fs.s3a.auth.delegation.MiniKerberizedHadoopCluster.assertSecurityEnabled;
import static org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.lookupS3ADelegationToken;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireAnonymousDataPath;
import static org.apache.hadoop.test.LambdaTestUtils.doAs;
import static org.apache.hadoop.test.LambdaTestUtils.intercept;
import static org.hamcrest.Matchers.containsString;
@@ -330,8 +331,10 @@ public void testDelegatedFileSystem() throws Throwable {
+ " if role restricted, permissions are tightened.");
S3AFileSystem fs = getFileSystem();
// force a probe of the remote FS to make sure its endpoint is valid
- fs.getObjectMetadata(new Path("/"));
- readLandsatMetadata(fs);
+ // TODO: Check what should happen here. Calling headObject() on the root path fails in V2,
+ // with the error that key cannot be empty.
+ // fs.getObjectMetadata(new Path("/"));
+ readExternalDatasetMetadata(fs);
URI uri = fs.getUri();
// create delegation tokens from the test suites FS.
@@ -450,13 +453,13 @@ protected void executeDelegatedFSOperations(final S3AFileSystem delegatedFS,
}
/**
- * Session tokens can read the landsat bucket without problems.
+ * Session tokens can read the external bucket without problems.
* @param delegatedFS delegated FS
* @throws Exception failure
*/
protected void verifyRestrictedPermissions(final S3AFileSystem delegatedFS)
throws Exception {
- readLandsatMetadata(delegatedFS);
+ readExternalDatasetMetadata(delegatedFS);
}
@Test
@@ -569,7 +572,7 @@ public void testDelegationBindingMismatch2() throws Throwable {
/**
* This verifies that the granted credentials only access the target bucket
- * by using the credentials in a new S3 client to query the AWS-owned landsat
+ * by using the credentials in a new S3 client to query the external
* bucket.
* @param delegatedFS delegated FS with role-restricted access.
* @throws AccessDeniedException if the delegated FS's credentials can't
@@ -578,16 +581,17 @@ public void testDelegationBindingMismatch2() throws Throwable {
* @throws Exception failure
*/
@SuppressWarnings("deprecation")
- protected ObjectMetadata readLandsatMetadata(final S3AFileSystem delegatedFS)
+ protected ObjectMetadata readExternalDatasetMetadata(final S3AFileSystem delegatedFS)
throws Exception {
AWSCredentialProviderList testingCreds
= delegatedFS.shareCredentials("testing");
- URI landsat = new URI(DEFAULT_CSVTEST_FILE);
+ URI external = requireAnonymousDataPath(getConfiguration()).toUri();
DefaultS3ClientFactory factory
= new DefaultS3ClientFactory();
- factory.setConf(new Configuration(delegatedFS.getConf()));
- String host = landsat.getHost();
+ Configuration conf = delegatedFS.getConf();
+ factory.setConf(conf);
+ String host = external.getHost();
S3ClientFactory.S3ClientCreationParameters parameters = null;
parameters = new S3ClientFactory.S3ClientCreationParameters()
.withCredentialSet(testingCreds)
@@ -596,10 +600,10 @@ protected ObjectMetadata readLandsatMetadata(final S3AFileSystem delegatedFS)
.withMetrics(new EmptyS3AStatisticsContext()
.newStatisticsFromAwsSdk())
.withUserAgentSuffix("ITestSessionDelegationInFilesystem");
- AmazonS3 s3 = factory.createS3Client(landsat, parameters);
+ AmazonS3 s3 = factory.createS3Client(external, parameters);
return Invoker.once("HEAD", host,
- () -> s3.getObjectMetadata(host, landsat.getPath().substring(1)));
+ () -> s3.getObjectMetadata(host, external.getPath().substring(1)));
}
/**
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/TestS3ADelegationTokenSupport.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/TestS3ADelegationTokenSupport.java
index 88d9ebfcdfdc3..ea4e2e54ff6ee 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/TestS3ADelegationTokenSupport.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/delegation/TestS3ADelegationTokenSupport.java
@@ -24,10 +24,10 @@
import org.junit.Test;
import org.apache.hadoop.fs.s3a.S3AEncryptionMethods;
-import org.apache.hadoop.fs.s3a.S3ATestConstants;
import org.apache.hadoop.fs.s3a.S3ATestUtils;
import org.apache.hadoop.fs.s3a.auth.MarshalledCredentialBinding;
import org.apache.hadoop.fs.s3a.auth.MarshalledCredentials;
+import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.SecretManager;
@@ -45,11 +45,11 @@
*/
public class TestS3ADelegationTokenSupport {
- private static URI landsatUri;
+ private static URI externalUri;
@BeforeClass
public static void classSetup() throws Exception {
- landsatUri = new URI(S3ATestConstants.DEFAULT_CSVTEST_FILE);
+ externalUri = new URI(PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE);
}
@Test
@@ -75,7 +75,7 @@ public void testSessionTokenDecode() throws Throwable {
= new SessionTokenIdentifier(SESSION_TOKEN_KIND,
alice,
renewer,
- new URI("s3a://landsat-pds/"),
+ new URI("s3a://anything/"),
new MarshalledCredentials("a", "b", ""),
new EncryptionSecrets(S3AEncryptionMethods.SSE_S3, ""),
"origin");
@@ -117,7 +117,7 @@ public void testSessionTokenIdentifierRoundTrip() throws Throwable {
SESSION_TOKEN_KIND,
new Text(),
renewer,
- landsatUri,
+ externalUri,
new MarshalledCredentials("a", "b", "c"),
new EncryptionSecrets(), "");
@@ -136,7 +136,7 @@ public void testSessionTokenIdentifierRoundTripNoRenewer() throws Throwable {
SESSION_TOKEN_KIND,
new Text(),
null,
- landsatUri,
+ externalUri,
new MarshalledCredentials("a", "b", "c"),
new EncryptionSecrets(), "");
@@ -152,7 +152,7 @@ public void testSessionTokenIdentifierRoundTripNoRenewer() throws Throwable {
@Test
public void testRoleTokenIdentifierRoundTrip() throws Throwable {
RoleTokenIdentifier id = new RoleTokenIdentifier(
- landsatUri,
+ externalUri,
new Text(),
new Text(),
new MarshalledCredentials("a", "b", "c"),
@@ -171,7 +171,7 @@ public void testRoleTokenIdentifierRoundTrip() throws Throwable {
public void testFullTokenIdentifierRoundTrip() throws Throwable {
Text renewer = new Text("renewerName");
FullCredentialsTokenIdentifier id = new FullCredentialsTokenIdentifier(
- landsatUri,
+ externalUri,
new Text(),
renewer,
new MarshalledCredentials("a", "b", ""),
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/staging/TestPaths.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/staging/TestPaths.java
index ee6480a36af37..e2582cd0a2ee8 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/staging/TestPaths.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/staging/TestPaths.java
@@ -26,6 +26,7 @@
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.test.HadoopTestBase;
+import static org.apache.hadoop.fs.s3a.S3ATestConstants.UNIT_TEST_EXAMPLE_PATH;
import static org.apache.hadoop.fs.s3a.commit.staging.Paths.*;
import static org.apache.hadoop.test.LambdaTestUtils.intercept;
@@ -81,7 +82,7 @@ private void assertUUIDAdded(String path, String expected) {
assertEquals("from " + path, expected, addUUID(path, "UUID"));
}
- private static final String DATA = "s3a://landsat-pds/data/";
+ private static final String DATA = UNIT_TEST_EXAMPLE_PATH;
private static final Path BASE = new Path(DATA);
@Test
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java
index 23b14fd3792ca..c7bd892872f62 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardTool.java
@@ -22,14 +22,17 @@
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStreamReader;
+import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.junit.Test;
+import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
+import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils;
import org.apache.hadoop.test.LambdaTestUtils;
import org.apache.hadoop.util.StringUtils;
@@ -37,7 +40,6 @@
import static org.apache.hadoop.fs.s3a.MultipartTestUtils.clearAnyUploads;
import static org.apache.hadoop.fs.s3a.MultipartTestUtils.countUploadsAt;
import static org.apache.hadoop.fs.s3a.MultipartTestUtils.createPartUpload;
-import static org.apache.hadoop.fs.s3a.S3ATestUtils.getLandsatCSVFile;
import static org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.BucketInfo;
import static org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.E_BAD_STATE;
import static org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.Uploads;
@@ -53,35 +55,31 @@ public class ITestS3GuardTool extends AbstractS3GuardToolTestBase {
"-force", "-verbose"};
@Test
- public void testLandsatBucketUnguarded() throws Throwable {
+ public void testExternalBucketRequireUnencrypted() throws Throwable {
run(BucketInfo.NAME,
- "-" + BucketInfo.UNGUARDED_FLAG,
- getLandsatCSVFile(getConfiguration()));
- }
-
- @Test
- public void testLandsatBucketRequireGuarded() throws Throwable {
- runToFailure(E_BAD_STATE,
- BucketInfo.NAME,
- "-" + BucketInfo.GUARDED_FLAG,
- getLandsatCSVFile(
- ITestS3GuardTool.this.getConfiguration()));
+ "-" + BucketInfo.ENCRYPTION_FLAG, "none",
+ externalBucket());
}
- @Test
- public void testLandsatBucketRequireUnencrypted() throws Throwable {
- run(BucketInfo.NAME,
- "-" + BucketInfo.ENCRYPTION_FLAG, "none",
- getLandsatCSVFile(getConfiguration()));
+ /**
+ * Get the external bucket; this is the bucket of the default external file.
+ * If not set to the default value, the test will be skipped.
+ * @return the bucket of the default external file.
+ */
+ private String externalBucket() {
+ Configuration conf = getConfiguration();
+ Path result = PublicDatasetTestUtils.requireDefaultExternalData(conf);
+ final URI uri = result.toUri();
+ final String bucket = uri.getScheme() + "://" + uri.getHost();
+ return bucket;
}
@Test
- public void testLandsatBucketRequireEncrypted() throws Throwable {
+ public void testExternalBucketRequireEncrypted() throws Throwable {
runToFailure(E_BAD_STATE,
BucketInfo.NAME,
"-" + BucketInfo.ENCRYPTION_FLAG,
- "AES256", getLandsatCSVFile(
- ITestS3GuardTool.this.getConfiguration()));
+ "AES256", externalBucket());
}
@Test
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestAuthoritativePath.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestAuthoritativePath.java
index c8e56f753bd50..95bb5a567f719 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestAuthoritativePath.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestAuthoritativePath.java
@@ -33,6 +33,7 @@
import org.apache.hadoop.test.AbstractHadoopTestBase;
import static org.apache.hadoop.fs.s3a.Constants.AUTHORITATIVE_PATH;
+import static org.apache.hadoop.fs.s3a.S3ATestConstants.UNIT_TEST_EXAMPLE_PATH;
import static org.assertj.core.api.Assertions.assertThat;
/**
@@ -71,7 +72,7 @@ public void testResolutionWithFQP() throws Throwable {
@Test
public void testOtherBucket() throws Throwable {
assertAuthPaths(l("/one/",
- "s3a://landsat-pds/",
+ UNIT_TEST_EXAMPLE_PATH,
BASE + "/two/"),
"/one/", "/two/");
}
@@ -79,7 +80,7 @@ public void testOtherBucket() throws Throwable {
@Test
public void testOtherScheme() throws Throwable {
assertAuthPaths(l("/one/",
- "s3a://landsat-pds/",
+ UNIT_TEST_EXAMPLE_PATH,
"http://bucket/two/"),
"/one/");
}
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
index fb9988b29a5c4..ae09452372316 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
@@ -30,6 +30,7 @@
import org.apache.hadoop.fs.s3a.S3AInputStream;
import org.apache.hadoop.fs.s3a.S3ATestUtils;
import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics;
+import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils;
import org.apache.hadoop.fs.statistics.IOStatistics;
import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
import org.apache.hadoop.fs.statistics.MeanStatistic;
@@ -112,7 +113,9 @@ public void openFS() throws IOException {
Configuration conf = getConf();
conf.setInt(SOCKET_SEND_BUFFER, 16 * 1024);
conf.setInt(SOCKET_RECV_BUFFER, 16 * 1024);
- String testFile = conf.getTrimmed(KEY_CSVTEST_FILE, DEFAULT_CSVTEST_FILE);
+ // look up the test file, no requirement to be set.
+ String testFile = conf.getTrimmed(KEY_CSVTEST_FILE,
+ PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE);
if (testFile.isEmpty()) {
assumptionMessage = "Empty test property: " + KEY_CSVTEST_FILE;
LOG.warn(assumptionMessage);
@@ -394,6 +397,9 @@ private void executeDecompression(long readahead,
CompressionCodecFactory factory
= new CompressionCodecFactory(getConf());
CompressionCodec codec = factory.getCodec(testData);
+ Assertions.assertThat(codec)
+ .describedAs("No codec found for %s", testData)
+ .isNotNull();
long bytesRead = 0;
int lines = 0;
@@ -525,12 +531,18 @@ private ContractTestUtils.NanoTimer executeRandomIO(S3AInputPolicy policy,
describe("Random IO with policy \"%s\"", policy);
byte[] buffer = new byte[_1MB];
long totalBytesRead = 0;
-
+ final long len = testDataStatus.getLen();
in = openTestFile(policy, 0);
ContractTestUtils.NanoTimer timer = new ContractTestUtils.NanoTimer();
for (int[] action : RANDOM_IO_SEQUENCE) {
- int position = action[0];
+ long position = action[0];
int range = action[1];
+ // if a read goes past EOF, fail with details
+ // this will happen if the test datafile is too small.
+ Assertions.assertThat(position + range)
+ .describedAs("readFully(pos=%d range=%d) of %s",
+ position, range, testDataStatus)
+ .isLessThanOrEqualTo(len);
in.readFully(position, buffer, 0, range);
totalBytesRead += range;
}
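
As context for reviewers, here is a minimal sketch (not part of the patch) of the lookup-or-skip pattern `openFS()` now follows: resolve the external file property against the public default, and skip the test only when a deployment has explicitly set the property to an empty string. The helper class and method names are hypothetical; `KEY_CSVTEST_FILE` and `DEFAULT_EXTERNAL_FILE` are the constants used above.

```java
import org.junit.Assume;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3ATestConstants;
import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils;

/** Hypothetical helper: illustration only, not part of this patch. */
final class ExternalTestFileLookup {

  private ExternalTestFileLookup() {
  }

  /**
   * Resolve the external test file, or skip the calling test if the
   * property has been explicitly set to an empty string.
   * @param conf test configuration
   * @return path to the external test file
   */
  static Path externalTestFileOrSkip(Configuration conf) {
    // fall back to the public default when the property is unset
    String testFile = conf.getTrimmed(S3ATestConstants.KEY_CSVTEST_FILE,
        PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE);
    // an empty value is an explicit request to disable these tests
    Assume.assumeTrue("Empty test property: " + S3ATestConstants.KEY_CSVTEST_FILE,
        !testFile.isEmpty());
    return new Path(testFile);
  }
}
```
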
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/statistics/ITestAWSStatisticCollection.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/statistics/ITestAWSStatisticCollection.java
deleted file mode 100644
index e7696996dbd1a..0000000000000
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/statistics/ITestAWSStatisticCollection.java
+++ /dev/null
@@ -1,82 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.hadoop.fs.s3a.statistics;
-
-import org.junit.Test;
-
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.fs.s3a.AbstractS3ATestBase;
-import org.apache.hadoop.fs.s3a.S3AFileSystem;
-import org.apache.hadoop.fs.statistics.IOStatistics;
-
-import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_ENDPOINT;
-import static org.apache.hadoop.fs.s3a.Constants.ENDPOINT;
-import static org.apache.hadoop.fs.s3a.S3ATestUtils.getLandsatCSVPath;
-import static org.apache.hadoop.fs.s3a.Statistic.STORE_IO_REQUEST;
-import static org.apache.hadoop.fs.statistics.IOStatisticAssertions.assertThatStatisticCounter;
-
-/**
- * Verify that AWS SDK statistics are wired up.
- * This test tries to read data from US-east-1 and us-west-2 buckets
- * so as to be confident that the nuances of region mapping
- * are handed correctly (HADOOP-13551).
- * The statistics are probed to verify that the wiring up is complete.
- */
-public class ITestAWSStatisticCollection extends AbstractS3ATestBase {
-
- private static final Path COMMON_CRAWL_PATH
- = new Path("s3a://osm-pds/planet/planet-latest.orc");
-
- @Test
- public void testLandsatStatistics() throws Throwable {
- final Configuration conf = getConfiguration();
- // skips the tests if the landsat path isn't the default.
- Path path = getLandsatCSVPath(conf);
- conf.set(ENDPOINT, DEFAULT_ENDPOINT);
- conf.unset("fs.s3a.bucket.landsat-pds.endpoint");
-
- try (S3AFileSystem fs = (S3AFileSystem) path.getFileSystem(conf)) {
- fs.getObjectMetadata(path);
- IOStatistics iostats = fs.getIOStatistics();
- assertThatStatisticCounter(iostats,
- STORE_IO_REQUEST.getSymbol())
- .isGreaterThanOrEqualTo(1);
- }
- }
-
- @Test
- public void testCommonCrawlStatistics() throws Throwable {
- final Configuration conf = getConfiguration();
- // skips the tests if the landsat path isn't the default.
- getLandsatCSVPath(conf);
-
- Path path = COMMON_CRAWL_PATH;
- conf.set(ENDPOINT, DEFAULT_ENDPOINT);
-
- try (S3AFileSystem fs = (S3AFileSystem) path.getFileSystem(conf)) {
- fs.getObjectMetadata(path);
- IOStatistics iostats = fs.getIOStatistics();
- assertThatStatisticCounter(iostats,
- STORE_IO_REQUEST.getSymbol())
- .isGreaterThanOrEqualTo(1);
- }
- }
-
-}
diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/test/PublicDatasetTestUtils.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/test/PublicDatasetTestUtils.java
index 669acd8b8bd56..d4981c3933b18 100644
--- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/test/PublicDatasetTestUtils.java
+++ b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/test/PublicDatasetTestUtils.java
@@ -18,9 +18,13 @@
package org.apache.hadoop.fs.s3a.test;
+import org.junit.Assume;
+
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.S3ATestConstants;
import org.apache.hadoop.fs.s3a.S3ATestUtils;
import static org.apache.hadoop.fs.s3a.S3ATestConstants.KEY_BUCKET_WITH_MANY_OBJECTS;
@@ -62,6 +66,77 @@ private PublicDatasetTestUtils() {}
private static final String DEFAULT_BUCKET_WITH_MANY_OBJECTS
= "s3a://usgs-landsat/collection02/level-1/";
+ /**
+ * ORC dataset path.
+ */
+ private static final Path ORC_DATA = new Path("s3a://osm-pds/planet/planet-latest.orc");
+
+ /**
+ * Provide a Path for some ORC data.
+ *
+ * @param conf Hadoop configuration
+ * @return path to the ORC dataset.
+ */
+ public static Path getOrcData(Configuration conf) {
+ return ORC_DATA;
+ }
+
+ /**
+ * Default path for the external test file: {@value}.
+ * This must be: gzipped, large enough for the performance
+ * tests and in a read-only bucket with anonymous access.
+ */
+ public static final String DEFAULT_EXTERNAL_FILE =
+ "s3a://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz";
+
+ /**
+ * Get the external test file.
+ *
+ * This must be: gzipped, large enough for the performance
+ * tests and in a read-only bucket with anonymous access.
+ * @param conf configuration
+ * @return a dataset which meets the requirements.
+ */
+ public static Path getExternalData(Configuration conf) {
+ return new Path(fetchFromConfig(conf,
+ S3ATestConstants.KEY_CSVTEST_FILE, DEFAULT_EXTERNAL_FILE));
+ }
+
+ /**
+ * Get the anonymous dataset.
+ * @param conf configuration
+ * @return a dataset which supports anonymous access.
+ */
+ public static Path requireAnonymousDataPath(Configuration conf) {
+ return requireDefaultExternalData(conf);
+ }
+
+ /**
+ * Get the external test file; assume() that it is not modified (i.e. we haven't
+ * switched to a new storage infrastructure where the bucket is no longer
+ * read only).
+ * @param conf test configuration
+ * @return test file.
+ */
+ public static String requireDefaultExternalDataFile(Configuration conf) {
+ String filename = getExternalData(conf).toUri().toString();
+ Assume.assumeTrue("External test file is not the default",
+ DEFAULT_EXTERNAL_FILE.equals(filename));
+ return filename;
+ }
+
+ /**
+ * Get the external test file; assume() that it is not modified (i.e. we haven't
+ * switched to a new storage infrastructure where the bucket is no longer
+ * read only).
+ * @param conf test configuration
+ * @return test file as a path.
+ */
+ public static Path requireDefaultExternalData(Configuration conf) {
+ return new Path(requireDefaultExternalDataFile(conf));
+ }
+
/**
* Provide a URI for a directory containing many objects.
*
@@ -90,6 +165,13 @@ public static String getRequesterPaysObject(Configuration conf) {
KEY_REQUESTER_PAYS_FILE, DEFAULT_REQUESTER_PAYS_FILE);
}
+ /**
+ * Fetch a trimmed configuration value, requiring it to be non-empty.
+ * @param conf configuration
+ * @param key key
+ * @param defaultValue default value.
+ * @return the resolved value.
+ */
private static String fetchFromConfig(Configuration conf, String key, String defaultValue) {
String value = conf.getTrimmed(key, defaultValue);
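
A short usage sketch of the new helpers, to illustrate the intended call sites; the test class and method names here are hypothetical, and only `getExternalData()` and `requireDefaultExternalData()` come from the change above.

```java
import org.junit.Test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils;

/** Hypothetical test: illustration only, not part of this patch. */
public class ExamplePublicDatasetUsage {

  @Test
  public void testProbeExternalFile() throws Exception {
    Configuration conf = new Configuration();
    // skip (rather than fail) if a non-default file has been configured,
    // as this example assumes the default read-only, anonymous-access bucket
    Path external = PublicDatasetTestUtils.requireDefaultExternalData(conf);
    FileSystem fs = external.getFileSystem(conf);
    // a simple existence probe; real tests would read or seek through the data
    fs.getFileStatus(external);
  }
}
```
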
diff --git a/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml b/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml
index b17d1555ac7a1..0cba67bd68676 100644
--- a/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml
+++ b/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml
@@ -30,37 +30,57 @@
     <value>false</value>
   </property>
 
+  <property>
+    <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
+    <value>us-east-1</value>
+  </property>
+
   <property>
-    <name>fs.s3a.bucket.landsat-pds.multipart.purge</name>
+    <name>fs.s3a.bucket.noaa-isd-pds.multipart.purge</name>
     <value>false</value>
     <description>Don't try to purge uploads in the read-only bucket, as
     it will only create log noise.</description>
   </property>
 
   <property>
-    <name>fs.s3a.bucket.landsat-pds.probe</name>
+    <name>fs.s3a.bucket.noaa-isd-pds.probe</name>
     <value>0</value>
     <description>Let's postpone existence checks to the first IO operation</description>
   </property>
 
   <property>
-    <name>fs.s3a.bucket.landsat-pds.audit.add.referrer.header</name>
+    <name>fs.s3a.bucket.noaa-isd-pds.audit.add.referrer.header</name>
     <value>false</value>
-    <description>Do not add the referrer header to landsat operations</description>
+    <description>Do not add the referrer header</description>
   </property>
+
+  <property>
+    <name>fs.s3a.bucket.noaa-isd-pds.prefetch.block.size</name>
+    <value>128k</value>
+    <description>Use a small prefetch size so tests fetch multiple blocks</description>