diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/connecting.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/connecting.md
index 600e1e128a2c8..a31b1c3e39a05 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/connecting.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/connecting.md
@@ -74,7 +74,8 @@ There are three core settings to connect to an S3 store, endpoint, region and wh
   <name>fs.s3a.endpoint</name>
   <description>AWS S3 endpoint to connect to. An up-to-date list is
     provided in the AWS Documentation: regions and endpoints. Without this
-    property, the standard region (s3.amazonaws.com) is assumed.
+    property, the endpoint/hostname of the S3 Store is inferred from
+    the value of fs.s3a.endpoint.region, fs.s3a.endpoint.fips and more.
@@ -230,8 +231,9 @@ S3 endpoint, documented [by Amazon](http://docs.aws.amazon.com/general/latest/gr
use local buckets and local copies of data, wherever possible.
2. With the V4 signing protocol, AWS requires the explicit region endpoint
to be used —hence S3A must be configured to use the specific endpoint. This
-is done in the configuration option `fs.s3a.endpoint`.
-3. All endpoints other than the default endpoint only support interaction
+is done by setting the region in the configuration option `fs.s3a.endpoint.region`,
+or by explicitly setting `fs.s3a.endpoint` and `fs.s3a.endpoint.region`.
+3. All endpoints other than the default region only support interaction
with buckets local to that S3 instance.
4. Standard S3 buckets support "cross-region" access where use of the original `us-east-1`
endpoint allows access to the data, but newer storage types, particularly S3 Express are
@@ -248,25 +250,12 @@ The up to date list of regions is [Available online](https://docs.aws.amazon.com
This list can be used to specify the endpoint of individual buckets, for example
for buckets in the central and EU/Ireland endpoints.
-```xml
-<property>
-  <name>fs.s3a.bucket.landsat-pds.endpoint</name>
-  <value>s3-us-west-2.amazonaws.com</value>
-</property>
-
-<property>
-  <name>fs.s3a.bucket.eu-dataset.endpoint</name>
-  <value>s3.eu-west-1.amazonaws.com</value>
-</property>
-```
-
-Declaring the region for the data is simpler, as it avoid having to look up the full URL and having to worry about historical quirks of regional endpoint hostnames.
```xml
<property>
  <name>fs.s3a.bucket.landsat-pds.endpoint.region</name>
  <value>us-west-2</value>
-  <description>The endpoint for s3a://landsat-pds URLs</description>
+  <description>The region for s3a://landsat-pds URLs</description>
@@ -421,7 +410,6 @@ bucket by bucket basis i.e. `fs.s3a.bucket.{YOUR-BUCKET}.accesspoint.required`.
```
Before using Access Points make sure you're not impacted by the following:
-- `ListObjectsV1` is not supported, this is also deprecated on AWS S3 for performance reasons;
- The endpoint for S3 requests will automatically change to use
`s3-accesspoint.REGION.amazonaws.{com | com.cn}` depending on the Access Point ARN. While
considering endpoints, if you have any custom signers that use the host endpoint property make
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
index cb435535b7cd9..469541363e670 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
@@ -22,9 +22,6 @@ connection to S3 to interact with a bucket. Unit test suites follow the naming
convention `Test*.java`. Integration tests follow the naming convention
`ITest*.java`.
-Due to eventual consistency, integration tests may fail without reason.
-Transient failures, which no longer occur upon rerunning the test, should thus
-be ignored.
## Policy for submitting patches which affect the `hadoop-aws` module.
@@ -56,7 +53,6 @@ make for a slow iterative development.
Please: run the tests. And if you don't, we are sorry for declining your
patch, but we have to.
-
### What if there's an intermittent failure of a test?
Some of the tests do fail intermittently, especially in parallel runs.
@@ -147,7 +143,7 @@ Example:
```
-### Configuring S3a Encryption
+## Configuring S3a Encryption
For S3a encryption tests to run correctly, the
`fs.s3a.encryption.key` must be configured in the s3a contract xml
@@ -175,6 +171,21 @@ on the AWS side. Some S3AFileSystem tests are skipped when default encryption is
enabled due to unpredictability in how [ETags](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html)
are generated.
+### Disabling the encryption tests
+
+If the S3 store/storage class doesn't support server-side-encryption, these will fail. They
+can be turned off.
+
+```xml
+<property>
+  <name>test.fs.s3a.encryption.enabled</name>
+  <value>false</value>
+</property>
+```
+
+Encryption is only used for those specific test suites with `Encryption` in
+their classname.
+
## Running the Tests
After completing the configuration, execute the test run through Maven.
@@ -241,23 +252,11 @@ define the target region in `auth-keys.xml`.
```xml
<property>
-  <name>fs.s3a.endpoint</name>
-  <value>s3.eu-central-1.amazonaws.com</value>
-</property>
-```
-
-Alternatively you can use endpoints defined in [core-site.xml](../../../../test/resources/core-site.xml).
-
-```xml
-<property>
-  <name>fs.s3a.endpoint</name>
-  <value>${frankfurt.endpoint}</value>
+  <name>fs.s3a.endpoint.region</name>
+  <value>eu-central-1</value>
</property>
```
-This is used for all tests expect for scale tests using a Public CSV.gz file
-(see below)
-
### CSV Data Tests
The `TestS3AInputStreamPerformance` tests require read access to a multi-MB
@@ -265,6 +264,12 @@ text file. The default file for these tests is one published by amazon,
[s3a://landsat-pds.s3.amazonaws.com/scene_list.gz](http://landsat-pds.s3.amazonaws.com/scene_list.gz).
This is a gzipped CSV index of other files which amazon serves for open use.
+Historically it was required to be a `csv.gz` file to validate S3 Select
+support. Now that S3 Select support has been removed, other large files
+may be used instead.
+However, a future release may again need to read a CSV file, so testers
+should still reference one.
+
The path to this object is set in the option `fs.s3a.scale.test.csvfile`,
```xml
@@ -284,19 +289,21 @@ and "sufficiently" large.
(the reason the space or newline is needed is to add "an empty entry"; an empty
`<value></value>` would be considered undefined and pick up the default)
-Of using a test file in an S3 region requiring a different endpoint value
-set in `fs.s3a.endpoint`, a bucket-specific endpoint must be defined.
+
+If using a test file in a different AWS S3 region,
+a bucket-specific region must be defined.
For the default test dataset, hosted in the `landsat-pds` bucket, this is:
```xml
<property>
-  <name>fs.s3a.bucket.landsat-pds.endpoint</name>
-  <value>s3.amazonaws.com</value>
-  <description>The endpoint for s3a://landsat-pds URLs</description>
+  <name>fs.s3a.bucket.landsat-pds.endpoint.region</name>
+  <value>us-west-2</value>
+  <description>The region for s3a://landsat-pds</description>
</property>
```
-### Testing Access Point Integration
+### Testing Access Point Integration
+
S3a supports using Access Point ARNs to access data in S3. If you think your changes affect VPC
integration, request signing, ARN manipulation, or any code path that deals with the actual
sending and retrieving of data to/from S3, make sure you run the entire integration test suite with
@@ -551,9 +558,9 @@ They do not run automatically: they must be explicitly run from the command line
Look in the source for these and reads the Javadocs before executing.
-## Testing against non AWS S3 endpoints.
+## Testing against non-AWS S3 Stores.
-The S3A filesystem is designed to work with storage endpoints which implement
+The S3A filesystem is designed to work with S3 stores which implement
the S3 protocols to the extent that the amazon S3 SDK is capable of talking
to it. We encourage testing against other filesystems and submissions of patches
which address issues. In particular, we encourage testing of Hadoop release
@@ -579,9 +586,11 @@ on third party stores.
  <name>test.fs.s3a.create.create.acl.enabled</name>
  <value>false</value>
-< /property>
+</property>
```
+See [Third Party Stores](third_party_stores.html) for more on this topic.
+
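+As a rough sketch (the bucket name, endpoint and values below are hypothetical),
+a test bucket on a third-party store is typically pointed at the store through
+per-bucket endpoint, region and path-style-access options:
+
+```xml
+<property>
+  <name>fs.s3a.bucket.example-bucket.endpoint</name>
+  <value>https://store.example.com</value>
+</property>
+
+<property>
+  <name>fs.s3a.bucket.example-bucket.endpoint.region</name>
+  <value>us-east-1</value>
+</property>
+
+<property>
+  <name>fs.s3a.bucket.example-bucket.path.style.access</name>
+  <value>true</value>
+</property>
+```
+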
### Public datasets used in tests
Some tests rely on the presence of existing public datasets available on Amazon S3.
@@ -595,20 +604,6 @@ store that supports these tests.
An example of this might be the MarkerTools tests which require a bucket with a large number of
objects or the requester pays tests that require requester pays to be enabled for the bucket.
-### Disabling the encryption tests
-
-If the endpoint doesn't support server-side-encryption, these will fail. They
-can be turned off.
-
-```xml
-<property>
-  <name>test.fs.s3a.encryption.enabled</name>
-  <value>false</value>
-</property>
-```
-
-Encryption is only used for those specific test suites with `Encryption` in
-their classname.
### Disabling the storage class tests
@@ -654,7 +649,7 @@ If `ITestS3AContractGetFileStatusV1List` fails with any error about unsupported
```
Note: there's no equivalent for turning off v2 listing API, which all stores are now
-expected to support.
+required to support.
### Testing Requester Pays
@@ -745,12 +740,8 @@ after setting this rerun the tests
log4j.logger.org.apache.hadoop.fs.s3a=DEBUG
```
-There are also some logging options for debug logging of the AWS client
-```properties
-log4j.logger.com.amazonaws=DEBUG
-log4j.logger.com.amazonaws.http.conn.ssl=INFO
-log4j.logger.com.amazonaws.internal=INFO
-```
+There are also some logging options for debug logging of the AWS client;
+consult the log4j configuration file used by the tests for these.
There is also the option of enabling logging on a bucket; this could perhaps
be used to diagnose problems from that end. This isn't something actively
@@ -872,13 +863,13 @@ against other regions, or with third party S3 implementations. Thus the
URL can be overridden for testing elsewhere.
-### Works With Other S3 Endpoints
+### Works With Other S3 Stores
Don't assume AWS S3 US-East only, do allow for working with external S3 implementations.
Those may be behind the latest S3 API features, not support encryption, session
APIs, etc.
-They won't have the same CSV test files as some of the input tests rely on.
+They won't have the same CSV/large test files as some of the input tests rely on.
Look at `ITestS3AInputStreamPerformance` to see how tests can be written
to support the declaration of a specific large test file on alternate filesystems.
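+
+As a sketch (the path below is hypothetical), a large file already present on
+the store under test can be declared through the same `fs.s3a.scale.test.csvfile`
+option used for the default dataset:
+
+```xml
+<property>
+  <name>fs.s3a.scale.test.csvfile</name>
+  <value>s3a://example-bucket/datasets/large-file.gz</value>
+</property>
+```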
@@ -935,6 +926,8 @@ modifying the config. As an example from `AbstractTestS3AEncryption`:
protected Configuration createConfiguration() {
Configuration conf = super.createConfiguration();
S3ATestUtils.disableFilesystemCaching(conf);
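+    // clear any encryption algorithm set in the base or per-bucket configuration,
+    // so the algorithm set below is the one the test actually uses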
+ removeBaseAndBucketOverrides(conf,
+ SERVER_SIDE_ENCRYPTION_ALGORITHM);
conf.set(Constants.SERVER_SIDE_ENCRYPTION_ALGORITHM,
getSSEAlgorithm().getMethod());
return conf;
@@ -991,9 +984,8 @@ than on the maven command line:
### Keeping AWS Costs down
-Most of the base S3 tests are designed to use public AWS data
-(the landsat-pds bucket) for read IO, so you don't have to pay for bytes
-downloaded or long term storage costs. The scale tests do work with more data
+Most of the base S3 tests are designed to delete files after test runs,
+so you don't have to pay for storage costs. The scale tests do work with more data
so will cost more as well as generally take more time to execute.
You are however billed for
@@ -1102,7 +1094,7 @@ The usual credentials needed to log in to the bucket will be used, but now
the credentials used to interact with S3 will be temporary
role credentials, rather than the full credentials.
-## Qualifying an AWS SDK Update
+## Qualifying an AWS SDK Update
Updating the AWS SDK is something which does need to be done regularly,
but is rarely without complications, major or minor.
diff --git a/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml b/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml
index c99d7d43134cb..70b87ee275406 100644
--- a/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml
+++ b/hadoop-tools/hadoop-aws/src/test/resources/core-site.xml
@@ -31,6 +31,13 @@
+
    <name>fs.s3a.bucket.landsat-pds.endpoint.region</name>
    <value>us-west-2</value>
@@ -56,13 +63,13 @@
Do not add the referrer header to landsat operations
-  <property>
-    <name>fs.s3a.bucket.landsat-pds.endpoint.fips</name>
-    <value>true</value>
-    <description>Use the fips endpoint</description>
-  </property>
-
+
+
    <name>fs.s3a.bucket.usgs-landsat.endpoint.region</name>
    <value>us-west-2</value>