Skip to content

Latest commit

 

History

History
145 lines (97 loc) · 8.11 KB

File metadata and controls

145 lines (97 loc) · 8.11 KB

Amazon S3 Source

This source app supports transfer of files using the Amazon S3 protocol. Files are transferred from the remote directory (S3 bucket) to the local directory where the application is deployed.

Messages emitted by the source are provided as a byte array by default. However, this can be customized using the --mode option:

  • ref Provides a java.io.File reference

  • lines Will split files line-by-line and emit a new message for each line

  • contents The default. Provides the contents of a file as a byte array

When using --mode=lines, you can also provide the additional option --withMarkers=true. If set to true, the underlying FileSplitter will emit additional start-of-file and end-of-file marker messages before and after the actual data. The payload of these 2 additional marker messages is of type FileSplitter.FileMarker. The option withMarkers defaults to false if not explicitly set.

See also MetadataStore options for possible shared persistent store configuration used to prevent duplicate messages on restart.

mode = lines

Headers:

  • Content-Type: text/plain

  • file_orginalFile: <java.io.File>

  • file_name: <file name>

  • correlationId: <UUID> (same for each line)

  • sequenceNumber: <n>

  • sequenceSize: 0 (number of lines is not know until the file is read)

Payload:

A String for each line.

The first line is optionally preceded by a message with a START marker payload. The last line is optionally followed by a message with an END marker payload.

Marker presence and format are determined by the with-markers and markers-json properties.

mode = ref

Headers:

None.

Payload:

A java.io.File object.

Options

The s3-source has the following options:

Properties grouped by prefix:

file.consumer

markers-json

When 'fileMarkers == true', specify if they should be produced as FileSplitter.FileMarker objects or JSON. (Boolean, default: true)

mode

The FileReadingMode to use for file reading sources. Values are 'ref' - The File object, 'lines' - a message per line, or 'contents' - the contents as bytes. (FileReadingMode, default: <none>, possible values: ref,lines,contents)

with-markers

Set to true to emit start of file/end of file marker messages before/after the data. Only valid with FileReadingMode 'lines'. (Boolean, default: <none>)

metadata.store.dynamo-db

create-delay

Delay between create table retries. (Integer, default: 1)

create-retries

Retry number for create table request. (Integer, default: 25)

read-capacity

Read capacity on the table. (Long, default: 1)

table

Table name for metadata. (String, default: <none>)

time-to-live

TTL for table entries. (Integer, default: <none>)

write-capacity

Write capacity on the table. (Long, default: 1)

metadata.store.jdbc

region

Unique grouping identifier for messages persisted with this store. (String, default: DEFAULT)

table-prefix

Prefix for the custom table name. (String, default: <none>)

metadata.store.mongo-db

collection

MongoDB collection name for metadata. (String, default: metadataStore)

metadata.store.redis

key

Redis key for metadata. (String, default: <none>)

metadata.store

type

Indicates the type of metadata store to configure (default is 'memory'). You must include the corresponding Spring Integration dependency to use a persistent store. (StoreType, default: <none>, possible values: mongodb,redis,dynamodb,jdbc,zookeeper,hazelcast,memory)

metadata.store.zookeeper

connect-string

Zookeeper connect string in form HOST:PORT. (String, default: 127.0.0.1:2181)

encoding

Encoding to use when storing data in Zookeeper. (Charset, default: UTF-8)

retry-interval

Retry interval for Zookeeper operations in milliseconds. (Integer, default: 1000)

root

Root node - store entries are children of this node. (String, default: /SpringIntegration-MetadataStore)

s3.supplier

auto-create-local-dir

Create or not the local directory. (Boolean, default: true)

delete-remote-files

Delete or not remote files after processing. (Boolean, default: false)

filename-pattern

The pattern to filter remote files. (String, default: <none>)

filename-regex

The regexp to filter remote files. (Pattern, default: <none>)

list-only

Set to true to return s3 object metadata without copying file to a local directory. (Boolean, default: false)

local-dir

The local directory to store files. (File, default: <none>)

preserve-timestamp

To transfer or not the timestamp of the remote file to the local one. (Boolean, default: true)

remote-dir

AWS S3 bucket resource. (String, default: bucket)

remote-file-separator

Remote File separator. (String, default: /)

tmp-file-suffix

Temporary file suffix. (String, default: .tmp)

spring.cloud.aws.credentials

access-key

The access key to be used with a static provider. (String, default: <none>)

instance-profile

Configures an instance profile credentials provider with no further configuration. (Boolean, default: false)

profile

The AWS profile. (Profile, default: <none>)

secret-key

The secret key to be used with a static provider. (String, default: <none>)

spring.cloud.aws.region

instance-profile

Configures an instance profile region provider with no further configuration. (Boolean, default: false)

profile

The AWS profile. (Profile, default: <none>)

static

<documentation missing> (String, default: <none>)

spring.cloud.aws.s3

accelerate-mode-enabled

Option to enable using the accelerate endpoint when accessing S3. Accelerate endpoints allow faster transfer of objects by using Amazon CloudFront's globally distributed edge locations. (Boolean, default: <none>)

checksum-validation-enabled

Option to disable doing a validation of the checksum of an object stored in S3. (Boolean, default: <none>)

chunked-encoding-enabled

Option to enable using chunked encoding when signing the request payload for {@link software.amazon.awssdk.services.s3.model.PutObjectRequest} and {@link software.amazon.awssdk.services.s3.model.UploadPartRequest}. (Boolean, default: <none>)

endpoint

Overrides the default endpoint. (URI, default: <none>)

path-style-access-enabled

Option to enable using path style access for accessing S3 objects instead of DNS style access. DNS style access is preferred as it will result in better load balancing when accessing S3. (Boolean, default: <none>)

region

Overrides the default region. (String, default: <none>)

use-arn-region-enabled

If an S3 resource ARN is passed in as the target of an S3 operation that has a different region to the one the client was configured with, this flag must be set to 'true' to permit the client to make a cross-region call to the region specified in the ARN otherwise an exception will be thrown. (Boolean, default: <none>)

Amazon AWS common options

The Amazon S3 Source (as all other Amazon AWS applications) is based on the Spring Cloud AWS project as a foundation, and its auto-configuration classes are used automatically by Spring Boot. Consult their documentation regarding required and useful auto-configuration properties.

Examples

java -jar s3-source.jar --s3.remoteDir=/tmp/foo --file.consumer.mode=lines