fix(gcp sink): Provide an option to override content encoding for GCP Cloud Storage Sink #30
src/sinks/gcp/cloud_storage.rs
Outdated
.compression
.content_encoding()
.map(|ce| HeaderValue::from_str(&to_string(ce)).unwrap());
let content_encoding = if config.set_content_encoding {
Have you looked at how S3 allows configuring content_encoding? They have a content_encoding
option [1] which can override the content encoding derived from the compression method. I would guess this would have a much higher chance of being accepted upstream.
[1] https://github.com/vectordotdev/vector/blob/master/src/sinks/s3_common/config.rs#L118
[2] https://github.com/vectordotdev/vector/blob/master/src/sinks/s3_common/service.rs#L106
I thought about that. However, we want to avoid setting content_encoding altogether, not override it with a value different from the one determined by the compression method (gzip).
I can try providing a different content_encoding value ("" or something invalid) and see if Lumberjack can still process these logs.
@flaviofcruz I fixed the PR to provide a content_encoding option similar to the S3 sink.
Things worked on Vector Aggregator with content_encoding set to an empty string.
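For reference, the behaviour discussed in this thread can be sketched roughly as below. This is only an illustration, not the code in this PR: the `Compression` enum and the `resolve_content_encoding` helper are hypothetical names, and it assumes that an empty override means "do not send the header at all", which matches what was observed in the canary but may not be how the sink implements it.

```rust
// Hypothetical sketch: names and types are illustrative, not taken from Vector.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum Compression {
    None,
    Gzip,
}

impl Compression {
    // Content encoding implied by the compression method, if any.
    fn content_encoding(self) -> Option<&'static str> {
        match self {
            Compression::None => None,
            Compression::Gzip => Some("gzip"),
        }
    }
}

// Resolve the header value to send.
// - No override: fall back to the value derived from the compression method.
// - Empty override: assumed here to mean "omit the header entirely".
// - Any other override: use it verbatim.
fn resolve_content_encoding(
    compression: Compression,
    override_value: Option<&str>,
) -> Option<String> {
    match override_value {
        Some("") => None,
        Some(value) => Some(value.to_string()),
        None => compression.content_encoding().map(str::to_string),
    }
}
```

With gzip compression and no override, this yields `Some("gzip")`. With an override of `""`, it yields `None`, so no header would be attached to the upload request.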
Overview
This PR provides an option to override content encoding for objects uploaded to GCP Cloud Storage.
Without it, the sink derives the "content-encoding" header from the compression method, which breaks piping the uploaded objects to downstream systems such as Hadoop filesystems, which don't work well with ".gzip" files that carry a "content-encoding" header.
Refer to this bug for further details
Test Plan
Added unit tests
Canaried the change (without content_encoding == "") to Vector Aggregator and made sure files can still be uploaded to GCS with the content_encoding header populated.
Canaried the change (with content_encoding == "") to Vector Aggregator and verified that files no longer have content_encoding header populated.
Canaried the change (with content_encoding == "") to staging Vector Aggregator in GCP and verified that Shadow Service Request Logs are now available on Lumberjack: