Skip to content
This repository has been archived by the owner on Oct 28, 2024. It is now read-only.

fix(gcp sink): Provide an option to override content encoding for GCP Cloud Storage Sink #30

Merged
merged 3 commits into from
Sep 12, 2024

Conversation

hieu-db
Copy link
Collaborator

@hieu-db hieu-db commented Sep 5, 2024

Overview

This PR provides an option to override content encoding for objects uploaded to GCP Cloud Storage.
This breaks piping the uploaded objects to downstream systems such as Hadoop Filesystems, which don't work well with ".gzip" file with "content-encoding" header enabled.

Refer to this bug for further details

Test Plan

Added unit tests

$ rustfmt src/sinks/gcp/cloud_storage.rs
$ cargo test --color=always  --package vector --lib sinks::gcp::cloud_storage
$ cargo build

Canaried the change (w/o content_encoding == "") to Vector Aggregator and made sure files can still be uploaded to GCS with content_encoding header populated.

image

Canaried the change (with content_encoding == "") to Vector Aggregator and verified that files no longer have content_encoding header populated.

image

Canaried the change (with set_content_encoding == "") to staging Vector Aggregator in GCP and verified that Shadow Service Request Logs are now available on Lumberjack:
image

@hieu-db hieu-db force-pushed the hieu-db/disable-content-encoding branch from a9be3fc to 1541cf9 Compare September 5, 2024 16:17
@hieu-db hieu-db changed the title [Vector] Disable content encoding for GCP sink fix(gcp sink): Provide an option to disable setting content encoding for GCP Cloud Storage Sink Sep 5, 2024
@hieu-db hieu-db force-pushed the hieu-db/disable-content-encoding branch from 9498550 to f8be633 Compare September 5, 2024 20:15
@hieu-db hieu-db force-pushed the hieu-db/disable-content-encoding branch from f8be633 to 3fd93d9 Compare September 5, 2024 21:25
.compression
.content_encoding()
.map(|ce| HeaderValue::from_str(&to_string(ce)).unwrap());
let content_encoding = if config.set_content_encoding {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you looked at how S3 allows configuring content_encoding? They have a content_encoding option [1] which can override the content encoding derived from the compression method. I would guess this would probably have much higher changes of being accepted upstream

[1] https://github.com/vectordotdev/vector/blob/master/src/sinks/s3_common/config.rs#L118
[2] https://github.com/vectordotdev/vector/blob/master/src/sinks/s3_common/service.rs#L106

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that. However, we want to avoid setting content_encoding altogether, not overriding it with a different value than the one determined by the compression method (gzip).

I can try providing a different content_encoding value ("" or something invalid) and see if Lumberjack can still process these logs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flaviofcruz I fixed the PR to provide a content_encoding option similar to the S3 sink.
Things worked on Vector Aggregator when content_encoding is set to an empty string.

@hieu-db hieu-db force-pushed the hieu-db/disable-content-encoding branch from eed149e to 73f73e7 Compare September 10, 2024 06:32
@hieu-db hieu-db changed the title fix(gcp sink): Provide an option to disable setting content encoding for GCP Cloud Storage Sink fix(gcp sink): Provide an option to override content encoding for GCP Cloud Storage Sink Sep 10, 2024
@hieu-db hieu-db force-pushed the hieu-db/disable-content-encoding branch from 73f73e7 to d88d5b8 Compare September 10, 2024 07:03
@hieu-db hieu-db requested a review from flaviofcruz September 10, 2024 13:57
@hieu-db hieu-db merged commit 2a45ac8 into databricks:v0.39 Sep 12, 2024
42 of 49 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants