The SDK doesn't use gzip compression #3105

Closed
andrejlevkovitch opened this issue Sep 10, 2024 · 12 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue

Comments

@andrejlevkovitch

Describe the bug

Despite the claims in the documentation, the AWS SDK doesn't perform gzip compression under any circumstances.

Expected Behavior

It is expected that the SDK will compress request data according to the client configuration. However, I also want to point out that this configuration lacks a list of content types allowed for compression, as we don't want to compress files such as JPEG or MP4.

Current Behavior

Currently, if you upload a file with TransferManager or the S3 client, no compression happens. You can also try to compress the file manually and then upload it with TransferManager or the S3 client, but that does not work properly either. First, TransferManager has no option to set Content-Encoding, so it obviously cannot work. Second, the S3 client does not work properly either, because the signer rewrites Content-Encoding, unless you set the checksum algorithm to NOT_SET and use MD5 instead.
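For reference, a minimal sketch of the manual-compression workaround described above, using the plain S3 client. The bucket and key are placeholders, and the caller is assumed to have gzip-compressed the body already (the SDK provides no helper for that). The checksum algorithm must stay NOT_SET, with MD5 supplied instead, or the signer switches to aws-chunked and clobbers Content-Encoding:

```cpp
#include <aws/core/Aws.h>
#include <aws/core/utils/HashingUtils.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/PutObjectRequest.h>

// gzippedBody: a stream whose contents were already gzip-compressed by the caller.
void UploadPreCompressed(Aws::S3::S3Client& s3,
                         const std::shared_ptr<Aws::IOStream>& gzippedBody)
{
    Aws::S3::Model::PutObjectRequest request;
    request.SetBucket("my-bucket");   // placeholder
    request.SetKey("data.json.gz");   // placeholder
    request.SetContentEncoding("gzip");
    request.SetBody(gzippedBody);

    // Leave the checksum algorithm unset and provide a Content-MD5 instead,
    // so the signer does not rewrite the Content-Encoding header.
    request.SetChecksumAlgorithm(Aws::S3::Model::ChecksumAlgorithm::NOT_SET);
    request.SetContentMD5(Aws::Utils::HashingUtils::Base64Encode(
        Aws::Utils::HashingUtils::CalculateMD5(*gzippedBody)));

    auto outcome = s3.PutObject(request);
    // outcome.IsSuccess() / outcome.GetError() handling omitted for brevity.
}
```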

Reproduction Steps

Just use the default config for TransferManager, try to upload some big JSON file (bigger than 10 KB!), and check GetBytesTransferred() for that operation: it will be equal to (or slightly bigger than) the size of the file. So no compression was applied.
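A sketch of that reproduction, with placeholder bucket, key, and file path. If compression worked, GetBytesTransferred() would be well below the size of a large JSON file; instead it matches it:

```cpp
#include <aws/core/Aws.h>
#include <aws/core/utils/threading/Executor.h>
#include <aws/s3/S3Client.h>
#include <aws/transfer/TransferManager.h>

void ReproduceNoCompression()
{
    auto s3 = Aws::MakeShared<Aws::S3::S3Client>("repro");
    auto executor =
        Aws::MakeShared<Aws::Utils::Threading::PooledThreadExecutor>("repro", 4);

    // Default TransferManager configuration -- nothing compression-related set.
    Aws::Transfer::TransferManagerConfiguration cfg(executor.get());
    cfg.s3Client = s3;
    auto manager = Aws::Transfer::TransferManager::Create(cfg);

    auto handle = manager->UploadFile("/tmp/big.json",      // > 10 KB of JSON
                                      "my-bucket", "big.json",
                                      "application/json", {});
    handle->WaitUntilFinished();

    // Equal to (or slightly above) the on-disk file size: no compression.
    auto sent = handle->GetBytesTransferred();
    (void)sent;
}
```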

Possible Solution

So, first of all, we should enable compression: the check in question always returns false.

But even if we change it explicitly to GZIP, it still won't work, because during compression the previous Content-Length is left in place and the decoded content length, which the backend service requires for decompression, is never set. So let's add it (and remove Content-Length; the proper one will be set later by AddContentBodyToRequest).
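The header adjustment being described could look roughly like this, applied to the Aws::Http::HttpRequest in the compression path. The decoded-content-length header name follows the SDK's aws-chunked convention and is an assumption here, as are the in-scope names httpRequest and originalBodySize:

```cpp
// Record the pre-compression size so the receiving service can decompress,
// then drop the stale Content-Length (re-set later by AddContentBodyToRequest).
httpRequest->SetHeaderValue("x-amz-decoded-content-length",
                            Aws::Utils::StringUtils::to_string(originalBodySize));
httpRequest->DeleteHeader("content-length");
```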

But even after that it won't work, because the checksum is still calculated over the old body, not the compressed one. So let's change that too and use the actual body (from httpRequest).

But it still won't work, because when we calculate the hash we expect the stream to be in a good state, which may not be the case, especially after some reading and moving the cursor back and forth. So we should explicitly clear the stream before we start reading.

And only now, if you explicitly set the checksum algorithm to NOT_SET and enable MD5 computation, does compression finally work.

Additional Information/Context

No response

AWS CPP SDK version used

1.11.398

Compiler and Version used

gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0

Operating System and version

Ubuntu 18.04.6 LTS

@andrejlevkovitch andrejlevkovitch added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 10, 2024
@DmitriyMusatkin
Contributor

S3 does not support payload compression server-side, so it would not be possible to do it through the SDK; i.e. if you upload as GZIP, S3 will keep it as GZIP in storage. AFAIK CloudWatch is the only service the SDK currently supports compression for.

With that said, overriding the content encoding when aws-chunked is used seems like a bug; it should be appending instead.

@jmklix jmklix added p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Sep 10, 2024
@andrejlevkovitch
Author

S3 does not support payload compression service side, so it would not be possible to do it through sdk. i.e. if you upload as GZIP, s3 will keep it as GZIP in storage

Hm, are you sure? We actually upload gzip files that way and we get unpacked files from S3.

@andrejlevkovitch
Author

Sorry, yeah, you are correct: S3 doesn't decompress the data. It just handles it in a nice way, so from the user's perspective the difference is not visible, but yes, the data is stored compressed.

@jmklix
Member

jmklix commented Oct 4, 2024

@andrejlevkovitch did you have any other questions about compression and this sdk?

@jmklix jmklix added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Oct 4, 2024
@andrejlevkovitch
Author

did you have any other questions about compression and this sdk?

Nope, except the problems related to the SDK I already described:

  1. rewriting of Content-Encoding
  2. gzip compression doesn't work correctly when requested (Content-Length is not updated after compression, and the checksum is calculated incorrectly after compression)
  3. a small bug in hash calculation (the stream needs to be cleared before use)
  4. the compression params are never used, despite being present in the API. Those params are also not flexible: they can only be set for all file types at once. IMHO, it would be better to set them per Content-Type (so you don't compress video, but for a JSON file it is useful)

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Oct 6, 2024
@jmklix
Member

jmklix commented Nov 12, 2024

Are you still expecting the SDK/S3 to compress your data? Your most recent questions seem to indicate that you think the SDK should be doing some type of compression, but it does not. None of the compression params will be used, as @DmitriyMusatkin mentioned.

@jmklix jmklix added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Nov 12, 2024
@andrejlevkovitch
Author

@jmklix

  1. there is still the problem of Content-Encoding being rewritten, which seems like a bug (@DmitriyMusatkin also mentioned that)
  2. there is a RequestCompressionConfig in ClientConfiguration that is enabled by default and never used, which is confusing to say the least
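For context, the confusing part is roughly this: the configuration is visible on every client, including S3, which never acts on it. Field names here follow recent SDK versions; treat them as an assumption if your version differs:

```cpp
// Present on Aws::Client::ClientConfiguration for every service client,
// even ones (such as S3) that never use it.
Aws::Client::ClientConfiguration config;
config.requestCompressionConfig.useRequestCompression =
    Aws::Client::UseRequestCompression::ENABLE;              // on by default
config.requestCompressionConfig.requestMinCompressionSizeBytes = 10240;

Aws::S3::S3Client s3(config);  // S3 ignores these settings entirely
```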

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Nov 15, 2024
@sbiscigl
Contributor

there are still problem with rewriting of content-encoding, what seems like a bug (@DmitriyMusatkin also mentioned that)

you are absolutely right, put out a PR for the fix

@sbiscigl
Contributor

The fix should be merged now; let me know whether that fixes it for you. It should.

as for the second part

For services that support it, request content will be compressed. On by default if dependency available

is the description of the compression CMake arg. What can be updated to help clear up potential confusion? We call out "For services that support it", which is more or less what is called out in our public docs. What specifically in that documentation could be updated to avoid future confusion?

@andrejlevkovitch
Author

Well, it is mostly about the usage of the configuration: the config for the S3 client contains a field that is a priori not supported. That is already confusing, because I would expect that if the field is present, then it is possible to use it. Anyway, I'm not sure what to suggest about that.


This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

@sbiscigl
Contributor

because I would expect that if the field is present, then it is possible to use it.

Very valid, and something we could likely introduce in a future version. The initial implementation probably should have been "if the service does not accept it, don't code-generate it". In its current state, though, we can't remove it, because we guarantee (or at least try our hardest) not to break compilation for anyone by removing things.

In a future breaking change, though, making its presence depend on service support is a good idea.
