-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The SDK doesn't use gzip compression #3105
Comments
S3 does not support payload compression service side, so it would not be possible to do it through sdk. i.e. if you upload as GZIP, s3 will keep it as GZIP in storage. AFAIK Cloudwatch is the only service the SDK currently supports compression for. With that said, overriding the content encoding when aws-chunked is used seems like a bug and it should be appending instead. |
hm, are you sure? We actually upload gzip files in that way and we have unpacked files on s3 |
sorry, yeah, you are correct - the s3 doesn't decompress data. It just handles it in a nice way, so from user perspective difference is not visible, but yeah, data stored compressed |
@andrejlevkovitch did you have any other questions about compression and this sdk? |
nope, except the problems related to SDK I already described:
|
Are you still expecting the sdk/s3 to compress your data? Your most recent questions seem to indicate that you think the sdk should be doing some type of compression but this is not done. None of the compression params will be used as @DmitriyMusatkin mentioned. |
|
you are absolutely right, put out a PR for the fix |
fix should be merged now, lemme know if that fixes it for you, it should. as for the second part
is the description of the the compression cmake arg, what can be updated to help clear up potential confusion? we call out "For services that support it" which is more or less what is called out in our public docs. What specifically in that documentation could be updated to avoid future confusion? |
well, it is mostly about usage of the configuration: the config for s3 client contains field that is apriory not supported - that is already confusing, because I would expect that if the field is present, then it is possible to use it. Anyway I'm not sure what to suggest about that |
This issue is now closed. Comments on closed issues are hard for our team to see. |
Very valid and something that we likely could introduce in a future version. In the initial implementation likely should have been "if the service does not accept it, dont code generate it". just in its current state we cant remove it because we guarantee (or at least try our hardest) not to break compilation for anyone by removing stuff. In a future breaking change though, making its presence based on ability is a good idea though. |
Describe the bug
Despite the claims in documentation aws sdk doesn't perform gzip compression under any circumstances
Expected Behavior
So, it is expected that sdk will compress request data according to client configuration. However, I want to also clarify that this configuration lacks of list of allowed content types for compression - as we don't want to compress files like
jpg
ormp4
Current Behavior
Currently if you try to upload file with TransferManager or S3 client - no compression will happen. You also can try to compress the file manually and after that upload it with TransferManager or S3 client, but it will not work properly. First of all, there are no options in TransferManager to provide content-encoding, so it will not work obviously. Second of all, S3 client also will not work properly, because signer will rewrite content-encoding, unless you changed checksum algorithm to
NOT_SET
and use md5 instead.Reproduction Steps
Just use default config for
TransferManager
and try to upload some bigjson
file (bigger then 10 Kb!) and checkGetBytesTransfered()
for that operation - it will be equal (or a bit bigger) to size of the file. So no compression appliedPossible Solution
So, first of all we should enable the compression - the line returns
false
always.But even if we change it explicitly to
GZIP
- it still will not work, because during the compression it still left previous content-length and doesn't set decoded content-length, that is required for backend service for uncompression. So, lets add it (and remove content-length - proper one will be set later byAddContentBodyToRequest
).But even after that it will not work, as for calculate checksum it still uses the old body, not compressed one. So, let change that too and use actual body (from
httpRequest
)But it still will not work, because when we calculate the hash we expect that stream will be good, what can be not the case, especially after some reading and moving cursor back-and-forth. So, we should explicitly clear the stream, before start reading.
And only now, if you explicitly set checksum algorithm as NOT_SET and enable computing md5, compression will finally work
Additional Information/Context
No response
AWS CPP SDK version used
1.11.398
Compiler and Version used
gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
Operating System and version
Ubuntu 18.04.6 LTS
The text was updated successfully, but these errors were encountered: