alts: increase write record size max to 1MB #8512
It's generally useful to have new-and-improved Go. One specific useful feature is `b.Loop()`, which makes benchmarking easier.
It's only called in one place, and is effectively a method on conn. Part of grpc#8510.
Increases large write speed by 9.62% per BenchmarkLargeMessage. Detailed benchmarking numbers below.

Rather than use different sizes for the maximum read record, write record, and write buffer, just use 1MB for all of them. Using larger records reduces the amount of payload splitting and the number of syscalls made by ALTS.

Part of grpc#8510. SO_RCVLOWAT and TCP receive zerocopy are only effective with larger payloads, and so ALTS can't be limiting payload sizes to 4 KiB. SO_RCVLOWAT and zerocopy are on the receive side, but for benchmarking purposes we need ALTS to send large messages.

Benchmarks:

```
$ benchstat large_msg_old.txt large_msg.txt
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/credentials/alts/internal/conn
cpu: AMD Ryzen Threadripper PRO 3945WX 12-Cores
                │ large_msg_old.txt │        large_msg.txt         │
                │      sec/op       │   sec/op     vs base         │
LargeMessage-12      68.88m ± 1%      62.25m ± 0%  -9.62% (p=0.002 n=6)
```
Codecov Report: ❌ Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #8512      +/-   ##
==========================================
- Coverage   82.40%   81.78%    -0.62%
==========================================
  Files         414      413        -1
  Lines       40531    40519       -12
==========================================
- Hits        33399    33138      -261
- Misses       5770     6001      +231
- Partials     1362     1380       +18
```
Hi @kevinGC, I had a discussion with @dfawley about changing the buffer sizes earlier this year. Presently, the buffers used by ALTS can only grow, never shrink. If there are hundreds of directpath channels (see internal doc go/gcs-dp-connections-number-issue) created in an idle client, these buffers can hold on to a significant amount of memory if they are 1MB each. I tried the scatter-gather style of Java, but we need to materialize the read data for decryption, and that causes a copy operation; the performance was worse than the present implementation. A simpler and possibly equally performant solution could be to check if the …
I'd be interested in finding where in the stack the 16KB size is coming from.
@ctiller I did a quick test and confirmed no records larger than 16 KiB for a 1 GiB file transfer using GCS Fuse. I'm not actually finding the 16KB limit in grpc-go though. AFAICT:
It feels like I'm missing something obvious -- ALTS must have some way of knowing how large records can be. Otherwise implementations would be limited to sending small records for fear of overwhelming the receiver. I'll keep looking for now. @arjan-bal let me see what I can do about the 16KiB limit and then we can revisit shrinking/pooling/scatter-gather/etc.
@kevinGC Is GCS Fuse seeing the 16kb frames on the request from the GCS Fuse client to GCS, in the response from GCS, or both? From playing around with one of our internal tests, I have a theory about why the response might be limited to 16kb, will do some more testing on Monday to confirm. It's not clear to me where the 16kb frame on the request path would come from though.
Update: The max frame size is not being set here, which means the ALTS max frame size negotiation is not happening during the handshake and (I think) means that the server is defaulting to 16KB frame size to be on the safe side. I think we should try setting it to a larger value (C++ has it set at `1024*1024`), which should trigger the server to start sending large frames, and the other changes in this PR should (hopefully) lead the client to send large frames. Could we run a quick test and see if setting the max frame size does the trick when talking to a real GCS server?
I was going to run a quick test... but the same code I used last week to test this is suddenly not using ALTS for some reason. Another thing to debug along the way :/ |
Note that this PR includes #8511. GitHub doesn't support proper commit chains / stacked PRs, so I'm doing this in several PRs with some (annoyingly) redundant commits. Let me know if this isn't a good workflow for you and I'll change things up.