
alts: increase write record size max to 1MB #8512


Open · wants to merge 3 commits into base: master

Conversation

kevinGC
Contributor

@kevinGC kevinGC commented Aug 13, 2025

Increases large write speed by 9.62% per BenchmarkLargeMessage. Detailed
benchmarking numbers below.

Rather than use different sizes for the maximum read record, write
record, and write buffer, just use 1MB for all of them.

Using larger records reduces the amount of payload splitting and the
number of syscalls made by ALTS.

Part of #8510. SO_RCVLOWAT and TCP receive zerocopy are only effective with larger payloads, so ALTS can't limit payload sizes to 4 KiB. SO_RCVLOWAT and zerocopy apply on the receive side, but for benchmarking purposes we need ALTS to send large messages.

Note that this PR includes #8511. GitHub doesn't support proper commit chains / stacked PRs, so I'm doing this in several PRs with some (annoyingly) redundant commits. Let me know if this isn't a good workflow for you and I'll change things up.

Benchmarks:

$ benchstat large_msg_old.txt large_msg.txt
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/credentials/alts/internal/conn
cpu: AMD Ryzen Threadripper PRO 3945WX 12-Cores
                │ large_msg_old.txt │           large_msg.txt           │
                │      sec/op       │   sec/op     vs base              │
LargeMessage-12         68.88m ± 1%   62.25m ± 0%  -9.62% (p=0.002 n=6)

It's generally useful to have a new and improved Go toolchain. One specifically useful feature is `b.Loop()`, which makes benchmarking easier.
It's only called in one place, and is effectively a method on conn.


codecov bot commented Aug 13, 2025

Codecov Report

❌ Patch coverage is 96.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 81.78%. Comparing base (55e8b90) to head (6974534).
⚠️ Report is 15 commits behind head on master.

Files with missing lines Patch % Lines
credentials/alts/internal/conn/record.go 96.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8512      +/-   ##
==========================================
- Coverage   82.40%   81.78%   -0.62%     
==========================================
  Files         414      413       -1     
  Lines       40531    40519      -12     
==========================================
- Hits        33399    33138     -261     
- Misses       5770     6001     +231     
- Partials     1362     1380      +18     
Files with missing lines Coverage Δ
credentials/alts/internal/conn/common.go 100.00% <ø> (ø)
credentials/alts/internal/conn/record.go 79.21% <96.66%> (+1.00%) ⬆️

... and 45 files with indirect coverage changes


@arjan-bal
Contributor

Hi @kevinGC, I had a discussion with @dfawley about changing the buffer sizes earlier this year. Presently, the buffers used by ALTS can only grow, not shrink. If there are hundreds of directpath channels (see internal doc go/gcs-dp-connections-number-issue) created in an idle client, these buffers can hold on to a significant amount of memory if they are 1MB each.

I tried the scatter-gather style used by Java, but we need to materialize the read data for decryption, and that causes a copy operation. The performance was worse than the present implementation. A simpler and possibly equally performant solution could be to check whether len(currentBuffer) > 32KB (the initial size of the buffer) and nextFrameLen < 0.5 * len(currentBuffer). If both are true, copy the data into a new buffer from the pool of half the size. I decided not to pursue this further since the GCS benchmarks showed that ALTS records were always around 16KB and there were diminishing returns on very large buffer sizes.

@ctiller
Member

ctiller commented Aug 14, 2025

I'd be interested in finding where in the stack the 16KB size is coming from.

@kevinGC
Contributor Author

kevinGC commented Aug 14, 2025

@ctiller I did a quick test and confirmed no records larger than 16 KiB for a 1 GiB file transfer using GCS Fuse. I'm not actually finding the 16KB limit in grpc-go though. AFAICT:

  • grpc-go will accept records up to 1 MiB
  • ALTS doesn't have any mechanism to advertise the max acceptable record size. parseFramedMsg just errors out if the record is too large
  • ALTS has a max record size of a few GiB

It feels like I'm missing something obvious -- ALTS must have some way of knowing how large records can be. Otherwise implementations would be limited to sending small records for fear of overwhelming the receiver.

I'll keep looking for now. @arjan-bal let me see what I can do about the 16KiB limit and then we can revisit shrinking/pooling/scatter-gather/etc.

@matthewstevenson88
Contributor

@kevinGC Is GCS Fuse seeing the 16 KB frames on the request from the GCS Fuse client to GCS, in the response from GCS, or both?

From playing around with one of our internal tests, I have a theory about why the response might be limited to 16 KB; I will do some more testing on Monday to confirm. It's not clear to me where a 16 KB frame on the request path would come from, though.

@matthewstevenson88
Contributor

Update: The max frame size is not being set here, which means the ALTS max frame size negotiation is not happening during the handshake and (I think) means that the server is defaulting to a 16 KB frame size to be on the safe side. I think we should try setting it to a larger value (C++ has it set at 1024*1024), which should trigger the server to start sending large frames, and the other changes in this PR should (hopefully) lead the client to send large frames.

Could we run a quick test and see if setting the max frame size does the trick when talking to a real GCS server?

@kevinGC
Contributor Author

kevinGC commented Aug 18, 2025

I was going to run a quick test... but the same code I used last week to test this is suddenly not using ALTS for some reason. Another thing to debug along the way :/
