
Fix buffer resizing overflow and introduce safe power-of-two capacity growth #2985


Open

bernardladenthin wants to merge 2 commits into base: dev

Conversation

bernardladenthin

  • Previously, calculating the next capacity for large values like 1_073_741_824 caused an overflow and returned Int.MIN_VALUE.
  • Rewrote ensureCapacity to use Long arithmetic, guarding against overflow past Int.MAX_VALUE.
  • Introduced nextPowerOfTwoCapacity(minCapacity: Int) to safely calculate the next power of two ≥ minCapacity.
  • Capped the capacity at Int.MAX_VALUE to prevent buffer allocation errors.
  • Added unit tests covering negative, small, exact, and large input values to validate correct capacity growth behavior.

Note: You can compare the fixed behavior with the old logic using:

fun nextPowerOfTwoCapacity(minCapacity: Int): Int {
    // For minCapacity = 1_073_741_824 (2^30), takeHighestOneBit() yields 2^30,
    // and shifting left once overflows Int to -2_147_483_648 (Int.MIN_VALUE).
    return minCapacity.takeHighestOneBit() shl 1
}
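
For contrast, here is a minimal sketch of the fixed calculation as described above (the name nextPowerOfTwoCapacitySafe and the exact handling of non-positive inputs are illustrative; the PR's actual code may differ):

fun nextPowerOfTwoCapacitySafe(minCapacity: Int): Int {
    if (minCapacity <= 1) return 1   // assumed handling of negative/zero inputs
    var capacity = 1L                // grow in Long so the shift cannot wrap around
    while (capacity < minCapacity) capacity = capacity shl 1
    // An array can never be larger than Int.MAX_VALUE, so cap there even
    // though Int.MAX_VALUE itself is not a power of two.
    return minOf(capacity, Int.MAX_VALUE.toLong()).toInt()
}

With this version, an input of 1_073_741_825 yields Int.MAX_VALUE instead of wrapping to Int.MIN_VALUE, and 1_073_741_824 (already a power of two) is returned unchanged.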

@bernardladenthin
Author

The method under test (nextPowerOfTwoCapacity) should, I assume, be called from ByteArrayOutput. In any case, I'd also like to test at least ensureCapacity; I didn't find any existing tests for it. Did I miss them? Also, is there a CI setup that executes all tests automatically?

@pdvrieze
Contributor

Since you are considering the case of very large buffers, note that it is not actually possible to create an array of size 2^31 (Int.MAX_VALUE + 1), only one of size 2^31 - 1 (Int.MAX_VALUE). That maximum size, however, is not a power of 2. This means that rounding up to powers of two nearly halves the maximum supported size, to 2^30 (roughly 1 billion values instead of 2 billion).

PS: yes, there is a CI setup (using JetBrains' private infrastructure).
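
For reference, the limits under discussion (a small illustration, not code from the PR):

// The largest power of two representable as a positive Int is 2^30.
val maxPowerOfTwo = 1 shl 30      // 1_073_741_824
// The array size limit is Int.MAX_VALUE = 2^31 - 1, which is not a power of
// two (and some JVMs reserve a few header words just below it).
val maxArraySize = Int.MAX_VALUE  // 2_147_483_647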

@bernardladenthin
Author

bernardladenthin commented Apr 15, 2025

Hi, and thanks for your comment! You're absolutely right — it's not a power of two. I tried to stay as close as possible to the existing implementation.

At the moment, I'm running into an issue when serializing very large data sets. Specifically, when working with buffers larger than 1_073_741_824, the current implementation runs into an integer overflow, which results in the following exception:

Exception in thread "main" java.lang.NegativeArraySizeException: -2147483648
	at kotlinx.serialization.cbor.internal.ByteArrayOutput.ensureCapacity(Streams.kt:49)
	at kotlinx.serialization.cbor.internal.ByteArrayOutput.write(Streams.kt:75)
	at kotlinx.serialization.cbor.internal.ByteArrayOutput.write$default(Streams.kt:64)
	at kotlinx.serialization.cbor.internal.EncoderKt.encodeByteArray(Encoder.kt:263)
	at kotlinx.serialization.cbor.internal.EncoderKt.encodeString(Encoder.kt:258)
	at kotlinx.serialization.cbor.internal.CborWriter.encodeElement(Encoder.kt:135)
	at kotlinx.serialization.encoding.AbstractEncoder.encodeFloatElement(AbstractEncoder.kt:62)
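
A hypothetical minimal reproducer (not taken from the PR; it assumes kotlinx-serialization-cbor on the classpath and a heap large enough for a > 1 GiB array):

import kotlinx.serialization.Serializable
import kotlinx.serialization.encodeToByteArray
import kotlinx.serialization.cbor.ByteString
import kotlinx.serialization.cbor.Cbor

@Serializable
class Payload(@ByteString val data: ByteArray)

fun main() {
    // Any payload whose encoded size exceeds 2^30 bytes triggers the overflow.
    val payload = Payload(ByteArray(1_073_741_825))
    Cbor.encodeToByteArray(payload)  // NegativeArraySizeException before the fix
}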

In the future, it might be worth considering an alternative buffer implementation — ideally one that scales automatically and supports sizes beyond the current limitation. I've developed such a buffer myself: streambuffer, which could be a useful foundation for handling very large data more robustly.

@pdvrieze
Contributor

For this implementation it may be worthwhile to clip the capacity value to Int.MAX_VALUE.toLong(). Looking at the stream implementation, it seems reasonably straightforward to extract the interface and provide different implementations, so that platforms where a stream is available (JVM, okio, kotlinx.io) can support streams for reading/writing. Note, though, that this large-data use case is one where you would probably want to consider specialised approaches to reduce memory usage (e.g. streaming, memory-mapped I/O, shared buffers, etc.).
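
A hypothetical shape for such an extracted interface (names are illustrative, not from the codebase):

// Illustrative only: a minimal output abstraction that a growing byte array,
// a JVM OutputStream, or an okio/kotlinx.io sink could each implement.
interface BinaryOutput {
    fun write(byteValue: Int)
    fun write(bytes: ByteArray, offset: Int = 0, count: Int = bytes.size - offset)
}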

@bernardladenthin
Author

Right, my initial goal was to address the existing issue with the NegativeArraySizeException and the incorrect size calculation. This affects anyone working with more than 1_073_741_824 bytes, meaning the buffer is currently limited, rather unfortunately, to roughly 1 GB. Additionally, the exception makes it look like an error in the user's own code, which can be quite confusing.

In the short term, it might be helpful to at least fix the issue up to Int.MAX_VALUE. In my example, this then produces a clear IllegalArgumentException("Required capacity exceeds maximum array size (Int.MAX_VALUE)."), which points directly at the actual problem. That seems like a better outcome than the current behavior.
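
A minimal sketch of what that guard could look like (assumed shape; the PR's actual ensureCapacity may differ):

fun grow(array: ByteArray, position: Int, elementsToAppend: Int): ByteArray {
    // Compute the requirement in Long so the addition cannot overflow Int.
    val required = position.toLong() + elementsToAppend
    if (required <= array.size) return array
    require(required <= Int.MAX_VALUE) {
        "Required capacity exceeds maximum array size (Int.MAX_VALUE)."
    }
    // Double towards the requirement, clipping to Int.MAX_VALUE (which is
    // itself not a power of two).
    var newSize = maxOf(array.size, 1).toLong()
    while (newSize < required) newSize = newSize shl 1
    return array.copyOf(minOf(newSize, Int.MAX_VALUE.toLong()).toInt())
}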

Of course, a proper refactoring would still be ideal to fully support large data sets.
