
Fix buffer resizing overflow and introduce safe power-of-two capacity growth #2985


Open

bernardladenthin wants to merge 2 commits into base: dev

Conversation

bernardladenthin

  • Previously, calculating the next capacity for large values like 1_073_741_824 caused an overflow and returned Int.MIN_VALUE.
  • Rewrote ensureCapacity to use Long arithmetic, guarding against overflow past Int.MAX_VALUE.
  • Introduced nextPowerOfTwoCapacity(minCapacity: Int) to safely calculate the next power of two ≥ minCapacity.
  • Capped the capacity at Int.MAX_VALUE to prevent buffer allocation errors.
  • Added unit tests covering negative, small, exact, and large input values to validate correct capacity growth behavior.

Note: You can compare the fixed behavior with the old logic using:

fun nextPowerOfTwoCapacity(minCapacity: Int): Int {
    // For minCapacity = 1_073_741_824 (2^30), takeHighestOneBit() yields 2^30,
    // and shifting left once overflows Int to -2_147_483_648 (Int.MIN_VALUE).
    return minCapacity.takeHighestOneBit() shl 1
}
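
For contrast, here is a minimal sketch of the fixed calculation as described above (the name nextPowerOfTwoCapacitySafe and the exact handling of non-positive inputs are illustrative; the PR's actual code may differ):

fun nextPowerOfTwoCapacitySafe(minCapacity: Int): Int {
    if (minCapacity <= 1) return 1   // assumed handling of negative/zero inputs
    var capacity = 1L                // grow in Long so the shift cannot wrap around
    while (capacity < minCapacity) capacity = capacity shl 1
    // An array can never be larger than Int.MAX_VALUE, so cap there even
    // though Int.MAX_VALUE itself is not a power of two.
    return minOf(capacity, Int.MAX_VALUE.toLong()).toInt()
}

With this version, an input of 1_073_741_825 yields Int.MAX_VALUE instead of wrapping to Int.MIN_VALUE, and 1_073_741_824 (already a power of two) is returned unchanged.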

@bernardladenthin
Author

The method under test (nextPowerOfTwoCapacity) should, I assume, be called from ByteArrayOutput. In any case, I'd also like to test at least ensureCapacity; I didn't find any existing tests for it. Did I miss them? Also, is there a CI setup that executes all tests automatically?

@pdvrieze
Contributor

Since you are considering the case of very large buffers, note that it is not actually possible to create an array of size 2^31 (Int.MAX_VALUE + 1), only one of size 2^31 - 1 (Int.MAX_VALUE). That maximum size, however, is not a power of 2. This means that rounding up to powers of two nearly halves the maximum supported size, to 2^30 (roughly 1 billion values instead of 2 billion).

PS: yes, there is a CI setup (using JetBrains' private infrastructure).
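
For reference, the limits under discussion (a small illustration, not code from the PR):

// The largest power of two representable as a positive Int is 2^30.
val maxPowerOfTwo = 1 shl 30      // 1_073_741_824
// The array size limit is Int.MAX_VALUE = 2^31 - 1, which is not a power of
// two (and some JVMs reserve a few header words just below it).
val maxArraySize = Int.MAX_VALUE  // 2_147_483_647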

@bernardladenthin
Author

bernardladenthin commented Apr 15, 2025

Hi, and thanks for your comment! You're absolutely right — it's not a power of two. I tried to stay as close as possible to the existing implementation.

At the moment, I'm running into an issue when serializing very large data sets. Specifically, when working with buffers larger than 1_073_741_824, the current implementation runs into an integer overflow, which results in the following exception:

Exception in thread "main" java.lang.NegativeArraySizeException: -2147483648
	at kotlinx.serialization.cbor.internal.ByteArrayOutput.ensureCapacity(Streams.kt:49)
	at kotlinx.serialization.cbor.internal.ByteArrayOutput.write(Streams.kt:75)
	at kotlinx.serialization.cbor.internal.ByteArrayOutput.write$default(Streams.kt:64)
	at kotlinx.serialization.cbor.internal.EncoderKt.encodeByteArray(Encoder.kt:263)
	at kotlinx.serialization.cbor.internal.EncoderKt.encodeString(Encoder.kt:258)
	at kotlinx.serialization.cbor.internal.CborWriter.encodeElement(Encoder.kt:135)
	at kotlinx.serialization.encoding.AbstractEncoder.encodeFloatElement(AbstractEncoder.kt:62)
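
A hypothetical minimal reproducer (not taken from the PR; it assumes kotlinx-serialization-cbor on the classpath and a heap large enough for a > 1 GiB array):

import kotlinx.serialization.Serializable
import kotlinx.serialization.encodeToByteArray
import kotlinx.serialization.cbor.ByteString
import kotlinx.serialization.cbor.Cbor

@Serializable
class Payload(@ByteString val data: ByteArray)

fun main() {
    // Any payload whose encoded size exceeds 2^30 bytes triggers the overflow.
    val payload = Payload(ByteArray(1_073_741_825))
    Cbor.encodeToByteArray(payload)  // NegativeArraySizeException before the fix
}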

In the future, it might be worth considering an alternative buffer implementation — ideally one that scales automatically and supports sizes beyond the current limitation. I've developed such a buffer myself: streambuffer, which could be a useful foundation for handling very large data more robustly.

@pdvrieze
Contributor

For this implementation it may be worthwhile to clip the capacity value to Int.MAX_VALUE.toLong(). Looking at the stream implementation, it seems reasonably straightforward to extract the interface and provide different implementations, so that platforms where a stream is available (JVM, okio, kotlinx.io) can support streams for reading/writing. Note, though, that this large-data use case is one where you would probably want to consider specialised approaches to reduce memory usage (e.g. streaming, memory-mapped I/O, shared buffers, etc.).
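
A hypothetical shape for such an extracted interface (names are illustrative, not from the codebase):

// Illustrative only: a minimal output abstraction that a growing byte array,
// a JVM OutputStream, or an okio/kotlinx.io sink could each implement.
interface BinaryOutput {
    fun write(byteValue: Int)
    fun write(bytes: ByteArray, offset: Int = 0, count: Int = bytes.size - offset)
}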

@bernardladenthin
Author

Right, my initial goal was to address the existing issue with the NegativeArraySizeException and the incorrect size calculation. This affects anyone working with more than 1_073_741_824 bytes, meaning the buffer is currently limited, rather unfortunately, to roughly 1 GB. Additionally, the exception makes it look like an error in the user's own code, which can be quite confusing.

In the short term, it might be helpful to at least fix the issue up to Int.MAX_VALUE. In my example, this then produces a clear IllegalArgumentException("Required capacity exceeds maximum array size (Int.MAX_VALUE)."), which points directly at the actual problem. That seems like a better outcome than the current behavior.
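
A minimal sketch of what that guard could look like (assumed shape; the PR's actual ensureCapacity may differ):

fun grow(array: ByteArray, position: Int, elementsToAppend: Int): ByteArray {
    // Compute the requirement in Long so the addition cannot overflow Int.
    val required = position.toLong() + elementsToAppend
    if (required <= array.size) return array
    require(required <= Int.MAX_VALUE) {
        "Required capacity exceeds maximum array size (Int.MAX_VALUE)."
    }
    // Double towards the requirement, clipping to Int.MAX_VALUE (which is
    // itself not a power of two).
    var newSize = maxOf(array.size, 1).toLong()
    while (newSize < required) newSize = newSize shl 1
    return array.copyOf(minOf(newSize, Int.MAX_VALUE.toLong()).toInt())
}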

Of course, a proper refactoring would still be ideal to fully support large data sets.
