Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemoryManagement<Storage>::reserve() can panic with index out of bounds #243

Open
Quba1 opened this issue Nov 10, 2024 · 2 comments · May be fixed by #335
Open

MemoryManagement<Storage>::reserve() can panic with index out of bounds #243

Quba1 opened this issue Nov 10, 2024 · 2 comments · May be fixed by #335
Labels
bug Something isn't working

Comments

@Quba1
Copy link

Quba1 commented Nov 10, 2024

cubecl_runtime::memory_management::memory_manage::MemoryManagement<Storage>::reserve() (below) can panic when size is bigger than any available pool.

/// Finds a spot in memory for a resource with the given size in bytes, and returns a handle to it
pub fn reserve(&mut self, size: u64, exclude: Option<&MemoryLock>) -> SliceHandle {
// If this happens every nanosecond, counts overflows after 585 years, so not worth thinking too
// hard about overflow here.
self.alloc_reserve_count += 1;
// Find first pool where size <= p.max_alloc with a binary search.
let pool_ind = self.pools.partition_point(|p| size > p.max_alloc_size());
let pool = &mut self.pools[pool_ind];
if pool.max_alloc_size() < size {
panic!("No memory pool big enough to reserve {size} bytes.");
}
pool.reserve(&mut self.storage, size, exclude)
}

In my case (running burn) my program attempted to allocate 2_925_527_040 bytes with pools max sizes: [0] 4_028_128 B [1] 4_028_128 B [2] 32_225_024 [3] [4] 2_062_401_536 B. So pool_ind was computed as 5 and &mut self.pools[pool_ind] panicked.

The function can panic in edge-cases because per docs partition_point() requires that:

(...) all elements for which the predicate returns true are at the start of the slice and all elements for which the predicate returns false are at the end
(...)
If this slice is not partitioned, the returned result is unspecified and meaningless, as this method performs a kind of binary search.

and the condition seems to not be checked at any point.

The same issue also occurs a few lines lower in alloc().

The solution is probably a simple bounds check or capping pool_ind to pools.len() - 1 or some more complex error handling.

Happy to help with solving the problem.

@nathanielsimard nathanielsimard added the bug Something isn't working label Nov 18, 2024
@nathanielsimard
Copy link
Member

Sure would love the help :)

@ragyabraham
Copy link

Yup, seeing this in actions when running training

=== PANIC ===
A fatal error happened, you can check the experiment logs here => '/tmp/guide/experiment.log'
=============
thread 'main' panicked at /Users/ragy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cubecl-runtime-0.3.0/src/memory_management/memory_manage.rs:283:35:
index out of bounds: the len is 6 but the index is 6

Quba1 added a commit to Quba1/cubecl that referenced this issue Dec 2, 2024
Quba1 added a commit to Quba1/cubecl that referenced this issue Dec 2, 2024
Quba1 added a commit to Quba1/cubecl that referenced this issue Dec 2, 2024
@Quba1 Quba1 linked a pull request Dec 2, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants