Allow models with layers more than 128 functions wide to run without error, by not using a fixed number of `CUDAStreamCompactionConfig`.

Closes #727

Todo
- `CUDAScatter` relies on `MAX_STREAMS`, so refactoring is required.
- `FLAMEGPUDeviceException` relies on `MAX_STREAMS`, so refactoring is required.
- `CUDAScanCompaction::MAX_STREAMS`, replacing with a member variable of the current number allocated.
- `CUDAScanCompaction::MAX_STREAMS` is/was checked)

Notes
- `CUDAScanCompaction::MAX_STREAMS` is hardcoded to 128, the upper limit that can run at once on a (<= SM75) device. This is a bad assumption.
  - Models can have more than 128 functions per layer, which requires that many streams.
- `CUDAScatter` is initialised as a singleton member of `CUDASimulation`, so the fixed model properties are known at that point, and a call can be added there to allocate enough data.
- `DeviceExceptionManager` has an array of one device pointer to a `DeviceExceptionBuffer` per stream, and host memory to copy that back to.
  - `DeviceExceptionManager` is a member of `cudaSimulation::singletons`, so it can be allocated during singleton initialisation.
- `CUDAScanCompaction` is a member variable of `CUDAScatter`, which is default initialised (rather than being manually constructed or mentioned in an initialiser list).
  - This will need to be changed to pass the number of streams to create during construction, or to allocate the required number of elements later.
  - This appears to be the only instantiation of `CUDAScanCompaction`, AFAIK.
  - Destruction / deletion will also be required.
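
As a rough illustration of the last note, here is a minimal host-only C++ sketch of swapping a fixed `MAX_STREAMS`-sized array for storage sized at construction or grown later. Every name in it (`StreamCompactionConfigSketch`, `ScanCompactionSketch`, `resize()`, `streamCount()`, `config()`) is a hypothetical stand-in rather than the actual FLAME GPU API, and the per-stream state is reduced to a single placeholder field.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the per-stream compaction state.
// Real code would hold device pointers and buffer lengths here.
struct StreamCompactionConfigSketch {
    unsigned int scan_flag_len = 0;  // placeholder field, assumed for illustration
};

// Hypothetical stand-in for CUDAScanCompaction, sized at runtime
// instead of by a compile-time MAX_STREAMS constant.
class ScanCompactionSketch {
 public:
    // Option 1: pass the number of streams at construction.
    explicit ScanCompactionSketch(std::size_t streams = 0) : configs_(streams) {}

    // Option 2: allocate the required number of elements later.
    // Grows only; existing per-stream state is preserved.
    void resize(std::size_t streams) {
        if (streams > configs_.size()) {
            configs_.resize(streams);
        }
    }

    // The "current number allocated", replacing checks against MAX_STREAMS.
    std::size_t streamCount() const { return configs_.size(); }

    // Bounds-checked access to one stream's state.
    StreamCompactionConfigSketch &config(std::size_t streamId) {
        if (streamId >= configs_.size()) {
            throw std::out_of_range("stream index exceeds allocated stream count");
        }
        return configs_[streamId];
    }

 private:
    std::vector<StreamCompactionConfigSketch> configs_;
};
```

In this sketch `std::vector` handles destruction automatically. Once the per-stream state owns device memory, an explicit free step (or a destructor on the per-stream type) would be needed, which is what the destruction / deletion note refers to.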