Retain inner-product matmul beyond loop (w/o storing to global memory) #3612

michael-swan · 2024-04-09T02:11:43Z

I have a loop like so:

```python
for i in range(0, ...):
    # Load
    X = tl.load(..)
    Y = tl.load(..)
    # Compute
    Z = tl.dot(X, Y)
    # Update ptrs ...
```
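
To pin down the goal, here is a NumPy reference model (not Triton) of what the loop computes and the stacked tensor that should survive it. The function name `reference_loop` and the shapes `T`, `M`, `K`, `N` are hypothetical stand-ins for the elided loads.

```python
import numpy as np

def reference_loop(X_tiles, Y_tiles):
    """NumPy model of the kernel loop: one X @ Y product per iteration.

    X_tiles has shape (T, M, K) and Y_tiles has shape (T, K, N); they stand
    in for the blocks the elided tl.load calls would fetch. Returns the
    (T, M, N) stack of per-iteration products to keep after the loop.
    """
    T, M, _ = X_tiles.shape
    N = Y_tiles.shape[2]
    Z_all = np.zeros((T, M, N), dtype=X_tiles.dtype)
    for i in range(T):
        Z = X_tiles[i] @ Y_tiles[i]  # plays the role of tl.dot(X, Y)
        Z_all[i] = Z                 # the slice write reported as not working in Triton
    return Z_all
```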

I would like to retain those `Z` values after the loop, either by concatenating each one onto a growing tensor (in shared memory or registers) or by writing it into a `tl.tensor` of the right shape. Allocating `Z = tl.zeros(..)` before the loop and assigning `Z[i, :, :] = tl.dot(X, Y)` inside it is not an option; I've tried various incarnations of this. Allocating `Z = tl.zeros(..)` and then doing `Z = tl.cat(Z, tl.dot(X, Y))` also fails, for different reasons. The only thing that "works" is `Z = tl.zeros(<full-size>)` followed by `Z += tl.dot(X, Y)`, where `X` and `Y` are selected in an outer-product order, but there are context-specific reasons I do not want to do this.
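
One possible reading of the full-size accumulate workaround, modeled in NumPy: each iteration's `X` tile is placed into its own row block of an otherwise-zero operand, so its product is zero outside that block and summing rebuilds the stack. This is an assumption about what "outer-product order" means here, and the name `accumulate_padded` is hypothetical.

```python
import numpy as np

def accumulate_padded(X_tiles, Y_tiles):
    """NumPy model of accumulating into a full-size Z with zero-padded operands.

    Assumption: iteration i places X_tiles[i] in rows i*M:(i+1)*M of a
    (T*M, K) operand that is zero elsewhere, so its product with Y_tiles[i]
    only touches those rows. No slice of Z is ever assigned directly.
    """
    T, M, K = X_tiles.shape
    N = Y_tiles.shape[2]
    Z = np.zeros((T * M, N), dtype=X_tiles.dtype)
    rows = np.arange(T * M)
    for i in range(T):
        in_block = (rows // M) == i  # rows owned by iteration i
        # Row r of the tiled operand is X_tiles[i][r % M]; rows outside block i are zeroed.
        X_pad = np.where(in_block[:, None], np.tile(X_tiles[i], (T, 1)), 0)
        Z += X_pad @ Y_tiles[i]      # only block i's rows receive nonzero values
    return Z
```

In this model each iteration multiplies a `(T*M, K)` operand instead of an `(M, K)` one, so it does `T` times the multiply work of the plain loop; `Z.reshape(T, M, N)` recovers the stack.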

Is there a standard way to carry these values forward without resorting to a matmul accumulate or a store to global memory?
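
One pattern that avoids both slice assignment and a global-memory round trip is a masked select: keep a `(T, M, N)` accumulator and, on iteration `i`, choose between the new tile and the old contents with a boolean mask over the first axis. It is sketched here as a NumPy model; the `tl.where` / `tl.arange` form in the docstring is an untested assumption, not a confirmed answer, and keeping the whole stack live may put heavy pressure on registers for realistic sizes. The name `masked_select_stack` is hypothetical.

```python
import numpy as np

def masked_select_stack(X_tiles, Y_tiles):
    """NumPy model of writing slot i of a stack via a masked select.

    Assumed Triton analogue (untested sketch; T would need to be a
    power-of-two constexpr for tl.arange):
        slot = tl.arange(0, T)[:, None, None] == i
        Z_all = tl.where(slot, tile[None, :, :], Z_all)
    """
    T, M, _ = X_tiles.shape
    N = Y_tiles.shape[2]
    Z_all = np.zeros((T, M, N), dtype=X_tiles.dtype)
    slot_ids = np.arange(T)[:, None, None]  # shape (T, 1, 1), broadcasts over M and N
    for i in range(T):
        tile = X_tiles[i] @ Y_tiles[i]      # stands in for tl.dot(X, Y)
        # Slot i takes the new tile; every other slot keeps its previous value.
        Z_all = np.where(slot_ids == i, tile[None, :, :], Z_all)
    return Z_all
```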

Replies: 0 comments

Select a reply

Retain inner-product matmul beyond loop (w/o storing to global memory) #3612

michael-swan Apr 9, 2024

Replies: 0 comments

michael-swan
Apr 9, 2024