Retain inner-product matmul beyond loop (w/o storing to global memory) #3612
michael-swan started this conversation in General
I have a loop like so:
I would like to retain those `Z` values after this loop, perhaps by concatenating them onto a growing tensor (in shared memory or registers) or by writing to a `tl.tensor` of the right shape.

`Z = tl.zero(..) ... Z[i,:,:] = tl.dot(X, Y)` is not an option, as I've tried various incarnations of this. `Z = tl.zero(..) ... Z = tl.cat(Z, tl.dot(X, Y))` is also not an option and fails for different reasons. The only thing that "works" is to do `Z = tl.zero(<full-size>) ... Z += tl.dot(X, Y)`, where `X` and `Y` are selected in an outer-product order, but there are context-specific reasons I do not want to do this.

Is there a standard way to carry this information forward without requiring me to perform a matmul accumulate or store to global memory?
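
For readers unfamiliar with the distinction, here is a rough NumPy sketch (not Triton) of the two loop orders as I understand them from the post. The function names, block sizes, and the reading of "outer-product order" as accumulating partial products over the shared dimension are all assumptions for illustration. In the inner-product order each iteration finishes one tile that then has to be kept somewhere; in the outer-product order a single full-size accumulator is updated every iteration.

```python
import numpy as np

def matmul_inner_order(X, Y, block_m):
    """Hypothetical helper: each iteration completes one row tile of Z."""
    tiles = []
    for m in range(0, X.shape[0], block_m):
        # A finished tile that must be retained after the loop.
        tiles.append(X[m:m + block_m, :] @ Y)
    # NumPy can simply concatenate; this is the step that is hard in the kernel.
    return np.concatenate(tiles, axis=0)

def matmul_outer_order(X, Y, block_k):
    """Hypothetical helper: accumulate partial products into a full-size Z."""
    Z = np.zeros((X.shape[0], Y.shape[1]), dtype=X.dtype)
    for k in range(0, X.shape[1], block_k):
        # Each step adds a partial product over one slice of the shared dimension.
        Z += X[:, k:k + block_k] @ Y[k:k + block_k, :]
    return Z
```

Both functions produce the same product; they differ only in what has to stay live across iterations, which is the crux of the question.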