It seems that gemm_tr and gemm_ad currently do not leverage matrixmultiply for larger matrices. I noticed this while profiling after changing some of my performance-sensitive code. In fact, pre-computing let a_t = a.transpose() and then calling gemm(1.0, &a_t, &b, 1.0) was significantly faster than calling gemm_tr directly.
That's right, they don't use matrixmultiply currently. I suppose we could make gemm_tr work with matrixmultiply by adjusting the row and col strides accordingly. The gemm_ad method on the other hand can't use matrixmultiply (except for f32 and f64 matrices in which case this is equivalent to gemm_tr) because it does not support complex numbers.
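To illustrate the stride idea, here is a minimal sketch in plain Rust. It is not nalgebra's or matrixmultiply's actual code: `gemm_strided` and `transposed_product_demo` are hypothetical names, and the naive triple loop only stands in for a strided kernel. It assumes column-major storage and shows that swapping the row and column strides of A makes the same buffer read as A^T, so no transposed copy is needed.

```rust
/// Naive strided GEMM: c = alpha * a * b + beta * c.
/// Element (i, j) of each matrix is read at `data[i * rs + j * cs]`.
/// Hypothetical helper for illustration; not the matrixmultiply API.
#[allow(clippy::too_many_arguments)]
fn gemm_strided(
    m: usize, k: usize, n: usize,
    alpha: f64,
    a: &[f64], rsa: usize, csa: usize,
    b: &[f64], rsb: usize, csb: usize,
    beta: f64,
    c: &mut [f64], rsc: usize, csc: usize,
) {
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * rsa + p * csa] * b[p * rsb + j * csb];
            }
            let idx = i * rsc + j * csc;
            c[idx] = alpha * acc + beta * c[idx];
        }
    }
}

/// Computes C = A^T * B without materializing A^T.
fn transposed_product_demo() -> Vec<f64> {
    // A is 3x2, column-major: columns [1, 2, 3] and [4, 5, 6].
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    // B is 3x2, column-major: columns [1, 0, 1] and [0, 1, 1].
    let b = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0];
    // C is 2x2, column-major.
    let mut c = vec![0.0; 4];
    // Column-major A (3x2) has row stride 1 and column stride 3.
    // Passing them swapped (row stride 3, column stride 1) makes the
    // kernel see the 2x3 matrix A^T through the same buffer.
    gemm_strided(2, 3, 2, 1.0, &a, 3, 1, &b, 1, 3, 0.0, &mut c, 1, 2);
    c
}
```

Here A^T * B is [[4, 5], [10, 11]], stored column-major in `c`. For real scalars the adjoint equals the transpose, so the same trick would cover gemm_ad for f32 and f64; for complex scalars it would not, since strides alone cannot conjugate the entries.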
Ah, I see. I had totally forgotten that matrixmultiply does not support complex numbers, and moreover I did not know that it doesn't natively support transposition. Thanks for explaining!
I assume nothing has happened on this issue? I'd be willing to try modifying gemm_tr to use matrixmultiply if nobody else wants to work on it. The performance difference is substantial, and also quite surprising to someone who doesn't know about it. Perhaps a warning in the documentation for the tr_mul and gemm_tr methods would be appropriate until this is fixed?