Utility of vmmv.m and vmv.v.v intrinsics #74
Comments
I'm not sure the |
Wouldn't the vmv.v.v intrinsic need an extra operand for the undisturbed values to be useful for tail undisturbed? Right now the register allocator will just use whatever register is convenient as the destination so you have no control.
I would suggest So I agree with @topperc, we should encourage user to use |
What happens when using operator= on a type with fractional LMUL?
I think it becomes a vmv1r.v if a copy is needed.
I forgot about VL when I started this issue, but there's been some good discussion here. Using vmv.v.v can improve performance by limiting the number of elements copied. But forcing users to explicitly code all copies using intrinsics to get the best performance seems bad. Variables in C should be a different abstraction than hardware registers. To handle the tail undisturbed case we need a new intrinsic that takes the undisturbed values as a second input. This would not apply to vmmv.m, since mask instructions are always tail agnostic if I'm reading the spec correctly.
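To make the difference concrete, here is a rough scalar model in plain C. It is only a sketch and not the real intrinsic API: `vreg_t`, `VLMAX`, `copy_whole`, and `vmv_v_v_tu` are hypothetical names made up for illustration, and the vector length is fixed at 8 elements. It contrasts a whole-register copy (what operator= might turn into when a copy is needed) with a vmv.v.v-style move that copies only `vl` elements and takes the undisturbed values as an explicit extra operand.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model, NOT the real RVV intrinsic API.
 * VLMAX is fixed at 8 elements purely for illustration. */
enum { VLMAX = 8 };

/* Stand-in for one vector register holding 32-bit elements. */
typedef struct {
    int32_t e[VLMAX];
} vreg_t;

/* Models a whole-register copy: every element is copied,
 * independent of the current vl. */
vreg_t copy_whole(vreg_t src) {
    return src;
}

/* Models a vmv.v.v-style move with a tail-undisturbed policy.
 * "dest" is the assumed extra operand: it supplies the values that
 * elements at index >= vl keep. Without it, the tail would come from
 * whichever register the allocator happened to pick. */
vreg_t vmv_v_v_tu(vreg_t dest, vreg_t src, size_t vl) {
    if (vl > VLMAX) {
        vl = VLMAX; /* clamp, since this model has no larger registers */
    }
    for (size_t i = 0; i < vl; ++i) {
        dest.e[i] = src.e[i]; /* body elements come from src */
    }
    return dest; /* tail elements are still the original dest values */
}
```

In this model, calling `vmv_v_v_tu(d, s, 3)` returns the first three elements of `s` followed by the last five elements of `d`, while `copy_whole(s)` returns all eight elements of `s`.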
We currently have We should explicitly emphasize this in the document of v1.0 since the assumption is necessary for redundant move eliminations. |
Shouldn't users be encouraged to use operator= to copy vectors like any other variable? Using a mv intrinsic guarantees an instruction is emitted by the compiler even if register allocation would allow the input and output to use the same register. Unless we teach the compiler to look for these operations and turn them into copies that register allocation can coalesce.