Use LoopVectorization.jl's threads, sometimes? #113

Open
mcabbott opened this issue Jun 30, 2021 · 2 comments

Comments

@mcabbott
Owner

mcabbott commented Jun 30, 2021

LoopVectorization has changed two things since its interaction with Tullio was thought out:

  1. a name change @avx -> @turbo, and
  2. a multi-threading macro, @avxt or @tturbo, equivalent to @turbo thread=true.

The easy change would be to make the keyword here turbo=true etc.
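For reference, a minimal sketch of the two LoopVectorization macros on a plain loop, assuming LoopVectorization.jl is installed. The function names `exp_turbo!` and `exp_tturbo!` are made up for illustration; only `@turbo` and `@tturbo` come from the package.

```julia
using LoopVectorization

# Single-threaded SIMD loop (the macro formerly spelled @avx).
function exp_turbo!(y::Vector{Float64}, x::Vector{Float64})
    @turbo for i in eachindex(x)
        y[i] = exp(x[i])
    end
    return y
end

# Same loop, but LoopVectorization may also split it across threads.
function exp_tturbo!(y::Vector{Float64}, x::Vector{Float64})
    @tturbo for i in eachindex(x)
        y[i] = exp(x[i])
    end
    return y
end
```

Both functions should fill `y` with the same values; whether the threaded one is faster depends on the array size and the machine.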

I believe the threading uses https://github.com/JuliaSIMD/Polyester.jl, and has lower overhead to launch threads than Threads.@spawn. But if I understand right, using both together can cause problems, e.g. JuliaSIMD/LoopVectorization.jl#221 or JuliaSIMD/ThreadingUtilities.jl#25. To allow but not require use of this, the questions are:

  • Should this just mean calling @tturbo on the whole iteration space (as is done for KernelAbstractions now) or should it also/only be possible to use these threads within Tullio's recursive threads-then-blocks algorithm?
  • Is there a non-confusing interface for this? Since @tullio aims to be concise it's nice not to need 5 keyword options every time.
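
To make the first question concrete, here is a rough sketch of what a recursive threads-then-blocks scheme can look like in one dimension, using only Base. It is an illustration of the general idea and not Tullio's actual code: the names `threads_then_blocks!` and `blocks!`, the halving rule, and the block size 1024 are all assumptions.

```julia
# Hypothetical sketch only; names and thresholds are illustrative.

# Phase 2: on a single thread, halve the range until each piece
# is at most `block` long, then run the kernel on that piece.
function blocks!(kernel!, r::UnitRange{Int}, block::Int)
    if length(r) > block
        mid = first(r) + length(r) ÷ 2 - 1
        blocks!(kernel!, first(r):mid, block)
        blocks!(kernel!, (mid + 1):last(r), block)
    else
        kernel!(r)
    end
    return nothing
end

# Phase 1: while more than one thread is available and the range is
# still large, split it in two and hand one half to a spawned task.
function threads_then_blocks!(kernel!, r::UnitRange{Int}, nthreads::Int, block::Int)
    block >= 1 || throw(ArgumentError("block must be at least 1"))
    if nthreads > 1 && length(r) > block
        mid = first(r) + length(r) ÷ 2 - 1
        half = nthreads ÷ 2
        task = Threads.@spawn threads_then_blocks!(kernel!, first(r):mid, half, block)
        threads_then_blocks!(kernel!, (mid + 1):last(r), nthreads - half, block)
        wait(task)
    else
        blocks!(kernel!, r, block)
    end
    return nothing
end

# Usage: A[i] = exp(B[i]) over the whole array.
function exp_into!(A::Vector{Float64}, B::Vector{Float64})
    function kernel!(r)
        for i in r
            A[i] = exp(B[i])
        end
    end
    threads_then_blocks!(kernel!, 1:length(B), Threads.nthreads(), 1024)
    return A
end
```

In this picture, "within the recursive algorithm" would mean replacing the `Threads.@spawn` call in phase 1, while "on the whole iteration space" would mean skipping both phases and handing the full range to `@tturbo`.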
@chriselrod
Contributor

chriselrod commented Jun 30, 2021

> Tullio's recursive threads-then-blocks algorithm?

An additional consideration is that I haven't implemented anything like this in LoopVectorization yet, so Tullio's current implementation will get better performance beyond a certain size:
[Image: benchmark plot]
(Also, I made LV ramp thread use up more slowly since creating this plot, so I should probably rerun this benchmark to see how it looks now.)
I'll implement this eventually, but it'll be a while.

@mcabbott
Owner Author

mcabbott commented Jun 30, 2021

That's a nice graph. You can see that Tullio turns on threading too early (around 64 IIRC) on your machine -- the overhead of @spawn isn't paying for itself.

OK, so it sounds like the goal is to figure out how to use ThreadingUtilities or Polyester in place of @spawn.

One possible interface is like @tullio A[i] := exp(B[i]) threads=Polyester. There's already grad=Base / Dual / false. And if it's an orthogonal choice to whether to use @turbo then perhaps it shouldn't share a keyword.
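
As a rough idea of what such a keyword would be choosing between, here is the same loop launched two ways, assuming Polyester.jl is installed. The `threads=Polyester` keyword itself is only a proposal in this thread; the function names `exp_spawn!` and `exp_batch!` and the chunking rule are made up for this sketch.

```julia
using Polyester

# Base tasks: one Threads.@spawn per chunk of the index range.
function exp_spawn!(A::Vector{Float64}, B::Vector{Float64}; nchunks::Int = Threads.nthreads())
    chunklen = max(1, cld(length(B), nchunks))
    tasks = map(Iterators.partition(1:length(B), chunklen)) do r
        Threads.@spawn for i in r
            A[i] = exp(B[i])
        end
    end
    foreach(wait, tasks)
    return A
end

# Polyester: @batch divides the loop among its own lightweight threads.
function exp_batch!(A::Vector{Float64}, B::Vector{Float64})
    @batch for i in 1:length(B)
        A[i] = exp(B[i])
    end
    return A
end
```

The two are called one after the other here, never nested, since nesting the two schedulers is the case the linked issues describe as problematic.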
