Use noDuplicate# where appropriate #294

Open
treeowl opened this issue Sep 22, 2020 · 2 comments

@treeowl
Collaborator

treeowl commented Sep 22, 2020

runArray and similar should probably use noDuplicate# to avoid duplicated work and loss of sharing. <> should probably use noDuplicate# when the result is large, for some value of large.

@chessai
Member

chessai commented Jul 16, 2021

What are the semantics of noDuplicate#? I've never used it before.

@treeowl
Collaborator Author

treeowl commented Jul 16, 2021

@chessai, it's about lazy blackholing. Usually, it's okay to occasionally evaluate the same thunk twice in different threads, as long as it doesn't take too long. Sometimes, we might want to avoid that. There's always a stack of thunks currently under evaluation. When GHC hits noDuplicate#, it walks that stack and blackholes all the thunks, ensuring only one thread is working on them. noDuplicate# is precisely the difference between unsafePerformIO and unsafeDupablePerformIO.

noDuplicate# is not free, so when might it be worthwhile? That can be a bit hard to guess in general. Suppose you evaluate fmap f arr, for non-huge arr, in two threads at nearly the same time, and both of them actually perform the evaluation (i.e., neither had to GC in the middle). Often, this is fine. But if f is expensive, then that's bad: each thread will produce its own f e thunk for each e, and those thunks (I believe) will never be de-duplicated.
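
To make the unsafePerformIO / unsafeDupablePerformIO relationship concrete, here is a minimal sketch. It assumes the `noDuplicate` wrapper exported from `GHC.IO` in base; `performIONoDup` and `expensiveSum` are hypothetical names used only for illustration, not anything from this library.

```haskell
import GHC.IO (noDuplicate, unsafeDupablePerformIO)

-- Hypothetical helper: the "dupable" variant plus an explicit noDuplicate.
-- This mirrors the relationship described above; it is a sketch, not a
-- replacement for unsafePerformIO.
performIONoDup :: IO a -> a
performIONoDup io = unsafeDupablePerformIO (noDuplicate >> io)

-- Stand-in for an expensive computation we would rather not run twice
-- if two threads force the same thunk at nearly the same time.
expensiveSum :: Int -> Int
expensiveSum n = performIONoDup (pure $! sum [1 .. n])
```

Loading this in GHCi and evaluating `expensiveSum 100` runs the action once per thunk; the cost of the `noDuplicate` call is paid each time such a thunk is entered.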

Here's my rough guess:

  1. Operations that are quite expensive compared to noDuplicate# should likely use it.
  2. Operations that can produce a substantial loss of sharing in case of duplication should perhaps have versions that call noDuplicate#.
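
For the runArray-style case, one possible shape is an ST action that calls noDuplicate# before the real work starts. The sketch below assumes a reasonably recent GHC where `noDuplicate#` has type `State# s -> State# s` and `GHC.ST` exports the `ST` constructor; `noDuplicateST`, `runSTNoDup`, and `sumTo` are hypothetical names, and whether the library should expose anything like them is exactly the open question in this issue.

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE UnboxedTuples #-}

import Data.STRef (modifySTRef', newSTRef, readSTRef)
import GHC.Exts (noDuplicate#)
import GHC.ST (ST (..), runST)

-- Hypothetical helper: lift the noDuplicate# primop into ST.
noDuplicateST :: ST s ()
noDuplicateST = ST (\s -> case noDuplicate# s of s' -> (# s', () #))

-- Hypothetical runST variant that claims the enclosing thunks before
-- running the action, so only one thread performs the work.
runSTNoDup :: (forall s. ST s a) -> a
runSTNoDup m = runST (noDuplicateST >> m)

-- Stand-in for an expensive ST computation (think: filling a large array).
sumTo :: Int -> Int
sumTo n =
  runSTNoDup
    ( do
        ref <- newSTRef 0
        mapM_ (\i -> modifySTRef' ref (+ i)) [1 .. n]
        readSTRef ref
    )
```

Under guess 1 above, a wrapper like this would only pay off when the wrapped action is much more expensive than the noDuplicate# call itself; for small results the plain runST path is likely cheaper.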
