@chessai, it's about lazy blackholing. Usually, it's okay to occasionally evaluate the same thunk twice in different threads, as long as it doesn't take too long. Sometimes, we want to avoid that. There's always a stack of thunks currently under evaluation. When GHC hits noDuplicate#, it walks that stack and blackholes all the thunks, ensuring only one thread is working on them. noDuplicate# is precisely the difference between unsafePerformIO and unsafeDupablePerformIO.

noDuplicate# is not free, so when is it worthwhile? That can be hard to guess in general. Suppose you evaluate fmap f arr, for non-huge arr, in two threads at nearly the same time, and both of them actually perform the evaluation (i.e., neither had to GC in the middle). Often, this is fine. But if f is expensive, that's bad: each thread will produce its own f e thunk for each e, and (I believe) those will never be de-duplicated.

Here's my rough guess:

- Operations that are quite expensive compared to noDuplicate# should likely use it.
- Operations that can produce a substantial loss of sharing in case of duplication should perhaps have versions that call noDuplicate#.
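
To make the trade-off concrete, here's a minimal sketch using only base. The names performOnce, expensive, sumDupable, and sumOnce are hypothetical and only for illustration; the list stands in for an array, and the workload is made up. It shows an operation in two flavors: one that allows duplicated work and one that calls noDuplicate first.

```haskell
module Main (main) where

import Control.Exception (evaluate)
import GHC.IO (noDuplicate, unsafeDupablePerformIO)

-- Hypothetical helper. base defines unsafePerformIO the same way:
-- unsafeDupablePerformIO with a noDuplicate in front.
performOnce :: IO a -> a
performOnce io = unsafeDupablePerformIO (noDuplicate >> io)

-- Hypothetical stand-in for an expensive element function.
expensive :: Int -> Int
expensive n = sum [1 .. n * 100000]

-- Cheap variant: no noDuplicate call. If two threads force the result at
-- nearly the same time, both may end up doing all of the work.
sumDupable :: [Int] -> Int
sumDupable xs = unsafeDupablePerformIO (evaluate (sum (map expensive xs)))

-- Guarded variant: noDuplicate runs before the expensive work starts, so a
-- second thread forcing the same thunk should block instead of repeating it.
sumOnce :: [Int] -> Int
sumOnce xs = performOnce (evaluate (sum (map expensive xs)))

main :: IO ()
main = do
  print (sumDupable [1 .. 5])
  print (sumOnce [1 .. 5])
```

Both variants return the same value; the only difference is whether concurrent evaluation can repeat the work. Whether the guard pays for itself depends on how expensive the guarded work is relative to noDuplicate itself, which is the judgment call described above.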

Use noDuplicate# where appropriate #294
