Skip to content

Incorrect default MI_TLS_SLOT value #1078

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
JustasMasiulis opened this issue Apr 18, 2025 · 2 comments
Open

Incorrect default MI_TLS_SLOT value #1078

JustasMasiulis opened this issue Apr 18, 2025 · 2 comments

Comments

@JustasMasiulis
Copy link

JustasMasiulis commented Apr 18, 2025

The current MI_TLS_SLOT in v3 uses offset 0x888 when MI_WIN_USE_FIXED_TLS is enabled:

#define MI_TLS_SLOT     (0x888)             // Last user-reserved slot <https://en.wikipedia.org/wiki/Win32_Thread_Information_Block>
// #define MI_TLS_SLOT  (0x1678)            // Last TlsSlot (might clash with other app reserved slot)

static inline void* mi_prim_tls_slot(size_t slot) mi_attr_noexcept {
  #if (_M_X64 || _M_AMD64) && !defined(_M_ARM64EC)
  return (void*)__readgsqword((unsigned long)slot);   // direct load at offset from gs

However I don't see how this makes any sense? That seems to be middle of TEB::Win32ClientInfo or to be more exact afAsyncKeyStateRecentDown field of the tagCLIENTINFO type that Win32ClientInfo represents (at least on older windows versions, because seems like recent win11 touched that specific field). Basically if I'm not mistaken calling GetAsyncKeyState would overwrite the mimalloc TLS "slot"?

@daanx
Copy link
Collaborator

daanx commented May 2, 2025

You may well be right -- the use of static TLS on Windows is still a bit experimental (but it really improves codegen for mi_malloc so it would be nice to make it default). However, we need a fixed slot that is just for mimalloc and I am not sure how to pick the right one -- I chose 0x888 based on the info at https://en.wikipedia.org/wiki/Win32_Thread_Information_Block but clearly that seems wrong. 0x1678 might be a good choice but it may go wrong if a program reserves all user TLS slots :-(. Would you have a suggestion for a better slot?

@res2k
Copy link
Contributor

res2k commented May 2, 2025

Maybe this is a cautionary tale that undocumented OS internals, in fact, do sometimes change 😛
(The Wikipedia articles states the TEB layout was extracted from Wine, and I'm guessing they used an older Windows as a base.
The Vergilius table, on the other hand, claims to be for a quite recent version.)

Strictly speaking, there's no good default index; even if you're able to find an "unused" index, there's no guarantee it doesn't get used in the future, either because an OS component now needs it, or maybe because just stuff happens to shift around.
Or well, I guess there could be a "good" default, if you manage to convince a kernel person to give you an index and the promise to never ever change it. (And even then there might be an issue if two, otherwise independent, mimalloc copies end up in the same process.)

Maybe there shouldn't be a default, rather a "bring your own TLS slot" scheme; this would at least be viable for OS components or drivers that end up mimalloc. And at least app developers that specify some slot to use should (hopefully) be aware of the risks and limitations.

Speaking about codegen - I'm a bit curious, how do the different variants (TEB use, thread_local variable, use API for slot) compare?
My guess would be that they're, overhead-wise, in that order from least to most, but maybe you previously compared them more scientifically?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants