Commit e579447

std: refactor Windows TLS destructor support
Now that the fallback code has been removed, there are no users of `StaticKey` left for targets with native TLS. Therefore, move the at-exit hack to the thread-local guard module, where it can be shared by both implementations, and cfg-out the key-based TLS when it's not needed.
1 parent 59a3c1f commit e579447

6 files changed, +129 -160 lines changed

library/std/src/sys/windows/c.rs

+1
@@ -52,6 +52,7 @@ pub const EXIT_FAILURE: u32 = 1;
 
 pub const CONDITION_VARIABLE_INIT: CONDITION_VARIABLE = CONDITION_VARIABLE { Ptr: ptr::null_mut() };
 pub const SRWLOCK_INIT: SRWLOCK = SRWLOCK { Ptr: ptr::null_mut() };
+#[cfg(not(target_thread_local))] // Only used by key-based TLS.
 pub const INIT_ONCE_STATIC_INIT: INIT_ONCE = INIT_ONCE { Ptr: ptr::null_mut() };
 
 // Some windows_sys types have different signs than the types we use.

library/std/src/sys/windows/mod.rs

+1 -1
@@ -31,7 +31,7 @@ pub mod process;
 pub mod rand;
 pub mod stdio;
 pub mod thread;
-pub mod thread_local_dtor;
+pub mod thread_local_guard;
 pub mod thread_local_key;
 pub mod thread_parking;
 pub mod time;

library/std/src/sys/windows/thread_local_dtor.rs

-7
This file was deleted.

library/std/src/sys/windows/thread_local_guard.rs

+120
@@ -0,0 +1,120 @@
+//! A TLS destructor system.
+//!
+//! Turns out, like pretty much everything, Windows is pretty close the
+//! functionality that Unix provides, but slightly different! In the case of
+//! TLS, Windows does not provide an API to provide a destructor for a TLS
+//! variable. This ends up being pretty crucial to this implementation, so we
+//! need a way around this.
+//!
+//! The solution here ended up being a little obscure, but fear not, the
+//! internet has informed me [1][2] that this solution is not unique (no way
+//! I could have thought of it as well!). The key idea is to insert some hook
+//! somewhere to run arbitrary code on thread termination. With this in place
+//! we'll be able to run anything we like, including all TLS destructors!
+//!
+//! If you're looking at this code, and wondering "what is this doing?",
+//! you're not alone! I'll try to break this down step by step:
+//!
+//! # What's up with CRT$XLB?
+//!
+//! For anything about TLS destructors to work on Windows, we have to be able
+//! to run *something* when a thread exits. To do so, we place a very special
+//! static in a very special location. If this is encoded in just the right
+//! way, the kernel's loader is apparently nice enough to run some function
+//! of ours whenever a thread exits! How nice of the kernel!
+//!
+//! Lots of detailed information can be found in source [1] above, but the
+//! gist of it is that this is leveraging a feature of Microsoft's PE format
+//! (executable format) which is not actually used by any compilers today.
+//! This apparently translates to any callbacks in the ".CRT$XLB" section
+//! being run on certain events.
+//!
+//! So after all that, we use the compiler's #[link_section] feature to place
+//! a callback pointer into the magic section so it ends up being called.
+//!
+//! # What's up with this callback?
+//!
+//! The callback specified receives a number of parameters from... someone!
+//! (the kernel? the runtime? I'm not quite sure!) There are a few events that
+//! this gets invoked for, but we're currently only interested on when a
+//! thread or a process "detaches" (exits). The process part happens for the
+//! last thread and the thread part happens for any normal thread.
+//!
+//! # The article mentions weird stuff about "/INCLUDE"?
+//!
+//! It sure does! Specifically we're talking about this quote:
+//!
+//! > The Microsoft run-time library facilitates this process by defining a
+//! > memory image of the TLS Directory and giving it the special name
+//! > “__tls_used” (Intel x86 platforms) or “_tls_used” (other platforms). The
+//! > linker looks for this memory image and uses the data there to create the
+//! > TLS Directory. Other compilers that support TLS and work with the
+//! > Microsoft linker must use this same technique.
+//!
+//! Basically what this means is that if we want support for our TLS
+//! destructors/our hook being called then we need to make sure the linker does
+//! not omit this symbol. Otherwise it will omit it and our callback won't be
+//! wired up.
+//!
+//! We don't actually use the `/INCLUDE` linker flag here like the article
+//! mentions because the Rust compiler doesn't propagate linker flags, but
+//! instead we use a shim function which performs a volatile 1-byte load from
+//! the address of the symbol to ensure it sticks around.
+//!
+//! [1]: https://www.codeproject.com/Articles/8113/Thread-Local-Storage-The-C-Way
+//! [2]: https://github.com/ChromiumWebApps/chromium/blob/master/base/threading/thread_local_storage_win.cc#L42
+
+#![unstable(feature = "thread_local_internals", issue = "none")]
+
+use crate::ptr;
+use crate::sync::atomic::{
+    AtomicBool,
+    Ordering::{Acquire, Relaxed},
+};
+use crate::sys::c;
+
+// If the target uses native TLS, run its destructors.
+#[cfg(target_thread_local)]
+use crate::sys::common::thread_local::run_dtors;
+// Otherwise, run the destructors for the key-based variant.
+#[cfg(not(target_thread_local))]
+use super::thread_local_key::run_dtors;
+
+/// An optimization hint. The compiler is often smart enough to know if an atomic
+/// is never set and can remove dead code based on that fact.
+static HAS_DTORS: AtomicBool = AtomicBool::new(false);
+
+/// Ensure that thread-locals are destroyed when the thread exits.
+pub fn activate() {
+    HAS_DTORS.store(true, Relaxed);
+}
+
+#[link_section = ".CRT$XLB"]
+#[allow(dead_code, unused_variables)]
+#[used] // we don't want LLVM eliminating this symbol for any reason, and
+// when the symbol makes it to the linker the linker will take over
+pub static p_thread_callback: unsafe extern "system" fn(c::LPVOID, c::DWORD, c::LPVOID) =
+    on_tls_callback;
+
+#[allow(dead_code, unused_variables)]
+unsafe extern "system" fn on_tls_callback(h: c::LPVOID, dwReason: c::DWORD, pv: c::LPVOID) {
+    if !HAS_DTORS.load(Acquire) {
+        return;
+    }
+    if dwReason == c::DLL_THREAD_DETACH || dwReason == c::DLL_PROCESS_DETACH {
+        run_dtors(ptr::null_mut());
+    }
+
+    // See comments above for what this is doing. Note that we don't need this
+    // trickery on GNU windows, just on MSVC.
+    reference_tls_used();
+    #[cfg(target_env = "msvc")]
+    unsafe fn reference_tls_used() {
+        extern "C" {
+            static _tls_used: u8;
+        }
+        crate::intrinsics::volatile_load(&_tls_used);
+    }
+    #[cfg(not(target_env = "msvc"))]
+    unsafe fn reference_tls_used() {}
+}
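
Note on the guard's control flow: the module above runs destructors only after `activate()` has been called, and only for the two detach reasons. The following is a minimal, portable sketch of that gating, not the standard library's code. The `DTOR_RUNS` counter and the body of `run_dtors` are hypothetical stand-ins so the behavior can be observed, and the callback is invoked by hand rather than by the Windows loader. The reason-code values are the ones defined by the Windows SDK.

```rust
use std::sync::atomic::{
    AtomicBool, AtomicUsize,
    Ordering::{Acquire, Relaxed},
};

// Reason codes passed to Windows TLS callbacks (values from the Windows SDK).
const DLL_PROCESS_DETACH: u32 = 0;
const DLL_THREAD_ATTACH: u32 = 2;
const DLL_THREAD_DETACH: u32 = 3;

/// Set once any thread-local registers a destructor.
static HAS_DTORS: AtomicBool = AtomicBool::new(false);
/// Hypothetical counter, used here only to observe destructor runs.
static DTOR_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Mirrors `activate()` from the diff: arm the guard.
fn activate() {
    HAS_DTORS.store(true, Relaxed);
}

/// Hypothetical stand-in for the real `run_dtors`.
fn run_dtors() {
    DTOR_RUNS.fetch_add(1, Relaxed);
}

/// Same gating as `on_tls_callback` in the diff, minus the FFI signature.
fn on_tls_callback(reason: u32) {
    // Nothing registered a destructor, so there is nothing to do.
    if !HAS_DTORS.load(Acquire) {
        return;
    }
    // Attach events are ignored; only thread and process exit matter.
    if reason == DLL_THREAD_DETACH || reason == DLL_PROCESS_DETACH {
        run_dtors();
    }
}

fn main() {
    // Before activation, even a detach event is a no-op.
    on_tls_callback(DLL_THREAD_DETACH);
    println!("runs before activate: {}", DTOR_RUNS.load(Relaxed));

    activate();
    on_tls_callback(DLL_THREAD_ATTACH); // ignored
    on_tls_callback(DLL_THREAD_DETACH); // runs destructors
    on_tls_callback(DLL_PROCESS_DETACH); // runs destructors
    println!("runs after activate: {}", DTOR_RUNS.load(Relaxed));
}
```

In this sketch the first line printed reports 0 runs and the second reports 2, since the attach event is filtered out. Both the native and the key-based implementations can share this single entry point because each only needs to call `activate()` when it first registers a destructor.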

library/std/src/sys/windows/thread_local_key.rs

+6 -152
@@ -1,78 +1,19 @@
+#![cfg(not(target_thread_local))]
+
 use crate::cell::UnsafeCell;
 use crate::ptr;
 use crate::sync::atomic::{
-    AtomicBool, AtomicPtr, AtomicU32,
+    AtomicPtr, AtomicU32,
     Ordering::{AcqRel, Acquire, Relaxed, Release},
 };
 use crate::sys::c;
 
 #[cfg(test)]
 mod tests;
 
-/// An optimization hint. The compiler is often smart enough to know if an atomic
-/// is never set and can remove dead code based on that fact.
-static HAS_DTORS: AtomicBool = AtomicBool::new(false);
-
-// Using a per-thread list avoids the problems in synchronizing global state.
-#[thread_local]
-#[cfg(target_thread_local)]
-static mut DESTRUCTORS: Vec<(*mut u8, unsafe extern "C" fn(*mut u8))> = Vec::new();
-
-// Ensure this can never be inlined because otherwise this may break in dylibs.
-// See #44391.
-#[inline(never)]
-#[cfg(target_thread_local)]
-pub unsafe fn register_keyless_dtor(t: *mut u8, dtor: unsafe extern "C" fn(*mut u8)) {
-    DESTRUCTORS.push((t, dtor));
-    HAS_DTORS.store(true, Relaxed);
-}
-
-#[inline(never)] // See comment above
-#[cfg(target_thread_local)]
-/// Runs destructors. This should not be called until thread exit.
-unsafe fn run_keyless_dtors() {
-    // Drop all the destructors.
-    //
-    // Note: While this is potentially an infinite loop, it *should* be
-    // the case that this loop always terminates because we provide the
-    // guarantee that a TLS key cannot be set after it is flagged for
-    // destruction.
-    while let Some((ptr, dtor)) = DESTRUCTORS.pop() {
-        (dtor)(ptr);
-    }
-    // We're done so free the memory.
-    DESTRUCTORS = Vec::new();
-}
-
 type Key = c::DWORD;
 type Dtor = unsafe extern "C" fn(*mut u8);
 
-// Turns out, like pretty much everything, Windows is pretty close the
-// functionality that Unix provides, but slightly different! In the case of
-// TLS, Windows does not provide an API to provide a destructor for a TLS
-// variable. This ends up being pretty crucial to this implementation, so we
-// need a way around this.
-//
-// The solution here ended up being a little obscure, but fear not, the
-// internet has informed me [1][2] that this solution is not unique (no way
-// I could have thought of it as well!). The key idea is to insert some hook
-// somewhere to run arbitrary code on thread termination. With this in place
-// we'll be able to run anything we like, including all TLS destructors!
-//
-// To accomplish this feat, we perform a number of threads, all contained
-// within this module:
-//
-// * All TLS destructors are tracked by *us*, not the Windows runtime. This
-//   means that we have a global list of destructors for each TLS key that
-//   we know about.
-// * When a thread exits, we run over the entire list and run dtors for all
-//   non-null keys. This attempts to match Unix semantics in this regard.
-//
-// For more details and nitty-gritty, see the code sections below!
-//
-// [1]: https://www.codeproject.com/Articles/8113/Thread-Local-Storage-The-C-Way
-// [2]: https://github.com/ChromiumWebApps/chromium/blob/master/base/threading/thread_local_storage_win.cc#L42
-
 pub struct StaticKey {
     /// The key value shifted up by one. Since TLS_OUT_OF_INDEXES == DWORD::MAX
     /// is not a valid key value, this allows us to use zero as sentinel value
@@ -204,41 +145,10 @@ unsafe fn register_dtor(key: &'static StaticKey) {
             Err(new) => head = new,
         }
     }
-    HAS_DTORS.store(true, Release);
+    super::thread_local_guard::activate();
 }
 
-// -------------------------------------------------------------------------
-// Where the Magic (TM) Happens
-//
-// If you're looking at this code, and wondering "what is this doing?",
-// you're not alone! I'll try to break this down step by step:
-//
-// # What's up with CRT$XLB?
-//
-// For anything about TLS destructors to work on Windows, we have to be able
-// to run *something* when a thread exits. To do so, we place a very special
-// static in a very special location. If this is encoded in just the right
-// way, the kernel's loader is apparently nice enough to run some function
-// of ours whenever a thread exits! How nice of the kernel!
-//
-// Lots of detailed information can be found in source [1] above, but the
-// gist of it is that this is leveraging a feature of Microsoft's PE format
-// (executable format) which is not actually used by any compilers today.
-// This apparently translates to any callbacks in the ".CRT$XLB" section
-// being run on certain events.
-//
-// So after all that, we use the compiler's #[link_section] feature to place
-// a callback pointer into the magic section so it ends up being called.
-//
-// # What's up with this callback?
-//
-// The callback specified receives a number of parameters from... someone!
-// (the kernel? the runtime? I'm not quite sure!) There are a few events that
-// this gets invoked for, but we're currently only interested on when a
-// thread or a process "detaches" (exits). The process part happens for the
-// last thread and the thread part happens for any normal thread.
-//
-// # Ok, what's up with running all these destructors?
+// What's up with running all these destructors?
 //
 // This will likely need to be improved over time, but this function
 // attempts a "poor man's" destructor callback system. Once we've got a list
@@ -247,63 +157,7 @@ unsafe fn register_dtor(key: &'static StaticKey) {
 // beforehand). We do this a few times in a loop to basically match Unix
 // semantics. If we don't reach a fixed point after a short while then we just
 // inevitably leak something most likely.
-//
-// # The article mentions weird stuff about "/INCLUDE"?
-//
-// It sure does! Specifically we're talking about this quote:
-//
-//      The Microsoft run-time library facilitates this process by defining a
-//      memory image of the TLS Directory and giving it the special name
-//      “__tls_used” (Intel x86 platforms) or “_tls_used” (other platforms). The
-//      linker looks for this memory image and uses the data there to create the
-//      TLS Directory. Other compilers that support TLS and work with the
-//      Microsoft linker must use this same technique.
-//
-// Basically what this means is that if we want support for our TLS
-// destructors/our hook being called then we need to make sure the linker does
-// not omit this symbol. Otherwise it will omit it and our callback won't be
-// wired up.
-//
-// We don't actually use the `/INCLUDE` linker flag here like the article
-// mentions because the Rust compiler doesn't propagate linker flags, but
-// instead we use a shim function which performs a volatile 1-byte load from
-// the address of the symbol to ensure it sticks around.
-
-#[link_section = ".CRT$XLB"]
-#[allow(dead_code, unused_variables)]
-#[used] // we don't want LLVM eliminating this symbol for any reason, and
-// when the symbol makes it to the linker the linker will take over
-pub static p_thread_callback: unsafe extern "system" fn(c::LPVOID, c::DWORD, c::LPVOID) =
-    on_tls_callback;
-
-#[allow(dead_code, unused_variables)]
-unsafe extern "system" fn on_tls_callback(h: c::LPVOID, dwReason: c::DWORD, pv: c::LPVOID) {
-    if !HAS_DTORS.load(Acquire) {
-        return;
-    }
-    if dwReason == c::DLL_THREAD_DETACH || dwReason == c::DLL_PROCESS_DETACH {
-        #[cfg(not(target_thread_local))]
-        run_dtors();
-        #[cfg(target_thread_local)]
-        run_keyless_dtors();
-    }
-
-    // See comments above for what this is doing. Note that we don't need this
-    // trickery on GNU windows, just on MSVC.
-    reference_tls_used();
-    #[cfg(target_env = "msvc")]
-    unsafe fn reference_tls_used() {
-        extern "C" {
-            static _tls_used: u8;
-        }
-        crate::intrinsics::volatile_load(&_tls_used);
-    }
-    #[cfg(not(target_env = "msvc"))]
-    unsafe fn reference_tls_used() {}
-}
-
-#[allow(dead_code)] // actually called below
-unsafe fn run_dtors() {
+pub(super) unsafe fn run_dtors(_ptr: *mut u8) {
     for _ in 0..5 {
         let mut any_run = false;

library/std/src/sys_common/mod.rs

+1
@@ -35,6 +35,7 @@ pub mod wtf8;
 
 cfg_if::cfg_if! {
     if #[cfg(target_os = "windows")] {
+        #[cfg(not(target_thread_local))]
         pub use crate::sys::thread_local_key;
     } else {
         pub mod thread_local_key;
