Document MaybeUninit bit validity #140463

joshlf · 2025-04-29T14:56:46Z

Partially addresses rust-lang/unsafe-code-guidelines#555 by clarifying that it is sound to write any byte values (initialized or uninitialized) to any MaybeUninit<T> regardless of T.

r? @RalfJung


joshlf · 2025-04-29T15:19:13Z

library/core/src/mem/maybe_uninit.rs

@@ -252,6 +252,33 @@ use crate::{fmt, intrinsics, ptr, slice};
 ///     std::process::exit(*code); // UB! Accessing uninitialized memory.
 /// }
 /// ```
+///
+/// # Validity


Moving this discussion here:

The MaybeUninit docs probably make sense for this. We now do have a definition of "byte" in the reference that this can link to.

Okay, awesome. And what wording would you recommend? Would it be accurate to say something like the following?

The value of a [MaybeUninit<u8>; N] may contain pointer provenance, and so p: P -> [MaybeUninit<u8>; N] -> P preserves the value of p, including provenance

@RalfJung would you like me to add language like this to this PR?

Update: I've added the following as a more concrete and fleshed out draft. I can edit or remove as preferred.

/// # Provenance
///
/// `MaybeUninit` values may contain [pointer provenance][provenance]. Concretely, for any
/// pointer type, `P`, which contains provenance, transmuting `p: P` to
/// `MaybeUninit<[u8; size_of::<P>]>` and then back to `P` will produce a value identical to
/// `p`, including provenance.
///
/// [provenance]: ../ptr/index.html#provenance

RalfJung · 2025-05-07T13:11:41Z

Cc @rust-lang/opsem

RalfJung · 2025-05-07T13:14:13Z

library/core/src/mem/maybe_uninit.rs

+/// If `T` contains initialized bytes at byte offsets where `U` contains padding bytes, these
+/// may not be preserved in `MaybeUninit<U>`, and so `transmute(u)` may produce a `T` with
+/// uninitialized bytes in these positions. This is an active area of discussion, and this code
+/// may become sound in the future.


I don't think it makes sense to say that a type "contains initialized bytes" at some offset. That's a property of a representation.

The typical term for representation bytes that are lost here is "padding". I don't think we have rigorously defined padding anywhere yet, but the term is sufficiently widely-used (and generally with a consistent meaning) that we may just be able to use it here?

IIUC, you're making two points:

We should speak about a type's representation containing bytes, not about the type itself containing bytes

In a representation, we should speak about padding bytes rather than uninitialized bytes

Is that right?

One thing that's probably worth distinguishing here is between values and layouts. In my mental model, an uninit byte is one of the possible values that a byte can have (e.g., it's the 257th value that can legally appear in a MaybeUninit<u8>). By contrast, padding is a property of a layout - namely, it's a sequence of bytes in a type's layout that happen to have the validity [MaybeUninit<u8>; PADDING_LEN].

Based on this, maybe it's best to say:

If byte offsets exist at which T's representation does not permit uninitialized bytes but U's representation does (e.g. due to padding), then the bytes in T at these offsets may not be preserved in u, and so transmute(u) may produce a T with uninitialized bytes at these offsets. This is an active area of discussion, and this code may become sound in the future.

Is that right?

No. I think both of the following concepts make sense:

The representation of a particular value at a particular type contains uninitialized bytes.

A type contains padding bytes. (These are bytes which are always ignored by the representation relation.)

But it makes less sense to talk about padding of a representation, or to talk about uninitialized bytes in a type.

So for this PR, the two key points (and they are separate points) are:

If U has padding, those bytes may be reset to "uninitialized" as part of the round-trip. If those same bytes are not padding in T, this can therefore mean some of the information of the original T value is lost.

If T does not permit uninitialized bytes on those positions, the round-trip is UB.

The second point is just a logical consequence of the first, it does not add any new information. Not sure if it is worth mentioning.
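To make "U has padding, and those same bytes are not padding in T" concrete, here is a minimal sketch. The types `U` and `T` and the helper `padding_offsets_of_u` are hypothetical and do not appear in the PR; the sketch only inspects layout and performs no round-trip.

```rust
use std::mem::{offset_of, size_of};

// Hypothetical types for illustration only; they are not taken from the PR.
#[repr(C)]
#[allow(dead_code)]
struct U {
    a: u8,  // occupies offset 0
    // offset 1 is padding, inserted so that `b` is 2-byte aligned
    b: u16, // occupies offsets 2..4
}

// Same size as `U`, but without padding: every byte carries data.
#[allow(dead_code)]
type T = [u8; 4];

/// Returns the byte offsets of `U` that are not covered by any field.
fn padding_offsets_of_u() -> Vec<usize> {
    let a = offset_of!(U, a)..offset_of!(U, a) + size_of::<u8>();
    let b = offset_of!(U, b)..offset_of!(U, b) + size_of::<u16>();
    (0..size_of::<U>())
        .filter(|i| !a.contains(i) && !b.contains(i))
        .collect()
}
```

Under these assumptions, offset 1 is padding in `U` but a data byte in `T`, so a `T` stored in a `MaybeUninit<U>` may come back with that byte reset to uninitialized.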

> The representation of a particular value at a particular type contains uninitialized bytes.
>
> A type contains padding bytes. (These are bytes which are always ignored by the representation relation.)

Does this imply that a type contains padding bytes, not a type's representation?

I'm thinking through the implications of what you said, and I think I understand something new that I didn't before, and I want to run it by you: In my existing mental model, a padding byte is a location in a type's layout such that every byte value at that location (including uninit) is valid (enums complicate this model, but I don't think that complication is relevant for this discussion - we can just stick to thinking about structs). The problem with this mental model is that, interpreted naively, it implies that different byte values in a padding byte could correspond to different logical values of the type. So e.g. in the type #[repr(C)] struct T(u8, u16), [0, 0, 0, 0] and [0, 1, 0, 0] would correspond to different values of the type since we're treating the padding byte itself as part of the representation relation. Of course, that is not something we want.

IIUC, by contrast your model is that the representation relation simply doesn't include padding bytes at all. So it'd be more accurate to describe the representation of T as consisting of three bytes - at offsets 0, 2, and 3. Every representation of T has a "hole" at offset 1 which is not part of the representation. This ensures that there's a 1:1 mapping between logical values and representations. Is that right?

> Does this imply that a type contains padding bytes, not a type's representation?

That's how I think about it. We can't tell which byte is a padding byte by looking at one representation -- it's a property of the type.

> In my existing mental model, a padding byte is a location in a type's layout such that every byte value at that location (including uninit) is valid

That would make the only byte of MaybeUninit<u8> a padding byte, so I don't think this is the right definition.
That's why I said above: a padding byte is a byte that is ignored by the representation relation. Slightly more formally: if r is some representation valid for type T, and r' is equal to r everywhere except for padding bytes, then r and r' represent the same value.

> So it'd be more accurate to describe the representation of T as consisting of three bytes

The representation has 4 bytes. But only 3 of them actually affect the represented value (which is a tuple of two [mathematical] integers).

We seem to be using the term "representation" slightly differently. For me, that's just a List<Byte> of appropriate length. You may be using that term to refer to what I call "representation relation"?
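The definition above (two representations that differ only in padding bytes represent the same value) can be sketched with the `#[repr(C)] struct T(u8, u16)` from the discussion. The helper `decode` is a hypothetical name, and comparing fields through the derived `PartialEq` is an assumption about what "same value" means for this type.

```rust
use std::mem::transmute;

#[repr(C)]
#[derive(Debug, PartialEq)]
struct T(u8, u16);

/// Interprets a 4-byte representation at type `T`.
fn decode(bytes: [u8; 4]) -> T {
    // SAFETY: `T` is `repr(C)` with size 4, and every initialized byte
    // pattern is valid for its `u8` and `u16` fields. The byte at offset 1
    // is padding and is ignored when the value is decoded.
    unsafe { transmute::<[u8; 4], T>(bytes) }
}
```

Here `decode([0, 0, 0, 0])` and `decode([0, 1, 0, 0])` compare equal, because the two inputs differ only at the padding offset, while changing the byte at offset 0 produces a different value. The representation has 4 bytes, but only 3 of them affect the result.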


RalfJung · 2025-05-23T13:49:33Z

@rustbot ready

ia0 · 2025-05-23T14:15:33Z

library/core/src/mem/maybe_uninit.rs

+
+/// If byte offsets exist at which `T`'s representation does not permit uninitialized bytes but
+/// `U`'s representation does (e.g. due to padding), then the bytes in `T` at these offsets may
+/// not be preserved in `u`, and so `transmute(u)` may produce a `T` with uninitialized bytes at
+/// these offsets. This is an active area of discussion, and this code may become sound in the future.


Doesn't this repeat the above? I guess it's a left-over because there's also an empty line instead of ///.

ia0 · 2025-05-23T14:15:37Z

library/core/src/mem/maybe_uninit.rs

+///
+/// `MaybeUninit` values may contain [pointer provenance][provenance]. Concretely, for any
+/// value, `p: P`, which contains provenance, transmuting `p` to `MaybeUninit<[u8; size_of::<P>]>`
+/// and then back to `P` will produce a value identical to `p`, including provenance.


Doesn't this either contradict the above or assume that P does not have padding bytes? (P could be a struct containing pointers for example)

I don't see a conflict? Going from P to an array of MaybeUninit<u8> and back is fine.

Going from an array of MaybeUninit<u8> to P and back may lose data, but that's not what the text says.

Oh right, my bad. Yes that makes sense.
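A minimal sketch of the direction the text describes, going from `P` to an array of `MaybeUninit<u8>` and back, assuming `P = *const u32`. The function name `round_trip_ptr` is hypothetical.

```rust
use std::mem::{size_of, transmute, MaybeUninit};

/// Sends a pointer through `[MaybeUninit<u8>; N]` and back.
fn round_trip_ptr(p: *const u32) -> *const u32 {
    // SAFETY: both types have the same size, and an array of
    // `MaybeUninit<u8>` accepts any bytes, including bytes that carry
    // pointer provenance.
    let bytes: [MaybeUninit<u8>; size_of::<*const u32>()] = unsafe { transmute(p) };
    // SAFETY: the bytes are an unmodified copy of a `*const u32`.
    unsafe { transmute(bytes) }
}
```

Calling `round_trip_ptr(&x)` for some `x: u32` should return a pointer that has the same address as `&x` and can still be dereferenced, which is the "including provenance" part of the claim. The opposite direction (bytes to `P` and back) is the one that may lose data.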

RalfJung · 2025-05-23T16:38:07Z

library/core/src/mem/maybe_uninit.rs

+///
+/// A `MaybeUninit<T>` has no validity requirement – any sequence of bytes of the appropriate length,
+/// initialized to any value or uninitialized, are a valid value of `MaybeUninit<T>`. Equivalently,
+/// it is always sound to perform `transmute::<[MaybeUninit<u8>; size_of::<T>()], MaybeUninit<T>>(...)`.


That second sentence is odd, I don't understand why this particular transmute illustrates the first sentence or why it is something we'd want to mention here?
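For readers trying to picture what the first sentence permits, here is a hedged sketch. The function names are hypothetical, and the concrete types (`bool`, `u32`) are chosen only for illustration.

```rust
use std::mem::{size_of, transmute, MaybeUninit};

/// Stores a byte that is not a valid `bool` inside a `MaybeUninit<bool>`.
fn from_arbitrary_bytes() -> MaybeUninit<bool> {
    // 0xFF is invalid for `bool` itself, but `MaybeUninit<bool>` has no
    // validity requirement, so holding it is fine as long as it is never
    // read as a `bool`.
    let bytes: [MaybeUninit<u8>; size_of::<bool>()] = [MaybeUninit::new(0xFF)];
    // SAFETY: same size, and any byte sequence is valid for `MaybeUninit<bool>`.
    unsafe { transmute(bytes) }
}

/// Builds a `MaybeUninit<u32>` from one initialized and three uninitialized bytes.
fn partially_uninit() -> MaybeUninit<u32> {
    let mut bytes = [MaybeUninit::<u8>::uninit(); size_of::<u32>()];
    bytes[0] = MaybeUninit::new(1);
    // SAFETY: same size, and uninitialized bytes are valid for `MaybeUninit<u32>`.
    unsafe { transmute(bytes) }
}

/// Fully initializes the slot before reading it, which is the only sound way to read.
fn init_then_read() -> u32 {
    let mut slot = partially_uninit();
    slot.write(7);
    // SAFETY: `write` initialized all four bytes.
    unsafe { slot.assume_init() }
}
```

Neither constructor reads the contents at the inner type; calling `assume_init` on either result without first writing a valid value would be undefined behavior.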

RalfJung · 2025-05-23T16:38:47Z

library/core/src/mem/maybe_uninit.rs

+/// initialized to any value or uninitialized, are a valid value of `MaybeUninit<T>`. Equivalently,
+/// it is always sound to perform `transmute::<[MaybeUninit<u8>; size_of::<T>()], MaybeUninit<T>>(...)`.
+///
+/// Note that "round-tripping" via `MaybeUninit` does not always result in the original value.


Suggested change

/// Note that "round-tripping" via `MaybeUninit` does not always result in the original value.

/// However, "round-tripping" via `MaybeUninit` does not always result in the original value.

RalfJung · 2025-05-23T16:40:03Z

library/core/src/mem/maybe_uninit.rs

+/// }
+/// ```
+///
+/// If `T` contains initialized bytes at byte offsets where `U` contains padding bytes, these


Suggested change

/// If `T` contains initialized bytes at byte offsets where `U` contains padding bytes, these

/// If the representation of `t` contains initialized bytes at byte offsets where `U` contains padding bytes, these

I don't know if we use "representation" in official docs already, but I also don't know how to document this here without using that term.

RalfJung · 2025-05-23T16:42:58Z

library/core/src/mem/maybe_uninit.rs

+/// may not be preserved in `MaybeUninit<U>`, and so `transmute(u)` may produce a `T` with
+/// uninitialized bytes in these positions. This is an active area of discussion, and this code


Suggested change

/// may not be preserved in `MaybeUninit<U>`, and so `transmute(u)` may produce a `T` with

/// uninitialized bytes in these positions. This is an active area of discussion, and this code

/// may not be preserved in `MaybeUninit<U>`. Interpreting the representation of `u` at type `T` again (i.e., `transmute(u)` above) may thus

/// be undefined behavior or yield a value different from `t` due to those bytes being lost. This is an active area of discussion, and this code
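The `identity` example that this suggestion refers to is not reproduced in the thread, so the following is only a sketch of the same shape of round-trip with assumed concrete types: `T = u32` and `U = [u8; 4]`. The function name `via_unpadded` is hypothetical.

```rust
use std::mem::{transmute, MaybeUninit};

/// Round-trips a `u32` through `MaybeUninit<U>` where `U = [u8; 4]`.
fn via_unpadded(t: u32) -> u32 {
    // SAFETY: same size, and `MaybeUninit<[u8; 4]>` accepts any bytes.
    let u: MaybeUninit<[u8; 4]> = unsafe { transmute(t) };
    // SAFETY: `[u8; 4]` has no padding, so no byte of `t` can be lost
    // while it is stored in `u`; the bytes still form a valid `u32`.
    unsafe { transmute(u) }
}

// Not executed here: with a `U` such as `#[repr(C)] struct Padded(u8, u16)`,
// the byte at offset 1 is padding and may be reset to uninitialized, so
// transmuting back to `u32` may be undefined behavior.
```

The sound case is shown on purpose; the padded case is described in a comment rather than run, because executing it could exhibit the undefined behavior under discussion.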

RalfJung · 2025-05-23T16:47:30Z

library/core/src/mem/maybe_uninit.rs

+///
+/// Note that, so long as no such byte offsets exist, then the preceding `identity` example *is* sound.
+///
+/// # Provenance


Should we really discuss provenance separately?

The way I'd structure this is I'd first explain that MaybeUninit has no validity invariant, ergo it can hold arbitrary data, ergo T → [MaybeUninit<u8>; size_of::<T>()] → T is a valid round-trip, including if T has padding or provenance. And then we can talk about the other round-trip which is not legal. "Uninit" and "provenance" can potentially be discussed as special cases if that's useful, but it should be clear that there's a single underlying principle here.
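The first round-trip in that outline, T → [MaybeUninit<u8>; size_of::<T>()] → T, can be sketched for a type with padding. The struct `Pair` and the function `round_trip` are hypothetical names, not part of the PR.

```rust
use std::mem::{size_of, transmute, MaybeUninit};

// Hypothetical type with one padding byte at offset 1.
#[repr(C)]
#[derive(Debug, PartialEq, Clone, Copy)]
struct Pair {
    tag: u8,
    value: u16,
}

/// Sends a `Pair` through an array of `MaybeUninit<u8>` and back.
fn round_trip(t: Pair) -> Pair {
    // SAFETY: same size; the array has no validity requirement, so it can
    // hold the padding byte of `t` even if that byte is uninitialized.
    let bytes: [MaybeUninit<u8>; size_of::<Pair>()] = unsafe { transmute(t) };
    // SAFETY: the bytes are an unmodified copy of a valid `Pair`; the
    // padding byte is ignored when they are read back at type `Pair`.
    unsafe { transmute(bytes) }
}
```

The byte array is the intermediate type here, so nothing is lost even though `Pair` has padding. The lossy direction is the other one, where a padded type sits in the middle of the round-trip.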


RalfJung · 2025-05-30T12:20:32Z

@rustbot author

rustbot · 2025-05-30T12:20:36Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

RalfJung · 2025-06-01T11:58:10Z

I raised the question on Zulip whether it is wise to make a guarantee here that isn't, strictly speaking, documented in the LLVM LangRef. Nikita says he thinks that's fine -- we may have to adjust how exactly we compile MaybeUninit in the future, but LLVM currently intends to support this case in a somewhat roundabout and incomplete way that seems to work well enough in practice, and LLVM can't more aggressively exploit the fuzziness along the edges of that approximation until a proper alternative exists.

Document MaybeUninit bit validity

0792da3

rustbot assigned RalfJung Apr 29, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 29, 2025

joshlf commented Apr 29, 2025


joshlf mentioned this pull request Apr 29, 2025

Do typed copies of unions preserve "invalid" bytes? rust-lang/unsafe-code-guidelines#555

Closed

Clarify that round-tripping is sound so long as bytes are initialized

21626ac

RalfJung mentioned this pull request Apr 29, 2025

What about: Pointer-to-integer transmutes? rust-lang/unsafe-code-guidelines#286

Open

joshlf commented Apr 29, 2025


joshlf added 2 commits April 29, 2025 08:39

Add unsafe block to example

1494ec7

Document provenance

05e6a34

RalfJung reviewed May 7, 2025


joshlf added 2 commits May 7, 2025 10:31

Clarify provenance

8d3a47e

Clarify validity regarding initialization

75380ea

ia0 reviewed May 23, 2025


RalfJung reviewed May 23, 2025


rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 30, 2025
