diff --git a/.travis.yml b/.travis.yml index d7f8f8f42..55ea20587 100644 --- a/.travis.yml +++ b/.travis.yml @@ -3,7 +3,7 @@ cache: - cargo before_install: - shopt -s globstar -- MAX_LINE_LENGTH=80 bash ci/check_line_lengths.sh src/**/*.md +- MAX_LINE_LENGTH=100 bash ci/check_line_lengths.sh src/**/*.md install: - source ~/.cargo/env || true - bash ci/install.sh diff --git a/ci/check_line_lengths.sh b/ci/check_line_lengths.sh index 91f199b7e..5b7b12d3e 100755 --- a/ci/check_line_lengths.sh +++ b/ci/check_line_lengths.sh @@ -2,7 +2,7 @@ if [ "$1" == "--help" ]; then echo 'Usage:' - echo ' MAX_LINE_LENGTH=80' "$0" 'src/**/*.md' + echo ' MAX_LINE_LENGTH=100' "$0" 'src/**/*.md' exit 1 fi diff --git a/src/SUMMARY.md b/src/SUMMARY.md index ad8a82623..338cb7fe1 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -53,9 +53,12 @@ - [MIR construction](./mir/construction.md) - [MIR visitor and traversal](./mir/visitor.md) - [MIR passes: getting the MIR for a function](./mir/passes.md) - - [MIR borrowck](./mir/borrowck.md) - - [MIR-based region checking (NLL)](./mir/regionck.md) - [MIR optimizations](./mir/optimizations.md) +- [The borrow checker](./borrow_check.md) + - [Tracking moves and initialization](./borrow_check/moves_and_initialization.md) + - [Move paths](./borrow_check/moves_and_initialization/move_paths.md) + - [MIR type checker](./borrow_check/type_check.md) + - [Region inference](./borrow_check/region_inference.md) - [Constant evaluation](./const-eval.md) - [miri const evaluator](./miri.md) - [Parameter Environments](./param_env.md) diff --git a/src/appendix/glossary.md b/src/appendix/glossary.md index bfc2c0d22..42315536a 100644 --- a/src/appendix/glossary.md +++ b/src/appendix/glossary.md @@ -40,7 +40,7 @@ MIR | the Mid-level IR that is created after type-checking miri | an interpreter for MIR used for constant evaluation ([see more](./miri.html)) normalize | a general term for converting to a more canonical form, but in the case of rustc typically refers to 
[associated type normalization](./traits/associated-types.html#normalize) newtype | a "newtype" is a wrapper around some other type (e.g., `struct Foo(T)` is a "newtype" for `T`). This is commonly used in Rust to give a stronger type for indices. -NLL | [non-lexical lifetimes](./mir/regionck.html), an extension to Rust's borrowing system to make it be based on the control-flow graph. +NLL | [non-lexical lifetimes](./borrow_check/region_inference.html), an extension to Rust's borrowing system to make it be based on the control-flow graph. node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`. obligation | something that must be proven by the trait system ([see more](traits/resolution.html)) projection | a general term for a "relative path", e.g. `x.f` is a "field projection", and `T::Item` is an ["associated type projection"](./traits/goals-and-clauses.html#trait-ref) @@ -53,7 +53,7 @@ rib | a data structure in the name resolver that keeps trac sess | the compiler session, which stores global data used throughout compilation side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node. sigil | like a keyword but composed entirely of non-alphanumeric tokens. For example, `&` is a sigil for references. -skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./mir/regionck.html#skol) for more details. +skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./borrow_check/region_inference.html#skol) for more details. 
 soundness | soundness is a technical term in type theory. Roughly, if a type system is sound, then if a program type-checks, it is type-safe; i.e. I can never (in safe rust) force a value into a variable of the wrong type. (see "completeness").
 span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
 substs | the substitutions for a given generic type or item (e.g. the `i32`, `u32` in `HashMap<i32, u32>`)
diff --git a/src/borrow_check.md b/src/borrow_check.md
new file mode 100644
index 000000000..40858b1b4
--- /dev/null
+++ b/src/borrow_check.md
@@ -0,0 +1,63 @@
+# MIR borrow check
+
+The borrow check is Rust's "secret sauce" – it is tasked with
+enforcing a number of properties:
+
+- That all variables are initialized before they are used.
+- That you can't move the same value twice.
+- That you can't move a value while it is borrowed.
+- That you can't access a place while it is mutably borrowed (except through
+  the reference).
+- That you can't mutate a place while it is shared borrowed.
+- etc
+
+At the time of this writing, the code is in a state of transition. The
+"main" borrow checker still works by processing [the HIR](hir.html),
+but that is being phased out in favor of the MIR-based borrow checker.
+Accordingly, this documentation focuses on the new, MIR-based borrow
+checker.
+
+Doing borrow checking on MIR has several advantages:
+
+- The MIR is *far* less complex than the HIR; the radical desugaring
+  helps prevent bugs in the borrow checker. (If you're curious, you
+  can see
+  [a list of bugs that the MIR-based borrow checker fixes here][47366].)
+- Even more importantly, using the MIR enables ["non-lexical lifetimes"][nll],
+  which are regions derived from the control-flow graph.
+
+[47366]: https://github.com/rust-lang/rust/issues/47366
+[nll]: http://rust-lang.github.io/rfcs/2094-nll.html
+
+### Major phases of the borrow checker
+
+The borrow checker source is found in
+[the `rustc_mir::borrow_check` module][b_c]. The main entry point is
+the [`mir_borrowck`] query.
+
+[b_c]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html
+[`mir_borrowck`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/fn.mir_borrowck.html
+
+- We first create a **local copy** of the MIR. In the coming steps,
+  we will modify this copy in place to modify the types and things to
+  include references to the new regions that we are computing.
+- We then invoke [`replace_regions_in_mir`] to modify our local MIR.
+  Among other things, this function will replace all of the [regions](./appendix/glossary.html) in
+  the MIR with fresh [inference variables](./appendix/glossary.html).
+- Next, we perform a number of
+  [dataflow analyses](./appendix/background.html#dataflow) that
+  compute what data is moved and when.
+- We then do a [second type check](borrow_check/type_check.html) across the MIR:
+  the purpose of this type check is to determine all of the constraints between
+  different regions.
+- Next, we do [region inference](borrow_check/region_inference.html), which computes
+  the values of each region -- basically, points in the control-flow graph.
+- At this point, we can compute the "borrows in scope" at each point.
+- Finally, we do a second walk over the MIR, looking at the actions it
+  does and reporting errors. For example, if we see a statement like
+  `*a + 1`, then we would check that the variable `a` is initialized
+  and that it is not mutably borrowed, as either of those would
+  require an error to be reported.
+  - Doing this check requires the results of all the previous analyses.
+
+[`replace_regions_in_mir`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/fn.replace_regions_in_mir.html
diff --git a/src/borrow_check/moves_and_initialization.md b/src/borrow_check/moves_and_initialization.md
new file mode 100644
index 000000000..d1cd41e0f
--- /dev/null
+++ b/src/borrow_check/moves_and_initialization.md
@@ -0,0 +1,50 @@
+# Tracking moves and initialization
+
+Part of the borrow checker's job is to track which variables are
+"initialized" at any given point in time -- this also requires
+figuring out where moves occur and tracking those.
+
+## Initialization and moves
+
+From a user's perspective, initialization -- giving a variable some
+value -- and moves -- transferring ownership to another place -- might
+seem like distinct topics. Indeed, our borrow checker error messages
+often talk about them differently. But **within the borrow checker**,
+they are not nearly as separate. Roughly speaking, the borrow checker
+tracks the set of "initialized places" at any point in the source
+code. Assigning to a previously uninitialized local variable adds it
+to that set; moving from a local variable removes it from that set.
+
+Consider this example:
+
+```rust,ignore
+fn foo() {
+    let a: Vec<u32>;
+
+    // a is not initialized yet
+
+    a = vec![22];
+
+    // a is initialized here
+
+    std::mem::drop(a); // a is moved here
+
+    // a is no longer initialized here
+
+    let l = a.len(); //~ ERROR
+}
+```
+
+Here you can see that `a` starts off as uninitialized; once it is
+assigned, it becomes initialized. But when `drop(a)` is called, that
+moves `a` into the call, and hence it becomes uninitialized again.
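The "set of initialized places" idea described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: it handles straight-line code, whereas the real analysis is a dataflow computation over the MIR control-flow graph, and the names `Event` and `check` are hypothetical and do not exist in rustc.

```rust
use std::collections::HashSet;

/// Hypothetical events for a toy model; these are not rustc types.
#[derive(Debug, Clone, Copy)]
enum Event {
    /// `x = ...`: adds `x` to the initialized set.
    Assign(&'static str),
    /// A move out of `x`: removes `x` from the initialized set.
    Move(&'static str),
    /// Any other read of `x`: requires `x` to be in the set.
    Use(&'static str),
}

/// Walks straight-line events and returns each variable that was
/// moved or used while it was not in the initialized set.
fn check(events: &[Event]) -> Vec<&'static str> {
    let mut initialized: HashSet<&'static str> = HashSet::new();
    let mut errors = Vec::new();
    for &event in events {
        match event {
            Event::Assign(var) => {
                initialized.insert(var);
            }
            Event::Move(var) => {
                // `remove` returns false if `var` was not initialized.
                if !initialized.remove(var) {
                    errors.push(var);
                }
            }
            Event::Use(var) => {
                if !initialized.contains(var) {
                    errors.push(var);
                }
            }
        }
    }
    errors
}

fn main() {
    // Mirrors `foo` above: assign `a`, move it into `drop`, then call `a.len()`.
    let events = [Event::Assign("a"), Event::Move("a"), Event::Use("a")];
    println!("{:?}", check(&events)); // prints ["a"]
}
```

Treating a move as "remove from the set" is what lets one analysis report both use-before-initialization and use-after-move.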
+
+## Subsections
+
+To make it easier to peruse, this section is broken into a number of
+subsections:
+
+- [Move paths](./moves_and_initialization/move_paths.html): the
+  *move path* concept that we use to track which local variables (or parts of
+  local variables, in some cases) are initialized.
+- TODO *Rest not yet written* =)
diff --git a/src/borrow_check/moves_and_initialization/move_paths.md b/src/borrow_check/moves_and_initialization/move_paths.md
new file mode 100644
index 000000000..c9e22a81c
--- /dev/null
+++ b/src/borrow_check/moves_and_initialization/move_paths.md
@@ -0,0 +1,128 @@
+# Move paths
+
+In reality, it's not enough to track initialization at the granularity
+of local variables. Rust also allows us to do moves and initialization
+at the field granularity:
+
+```rust,ignore
+fn foo() {
+    let a: (Vec<u32>, Vec<u32>) = (vec![22], vec![44]);
+
+    // a.0 and a.1 are both initialized
+
+    let b = a.0; // moves a.0
+
+    // a.0 is not initialized, but a.1 still is
+
+    let c = a.0; // ERROR
+    let d = a.1; // OK
+}
+```
+
+To handle this, we track initialization at the granularity of a **move
+path**. A [`MovePath`] represents some location that the user can
+initialize, move, etc. So e.g. there is a move-path representing the
+local variable `a`, and there is a move-path representing `a.0`. Move
+paths roughly correspond to the concept of a [`Place`] from MIR, but
+they are indexed in ways that enable us to do move analysis more
+efficiently.
+
+[`MovePath`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html
+[`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/enum.Place.html
+
+## Move path indices
+
+Although there is a [`MovePath`] data structure, move paths are never
+referenced directly. Instead, all the code passes around *indices* of
+type
+[`MovePathIndex`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/indexes/struct.MovePathIndex.html).
+If you need to get information about a move path, you use this index with
+the [`move_paths` field of the `MoveData`][move_paths]. For example,
+to convert a [`MovePathIndex`] `mpi` into a MIR [`Place`], you might
+access the [`MovePath::place`] field like so:
+
+```rust,ignore
+move_data.move_paths[mpi].place
+```
+
+[move_paths]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.move_paths
+[`MovePath::place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html#structfield.place
+
+## Building move paths
+
+One of the first things we do in the MIR borrow check is to construct
+the set of move paths. This is done as part of the
+[`MoveData::gather_moves`] function. This function uses a MIR visitor
+called [`Gatherer`] to walk the MIR and look at how each [`Place`]
+within is accessed. For each such [`Place`], it constructs a
+corresponding [`MovePathIndex`]. It also records when/where that
+particular move path is moved/initialized, but we'll get to that in a
+later section.
+
+[`Gatherer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html
+[`MoveData::gather_moves`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#method.gather_moves
+
+### Illegal move paths
+
+We don't actually create a move-path for **every** [`Place`] that gets
+used. In particular, if it is illegal to move from a [`Place`], then
+there is no need for a [`MovePathIndex`]. Some examples:
+
+- You cannot move from a static variable, so we do not create a [`MovePathIndex`]
+  for static variables.
+- You cannot move an individual element of an array, so if we have e.g. `foo: [String; 3]`,
+  there would be no move-path for `foo[1]`.
+- You cannot move from inside of a borrowed reference, so if we have e.g. `foo: &String`,
+  there would be no move-path for `*foo`.
+
+These rules are enforced by the [`move_path_for`] function, which
+converts a [`Place`] into a [`MovePathIndex`] -- in error cases like
+those just discussed, the function returns an `Err`. This in turn
+means we don't have to bother tracking whether those places are
+initialized (which lowers overhead).
+
+[`move_path_for`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html#method.move_path_for
+
+## Looking up a move-path
+
+If you have a [`Place`] and you would like to convert it to a [`MovePathIndex`], you
+can do that using the [`MovePathLookup`] structure found in the [`rev_lookup`] field
+of [`MoveData`]. There are two different methods:
+
+[`MovePathLookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html
+[`rev_lookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.rev_lookup
+
+- [`find_local`], which takes a [`mir::Local`] representing a local
+  variable. This is the easier method, because we **always** create a
+  [`MovePathIndex`] for every local variable.
+- [`find`], which takes an arbitrary [`Place`]. This method is a bit
+  more annoying to use, precisely because we don't have a
+  [`MovePathIndex`] for **every** [`Place`] (as we just discussed in
+  the "illegal move paths" section). Therefore, [`find`] returns a
+  [`LookupResult`] indicating the closest path it was able to find
+  that exists (e.g., for `foo[1]`, it might return just the path for
+  `foo`).
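To make the behaviour of `move_path_for` and `find` described above more concrete, here is a minimal, self-contained sketch. It is not the rustc implementation: every type below (`Place`, `MoveData`, `LookupResult`, and a `MovePathIndex` that is just a `usize`) is a simplified stand-in that only borrows the names used in the text, and the error strings are invented for illustration.

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration only; not the rustc definitions.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Place {
    Local(&'static str),
    Static(&'static str),
    Field(Box<Place>, usize),
    Index(Box<Place>, usize),
    Deref(Box<Place>),
}

type MovePathIndex = usize;

#[derive(Debug, PartialEq)]
enum LookupResult {
    /// The place has its own move path.
    Exact(MovePathIndex),
    /// Only the closest enclosing place (if any) has a move path.
    Parent(Option<MovePathIndex>),
}

#[derive(Default)]
struct MoveData {
    move_paths: Vec<Place>,
    rev_lookup: HashMap<Place, MovePathIndex>,
}

impl MoveData {
    /// Returns the move path for `place`, creating it (and its parents)
    /// on demand. Places that can never be moved from yield `Err`.
    fn move_path_for(&mut self, place: &Place) -> Result<MovePathIndex, &'static str> {
        match place {
            Place::Static(_) => return Err("cannot move out of a static"),
            Place::Index(..) => return Err("cannot move out of an array element"),
            Place::Deref(_) => return Err("cannot move out of a borrowed reference"),
            Place::Field(base, _) => {
                // A field path only exists if its base has a path.
                self.move_path_for(base)?;
            }
            Place::Local(_) => {}
        }
        if let Some(&mpi) = self.rev_lookup.get(place) {
            return Ok(mpi);
        }
        let mpi = self.move_paths.len();
        self.move_paths.push(place.clone());
        self.rev_lookup.insert(place.clone(), mpi);
        Ok(mpi)
    }

    /// Looks up `place`, falling back to the closest enclosing place
    /// that has a move path.
    fn find(&self, place: &Place) -> LookupResult {
        if let Some(&mpi) = self.rev_lookup.get(place) {
            return LookupResult::Exact(mpi);
        }
        let mut current = place;
        loop {
            current = match current {
                Place::Field(base, _) | Place::Index(base, _) | Place::Deref(base) => &**base,
                Place::Local(_) | Place::Static(_) => return LookupResult::Parent(None),
            };
            if let Some(&mpi) = self.rev_lookup.get(current) {
                return LookupResult::Parent(Some(mpi));
            }
        }
    }
}

fn main() {
    let mut data = MoveData::default();
    let foo = Place::Local("foo");
    let foo_mpi = data.move_path_for(&foo).unwrap();

    // `foo[1]` gets no move path of its own; lookup falls back to `foo`.
    let elem = Place::Index(Box::new(foo.clone()), 1);
    assert!(data.move_path_for(&elem).is_err());
    assert!(data.move_path_for(&Place::Static("S")).is_err());
    assert_eq!(data.find(&foo), LookupResult::Exact(foo_mpi));
    assert_eq!(data.find(&elem), LookupResult::Parent(Some(foo_mpi)));

    // `a.0` is movable, so it gets a path (as does its base `a`).
    let a0 = Place::Field(Box::new(Place::Local("a")), 0);
    let a0_mpi = data.move_path_for(&a0).unwrap();
    assert_eq!(data.find(&a0), LookupResult::Exact(a0_mpi));

    // `*p` where `p` was never registered: no enclosing path at all.
    let deref = Place::Deref(Box::new(Place::Local("p")));
    assert_eq!(data.find(&deref), LookupResult::Parent(None));
}
```

Returning the closest parent instead of failing lets a caller treat an access to `foo[1]` as an access to `foo` when checking initialization.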
+ +[`find`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find +[`find_local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find_local +[`mir::Local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/struct.Local.html +[`LookupResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/enum.LookupResult.html + +## Cross-references + +As we noted above, move-paths are stored in a big vector and +referenced via their [`MovePathIndex`]. However, within this vector, +they are also structured into a tree. So for example if you have the +[`MovePathIndex`] for `a.b.c`, you can go to its parent move-path +`a.b`. You can also iterate over all children paths: so, from `a.b`, +you might iterate to find the path `a.b.c` (here you are iterating +just over the paths that are **actually referenced** in the source, +not all **possible** paths that could have been referenced). These +references are used for example in the [`has_any_child_of`] function, +which checks whether the dataflow results contain a value for the +given move-path (e.g., `a.b`) or any child of that move-path (e.g., +`a.b.c`). + +[`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/enum.Place.html +[`has_any_child_of`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/at_location/struct.FlowAtLocation.html#method.has_any_child_of + diff --git a/src/mir/regionck.md b/src/borrow_check/region_inference.md similarity index 99% rename from src/mir/regionck.md rename to src/borrow_check/region_inference.md index 9034af8a8..47b21b0d2 100644 --- a/src/mir/regionck.md +++ b/src/borrow_check/region_inference.md @@ -1,11 +1,11 @@ -# MIR-based region checking (NLL) +# Region inference (NLL) The MIR-based region checking code is located in [the `rustc_mir::borrow_check::nll` module][nll]. 
(NLL, of course, stands for "non-lexical lifetimes", a term that will hopefully be deprecated once they become the standard kind of lifetime.) -[nll]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check/nll +[nll]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/index.html The MIR-based region analysis consists of two major functions: diff --git a/src/borrow_check/type_check.md b/src/borrow_check/type_check.md new file mode 100644 index 000000000..ee955d971 --- /dev/null +++ b/src/borrow_check/type_check.md @@ -0,0 +1,10 @@ +# The MIR type-check + +A key component of the borrow check is the +[MIR type-check](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/type_check/index.html). +This check walks the MIR and does a complete "type check" -- the same +kind you might find in any other language. In the process of doing +this type-check, we also uncover the region constraints that apply to +the program. + +TODO -- elaborate further? Maybe? :) diff --git a/src/mir/borrowck.md b/src/mir/borrowck.md deleted file mode 100644 index d5fea6184..000000000 --- a/src/mir/borrowck.md +++ /dev/null @@ -1,59 +0,0 @@ -# MIR borrow check - -The borrow check is Rust's "secret sauce" – it is tasked with -enforcing a number of properties: - -- That all variables are initialized before they are used. -- That you can't move the same value twice. -- That you can't move a value while it is borrowed. -- That you can't access a place while it is mutably borrowed (except through - the reference). -- That you can't mutate a place while it is shared borrowed. -- etc - -At the time of this writing, the code is in a state of transition. The -"main" borrow checker still works by processing [the HIR](hir.html), -but that is being phased out in favor of the MIR-based borrow checker. 
-Doing borrow checking on MIR has two key advantages: - -- The MIR is *far* less complex than the HIR; the radical desugaring - helps prevent bugs in the borrow checker. (If you're curious, you - can see - [a list of bugs that the MIR-based borrow checker fixes here][47366].) -- Even more importantly, using the MIR enables ["non-lexical lifetimes"][nll], - which are regions derived from the control-flow graph. - -[47366]: https://github.com/rust-lang/rust/issues/47366 -[nll]: http://rust-lang.github.io/rfcs/2094-nll.html - -### Major phases of the borrow checker - -The borrow checker source is found in -[the `rustc_mir::borrow_check` module][b_c]. The main entry point is -the `mir_borrowck` query. At the time of this writing, MIR borrowck can operate -in several modes, but this text will describe only the mode when NLL is enabled -(what you get with `#![feature(nll)]`). - -[b_c]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check - -The overall flow of the borrow checker is as follows: - -- We first create a **local copy** C of the MIR. In the coming steps, - we will modify this copy in place to modify the types and things to - include references to the new regions that we are computing. -- We then invoke `nll::replace_regions_in_mir` to modify this copy C. - Among other things, this function will replace all of the regions in - the MIR with fresh [inference variables](./appendix/glossary.html). - - (More details can be found in [the regionck section](./mir/regionck.html).) -- Next, we perform a number of [dataflow - analyses](./appendix/background.html#dataflow) - that compute what data is moved and when. The results of these analyses - are needed to do both borrow checking and region inference. -- Using the move data, we can then compute the values of all the regions in the - MIR. - - (More details can be found in [the NLL section](./mir/regionck.html).) 
-- Finally, the borrow checker itself runs, taking as input (a) the - results of move analysis and (b) the regions computed by the region - checker. This allows us to figure out which loans are still in scope - at any particular point. -