diff options
Diffstat (limited to 'tvix/docs/src/eval')
-rw-r--r-- | tvix/docs/src/eval/abandoned/index.md | 3 | ||||
-rw-r--r-- | tvix/docs/src/eval/abandoned/thread-local-vm.md | 233 | ||||
-rw-r--r-- | tvix/docs/src/eval/bindings.md | 133 | ||||
-rw-r--r-- | tvix/docs/src/eval/build-references.md | 254 | ||||
-rw-r--r-- | tvix/docs/src/eval/builtins.md | 138 | ||||
-rw-r--r-- | tvix/docs/src/eval/catchable-errors.md | 131 | ||||
-rw-r--r-- | tvix/docs/src/eval/known-optimisation-potential.md | 162 | ||||
-rw-r--r-- | tvix/docs/src/eval/language-issues.md | 46 | ||||
-rw-r--r-- | tvix/docs/src/eval/opcodes-attrsets.md | 122 | ||||
-rw-r--r-- | tvix/docs/src/eval/recursive-attrs.md | 68 | ||||
-rw-r--r-- | tvix/docs/src/eval/vm-loop.md | 315 |
11 files changed, 1605 insertions, 0 deletions
diff --git a/tvix/docs/src/eval/abandoned/index.md b/tvix/docs/src/eval/abandoned/index.md new file mode 100644 index 000000000000..1cef704d08d7 --- /dev/null +++ b/tvix/docs/src/eval/abandoned/index.md @@ -0,0 +1,3 @@ +# Abandoned ideas + +This chapter keeps track of abandoned ideas, and why they were abandoned. diff --git a/tvix/docs/src/eval/abandoned/thread-local-vm.md b/tvix/docs/src/eval/abandoned/thread-local-vm.md new file mode 100644 index 000000000000..c6a2d5e07e5c --- /dev/null +++ b/tvix/docs/src/eval/abandoned/thread-local-vm.md @@ -0,0 +1,233 @@ +# We can't have nice things because IFD + +The thread-local VM work below was ultimately not merged because it +was decided that it would be harmful for `tvix::eval::Value` to +implement `Eq`, `Hash`, or any of the other `std` traits. + +Implementing `std` traits on `Value` was deemed harmful because IFD +can cause arbitrary amounts of compilation to occur, including +network transactions with builders. Obviously it would be +unexpected and error-prone to have a `PartialEq::eq()` which does +something like this. This problem does not manifest within the +"nixpkgs compatibility only" scope, or in any undeprecated language +feature other than IFD. Although IFD is outside the "nixpkgs +compatibility scope", it [has been added to the TVL compatibility +scope](https://cl.tvl.fyi/c/depot/+/7193/comment/3418997b_0dbd0b65/). + +This was the sole reason for not merging. + +The explanation below may be useful in case future circumstances +affect the relevance of the reasoning above. + +The implementation can be found in these CLs: + +- [refactor(tvix/eval): remove lifetime parameter from VM<'o>](https://cl.tvl.fyi/c/depot/+/7194) +- [feat(tvix/eval): [FOUNDLING] thread-local VM](https://cl.tvl.fyi/c/depot/+/7195) +- [feat(tvix/eval): [FOUNDLING] VM::vm_xxx convenience methods](https://cl.tvl.fyi/c/depot/+/7196) +- [refactor(tvix/eval): [FOUNDLING]: drop explicit `&mut vm` parameter](https://cl.tvl.fyi/c/depot/+/7197) + +# Thread-local storage for tvix::eval::vm::VM + +## The problem + +`Value::force()` takes a `&mut VM` argument, since forcing a value +requires executing opcodes. This means that `Value::nix_eq()` too +must take a `&mut VM`, since any sensible definition of equality +will have to force thunks. + +Unfortunately Rust's `PartialEq::eq()` function does not accept any +additional arguments like this, so `Value` cannot implement +`PartialEq`. Worse, structs which *contain* `Value`s can't +implement `PartialEq` either. This means `Value`, and anything +containing it, cannot be the key for a `BTreeMap` or `HashMap`. We +can't even insert `Value`s into a `HashSet`! + +There are other situations like this that don't involve `PartialEq`, +but it's the most glaring one. The main problem is that you need a +`VM` in order to force thunks, and thunks can be anywhere in a +`Value`. + +## Solving the problem with thread-locals + +We could avoid threading the `&mut VM` through the entire codebase +by making it a thread-local. + +To do this without a performance hit, we need to use LLVM +thread-locals, which are the same cost as references to `static`s +but load relative to +[`llvm.threadlocal.address`][threadlocal-intrinsic] instead of +relative to the data segment. Unfortunately `#[thread_local]` [is +unstable][thread-local-unstable] and [unsafe in +general][thread-local-unsafe] for most of the cases where we would +want to use it. There is one [exception][tls-const-init], however: +if a `!thread_local()` has a `const` initializer, the compiler will +insert a `#[thread_local]`; this special case is both safe and +stable. + +The difficult decision is what the type of the thread-local should +be. Since you can't get a mutable reference to a `thread_local!()` +it will have to be some interior-mutability-bestowing wrapper around +our current `struct VM`. Here are the choices: + +### `RefCell<VM>` + +This is the obvious first choice, since it lets you borrow a +`RefMut<Target=VM>`. The problem here is that we want to keep the +codebase written such that all the functions in `impl VM` still take +a `&mut self`. This means that there will be an active mutable +borrow for the duration of `VM::call_builtin()`. So if we implement +`PartialEq` by having `eq()` attempt a second mutable borrow from +the thread-local storage, it will fail since there is already an +active borrow. + +The problem here is that you can't "unborrow" a `RefMut` except by +dropping it. There's no way around this. + +#### Problem: Uglification + +The only solution here is to rewrite all the functions in `impl VM` +so they don't take any kind of `self` argument, and then have them +do a short-lived `.borrow_mut()` from the thread-local `RefCell` +*separately, each time* they want to modify one of the fields of +`VM` (currently `frames`, `stack`, `with_stack`, `warnings`). This +means that if you had a code sequence like this: + +``` +impl VM { + fn foo(&mut self, ...) { + ... + self.frame().ip += 1; + self.some_other_method(); + self.frame().ip += 1; +``` + +You would need to add *two separate `borrow_mut()`s*, one for each +of the `self.frame().ip+=1` statements. You can't just do one big +`borrow_mut()` because `some_other_method()` will call +`borrow_mut()` and panic. + +#### Problem: Performance + +The `RefCell<VM>` approach also has a fairly huge performance hit, +because every single modification to any part of `VM` will require a +reference count increment/decrement, and a conditional branch based +on the check (which will never fail) that the `RefCell` isn't +already mutably borrowed. It will also impede a lot of rustc's +optimizations. + +### `Cell<VM>` + +This is a non-starter because it means that in order to mutate any +field of `VM`, you have to move the entire `struct VM` out of the +`Cell`, mutate it, and move it back in. + +### `Cell<Box<VM>>` + +Now we're getting warmer. Here, we can move the `Box<VM>` out of +the cell with a single pointer-sized memory access. + +We don't want to do the "uglification" described in the previous +section. We are very fortunate that, sometime in mid-2019, the Rust +dieties [decreed by fiat][fiat-decree] that `&Cell<T>` and `&mut T` +are bit-for-bit identical, and even gave us mortals safe wrappers +[`from_mut()`][from_mut] and [`get_mut()`][get_mut] around +`mem::transmute()`. + +So now, when a `VM` method (which takes `&mut self`) calls out to +some external code (like a builtin), instead of passing the `&mut +self` to the external code it can call `Cell::from_mut(&mut self)`, +and then `Cell::swap()` that into the thread-local storage cell for +the duration of the external code. After the external code returns, +it can `Cell::swap()` it back. This whole dance gets wrapped in a +lexical block, and the borrow checker sees that the `&Cell<Box<VM>>` +returned by `Cell::from_mut()` lives only until the end of the +lexical block, *so we get the `&mut self` back after the close-brace +for that block*. NLL FTW. This sounds like a lot of work, but it +should compile down to two pointer-sized loads and two pointer-sized +stores, and it is incurred basically only for `OpBuiltin`. + +This all works, with only two issues: + +1. `vm.rs` needs to be very careful to do the thread-local cell swap + dance before calling anything that might call `PartialEq::eq()` + (or any other method that expects to be able to pull the `VM` out + of thread-local storage). There is no compile-time check that we + did the dance in all the right places. If we forget to do the + dance somewhere we'll get a runtime panic from `Option::expect()` + (see next section). + +2. Since we need to call `Cell::from_mut()` on a `Box<VM>` rather + than a bare `VM`, we still need to rewrite all of `vm.rs` so that + every function takes a `&mut Box<VM>` instead of a `&mut self`. + This creates a huge amount of "noise" in the code. + +Fortunately, it turns out that nearly all the "noise" that arises +from the second point can be eliminated by taking advantage of +[deref coercions][deref-coercions]! This was the last "shoe to +drop". + +There is still the issue of having to be careful about calls from +`vm.rs` to things outside that file, but it's manageable. + +### `Cell<Option<Box<VM>>>` + +In order to get the "safe and stable `#[thread_local]`" +[exception][tls-const-init] we need a `const` initializer, which +means we need to be able to put something into the `Cell` that isn't +a `VM`. So the type needs to be `Cell<Option<Box<VM>>>`. + +Recall that you can't turn an `Option<&T>` into an `&Option<T>`. +The latter type has the "is this a `Some` or `None`" bit immediately +adjacent to the bits representing `T`. So if I hand you a `t:&T` +and you wrap it as `Some(t)`, those bits aren't adjacent in memory. +This means that all the VM methods need to operate on an +`Option<Box<VM>>` -- we can't just wrap a `Some()` around `&mut +self` "at the last minute" before inserting it into the thread-local +storage cell. Fortunately deref coercions save the day here too -- +the coercion is inferred through both layers (`Box` and `Option`) of +wrapper, so there is no additional noise in the code. + +Note that Rust is clever and can find some sequence of bits that +aren't a valid `T`, so `sizeof(Option<T>)==sizeof(T)`. And in fact, +`Box<T>` is one of these cases (and this is guaranteed). So the +`Option` has no overhead. + +# Closing thoughts, language-level support + +This would have been easier with language-level support. + +## What wouldn't help + +Although it [it was decreed][fiat-decree] that `Cell<T>` and `&mut +T` are interchangeable, a `LocalKey<Cell<T>>` isn't quite the same +thing as a `Cell<T>`, so it wouldn't be safe for the standard +library to contain something like this: + +``` +impl<T> LocalKey<Cell<T>> { + fn get_mut(&self) -> &mut T { + unsafe { + // ... mem::transmute() voodoo goes here ... +``` + +The problem here is that you can call `LocalKey<Cell<T>>::get_mut()` twice and +end up with two `&mut T`s that point to the same thing (mutable aliasing) which +results in undefined behavior. + +## What would help + +The ideal solution is for Rust to let you call arbitrary methods +`T::foo(&mut self...)` on a `LocalKey<Cell<T>>`. This way you can +have one (and only one) `&mut T` at any syntactical point in your +program -- the `&mut self`. + + +[tls-const-init]: https://github.com/rust-lang/rust/pull/90774 +[thread-local-unstable]: https://github.com/rust-lang/rust/issues/29594 +[thread-local-unsafe-generally]: https://github.com/rust-lang/rust/issues/54366 +[fiat-decree]: https://github.com/rust-lang/rust/issues/43038 +[from_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.from_mut +[get_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.get_mut +[thread-local-unsafe]: [https://github.com/rust-lang/rust/issues/54366] +[deref-coercions]: https://doc.rust-lang.org/book/ch15-02-deref.html#implicit-deref-coercions-with-functions-and-methods +[threadlocal-intrinsic]: https://llvm.org/docs/LangRef.html#llvm-threadlocal-address-intrinsic diff --git a/tvix/docs/src/eval/bindings.md b/tvix/docs/src/eval/bindings.md new file mode 100644 index 000000000000..17867acfe9f9 --- /dev/null +++ b/tvix/docs/src/eval/bindings.md @@ -0,0 +1,133 @@ +Compilation of bindings +======================= + +Compilation of Nix bindings is one of the most mind-bending parts of Nix +evaluation. The implementation of just the compilation is currently almost 1000 +lines of code, excluding the various insane test cases we dreamt up for it. + +## What is a binding? + +In short, any attribute set or `let`-expression. Tvix currently does not treat +formals in function parameters (e.g. `{ name ? "fred" }: ...`) the same as these +bindings. + +They have two very difficult features: + +1. Keys can mutually refer to each other in `rec` sets or `let`-bindings, + including out of definition order. +2. Attribute sets can be nested, and parts of one attribute set can be defined + in multiple separate bindings. + +Tvix resolves as much of this logic statically (i.e. at compile-time) as +possible, but the procedure is quite complicated. + +## High-level concept + +The idea behind the way we compile bindings is to fully resolve nesting +statically, and use the usual mechanisms (i.e. recursion/thunking/value +capturing) for resolving dynamic values. + +This is done by compiling bindings in several phases: + +1. An initial compilation phase *only* for plain inherit statements (i.e. + `inherit name;`), *not* for namespaced inherits (i.e. `inherit (from) + name;`). + +2. A declaration-only phase, in which we use the compiler's scope tracking logic + to calculate the physical runtime stack indices (further referred to as + "stack slots" or just "slots") that all values will end up in. + + In this phase, whenever we encounter a nested attribute set, it is merged + into a custom data structure that acts like a synthetic AST node. + + This can be imagined similar to a rewrite like this: + + ```nix + # initial code: + { + a.b = 1; + a.c = 2; + } + + # rewritten form: + { + a = { + b = 1; + c = 2; + }; + } + ``` + + The rewrite applies to attribute sets and `let`-bindings alike. + + At the end of this phase, we know the stack slots of all namespaces for + inheriting from, all values inherited from them, and all values (and + optionally keys) of bindings at the current level. + + Only statically known keys are actually merged, so any dynamic keys that + conflict will lead to a "key already defined" error at runtime. + +3. A compilation phase, in which all values (and, when necessary, keys) are + actually compiled. In this phase the custom data structure used for merging + is encountered when compiling values. + + As this data structure acts like an AST node, the process begins recursively + for each nested attribute set. + +At the end of this process we have bytecode that leaves the required values (and +optionally keys) on the stack. In the case of attribute sets, a final operation +is emitted that constructs the actual attribute set structure at runtime. For +`let`-bindings a final operation is emitted that removes these locals from the +stack when the scope ends. + +## Moving parts + +WARNING: This documents the *current* implementation. If you only care about the +conceptual aspects, see above. + +There's a few types involved: + +* `PeekableAttrs`: peekable iterator over an attribute path (e.g. `a.b.c`) +* `BindingsKind`: enum defining the kind of bindings (attrs/recattrs/let) +* `AttributeSet`: struct holding the bindings kind, the AST nodes with inherits + (both namespaced and not), and an internal representation of bindings + (essentially a vector of tuples of the peekable attrs and the expression to + compile for the value). +* `Binding`: enum describing the kind of binding (namespaced inherit, attribute + set, plain binding of *any other value type*) +* `KeySlot`: enum describing the location in which a key slot is placed at + runtime (nowhere, statically known value in a slot, dynamic value in a slot) +* `TrackedBinding`: struct representing statically known information about a + single binding (its key slot, value slot and `Binding`) +* `TrackedBindings`: vector of tracked bindings, which implements logic for + merging attribute sets together + +And quite a few methods on `Compiler`: + +* `compile_bindings`: entry point for compiling anything that looks like a + binding, this calls out to the functions below. +* `compile_plain_inherits`: takes all inherits of a bindings node and compiles + the ones that are trivial to compile (i.e. just plain inherits without a + namespace). The `rnix` parser does not represent namespaced/plain inherits in + different nodes, so this function also aggregates the namespaced inherits and + returns them for further use +* `declare_namespaced_inherits`: passes over all namespaced inherits and + declares them on the locals stack, as well as inserts them into the provided + `TrackedBindings` +* `declare_bindings`: declares all regular key/value bindings in a bindings + scope, but without actually compiling their keys or values. + + There's a lot of heavy lifting going on here: + + 1. It invokes the various pieces of logic responsible for merging nested + attribute sets together, creating intermediate data structures in the value + slots of bindings that can be recursively processed the same way. + 2. It decides on the key slots of expressions based on the kind of bindings, + and the type of expression providing the key. +* `bind_values`: runs the actual compilation of values. Notably this function is + responsible for recursively compiling merged attribute sets when it encounters + a `Binding::Set` (on which it invokes `compile_bindings` itself). + +In addition to these several methods (such as `compile_attr_set`, +`compile_let_in`, ...) invoke the binding-kind specific logic and then call out +to the functions above. diff --git a/tvix/docs/src/eval/build-references.md b/tvix/docs/src/eval/build-references.md new file mode 100644 index 000000000000..badcea11550e --- /dev/null +++ b/tvix/docs/src/eval/build-references.md @@ -0,0 +1,254 @@ +Build references in derivations +=============================== + +This document describes how build references are calculated in Tvix. Build +references are used to determine which store paths should be available to a +builder during the execution of a build (i.e. the full build closure of a +derivation). + +## String contexts in C++ Nix + +In C++ Nix, each string value in the evaluator carries an optional so-called +"string context". + +These contexts are themselves a list of strings that take one of the following +formats: + +1. `!<output_name>!<drv_path>` + + This format describes a build reference to a specific output of a derivation. + +2. `=<drv_path>` + + This format is used for a special case where a derivation attribute directly + refers to a derivation path (e.g. by accessing `.drvPath` on a derivation). + + Note: In C++ Nix this case is quite special and actually requires a + store-database query during evaluation. + +3. `<path>` - a non-descript store path input, usually a plain source file (e.g. + from something like `src = ./.` or `src = ./foo.txt`). + + In the case of `unsafeDiscardOutputDependency` this is used to pass a raw + derivation file, but *not* pull in its outputs. + +Lets introduce names for these (in the same order) to make them easier to +reference below: + +```rust +enum BuildReference { + /// !<output_name>!<drv_path> + SingleOutput(OutputName, DrvPath), + + /// =<drv_path> + DrvClosure(DrvPath), + + /// <path> + Path(StorePath), +} +``` + +String contexts are, broadly speaking, created whenever a string is the result +of a computation (e.g. string interpolation) that used a *computed* path or +derivation in any way. + +Note: This explicitly does *not* include simply writing a literal string +containing a store path (whether valid or not). That is only permitted through +the `storePath` builtin. + +## Derivation inputs + +Based on the data above, the fields `inputDrvs` and `inputSrcs` of derivations +are populated in `builtins.derivationStrict` (the function which +`builtins.derivation`, which isn't actually a builtin, wraps). + +`inputDrvs` is represented by a map of derivation paths to the set of their +outputs that were referenced by the context. + +TODO: What happens if the set is empty? Somebody claimed this means all outputs. + +`inputSrcs` is represented by a set of paths. + +These are populated by the above references as follows: + +* `SingleOutput` entries are merged into `inputDrvs` +* `Path` entries are inserted into `inputSrcs` +* `DrvClosure` leads to a special store computation (`computeFSClosure`), which + finds all paths referenced by the derivation and then inserts all of them into + the fields as above (derivations with _all_ their outputs) + +This is then serialised in the derivation and passed down the pipe. + +## Builtins interfacing with contexts + +C++ Nix has several builtins that interface directly with string contexts: + +* `unsafeDiscardStringContext`: throws away a string's string context (if + present) +* `hasContext`: returns `true`/`false` depending on whether the string has + context +* `unsafeDiscardOutputDependency`: drops dependencies on the *outputs* of a + `.drv` in the context, passing only the literal `.drv` itself + + Note: This is only used for special test-cases in nixpkgs, and deprecated Nix + commands like `nix-push`. +* `getContext`: returns the string context in serialised form as a Nix attribute + set +* `appendContext`: adds a given string context to the string in the same format + as returned by `getContext` + +Most of the string manipulation operations will propagate the context to the +result based on their parameters' contexts. + +## Placeholders + +C++ Nix has `builtins.placeholder`, which given the name of an output (e.g. +`out`) creates a hashed string representation of that output name. If that +string is used anywhere in input attributes, the builder will replace it with +the actual name of the corresponding output of the current derivation. + +C++ Nix does not use contexts for this, it blindly creates a rewrite map of +these placeholder strings to the names of all outputs, and runs the output +replacement logic on all environment variables it creates, attribute files it +passes etc. + +## Tvix & string contexts + +In the past, Tvix did not track string contexts in its evaluator at all, see +the historical section for more information about that. + +Tvix tracks string contexts in every `NixString` structure via a +`HashSet<BuildReference>` and offers an API to combine the references while +keeping the exact internal structure of that data private. + +## Historical attempt: Persistent reference tracking + +We were investigating implementing a system which allows us to drop string +contexts in favour of reference scanning derivation attributes. + +This means that instead of maintaining and passing around a string context data +structure in eval, we maintain a data structure of *known paths* from the same +evaluation elsewhere in Tvix, and scan each derivation attribute against this +set of known paths when instantiating derivations. + +We believed we could take the stance that the system of string contexts as +implemented in C++ Nix is likely an implementation detail that should not be +leaking to the language surface as it does now. + +### Tracking "known paths" + +Every time a Tvix evaluation does something that causes a store interaction, a +"known path" is created. On the language surface, this is the result of one of: + +1. Path literals (e.g. `src = ./.`). +2. Calls to `builtins.derivationStrict` yielding a derivation and its output + paths. +3. Calls to `builtins.path`. + +Whenever one of these occurs, some metadata that persists for the duration of +one evaluation should be created in Nix. This metadata needs to be available in +`builtins.derivationStrict`, and should be able to respond to these queries: + +1. What is the set of all known paths? (used for e.g. instantiating an + Aho-Corasick type string searcher) +2. What is the _type_ of a path? (derivation path, derivation output, source + file) +3. What are the outputs of a derivation? +4. What is the derivation of an output? + +These queries will need to be asked of the metadata when populating the +derivation fields. + +Note: Depending on how we implement `builtins.placeholder`, it might be useful +to track created placeholders in this metadata, too. + +### Context builtins + +Context-reading builtins can be implemented in Tvix by adding `hasContext` and +`getContext` with the appropriate reference-scanning logic. However, we should +evaluate how these are used in nixpkgs and whether their uses can be removed. + +Context-mutating builtins can be implemented by tracking their effects in the +value representation of Tvix, however we should consider not doing this at all. + +`unsafeDiscardOutputDependency` should probably never be used and we should warn +or error on it. + +`unsafeDiscardStringContext` is often used as a workaround for avoiding IFD in +inconvenient places (e.g. in the TVL depot pipeline generation). This is +unnecessary in Tvix. We should evaluate which other uses exist, and act on them +appropriately. + +The initial danger with diverging here is that we might cause derivation hash +discrepancies between Tvix and C++ Nix, which can make initial comparisons of +derivations generated by the two systems difficult. If this occurs we need to +discuss how to approach it, but initially we will implement the mutating +builtins as no-ops. + +### Why this did not work for us? + +Nix has a feature to perform environmental checks of your derivation, e.g. +"these derivation outputs should not be referenced in this derivation", this was +introduced in Nix 2.2 by +https://github.com/NixOS/nix/commit/3cd15c5b1f5a8e6de87d5b7e8cc2f1326b420c88. + +Unfortunately, this feature introduced a very unfortunate and critical bug: all +usage of this feature with contextful strings will actually force the +derivation to depend at least at build time on those specific paths, see +https://github.com/NixOS/nix/issues/4629. + +For example, if you wanted to `disallowedReferences` to a package and you used a +derivation as a path, you would actually register that derivation as a input +derivation of that derivation. + +This bug is still unfixed in Nix and it seems that fixing it would require +introducing different ways to evaluate Nix derivations to preserve the +output path calculation for Nix expressions so far. + +All of this would be fine if the bug behavior was uniform in the sense that no +one tried to force-workaround it. Since Nixpkgs 23.05, due to +https://github.com/NixOS/nixpkgs/pull/211783 this is not true anymore. + +If you let nixpkgs be the disjoint union of bootstrapping derivations $A$ and +`stdenv.mkDerivation`-built derivations $B$. + +$A$ suffers from the bug and $B$ doesn't by the forced usage of +`unsafeDiscardStringContext` on those special checking fields. + +This means that to build hash-compatible $A$ **and** $B$, we need to +distinguish $A$ and $B$. A lot of hacks could be imagined to support this +problem. + +Let's assume we have a solution to that problem, it means that we are able to +detect implicitly when a set of specific fields are +`unsafeDiscardStringContext`-ed. + +Thus, we could use that same trick to implement `unsafeDiscardStringContext` +entirely for all fields actually. + +Now, to implement `unsafeDiscardStringContext` in the persistent reference +tracking model, you will need to store a disallowed list of strings that should +not trigger a reference when we are scanning a derivation parameters. + +But assume you have something like: + +```nix +derivation { + buildInputs = [ + stdenv.cc + ]; + + disallowedReferences = [ stdenv.cc ]; +} +``` + +If you unregister naively the `stdenv.cc` reference, it will silence the fact +that it is part of the `buildInputs`, so you will observe that Nix will fail +the derivation during environmental check, but Tvix would silently force remove +that reference. + +Until proven otherwise, it seems highly difficult to have the fine-grained +information to prevent reference tracking of those specific fields. It is not a +failure of the persistent reference tracking, it is an unresolved critical bug +of Nix that only nixpkgs really workarounded for `stdenv.mkDerivation`-based +derivations. diff --git a/tvix/docs/src/eval/builtins.md b/tvix/docs/src/eval/builtins.md new file mode 100644 index 000000000000..dba4c48c65e1 --- /dev/null +++ b/tvix/docs/src/eval/builtins.md @@ -0,0 +1,138 @@ +Nix builtins +============ + +Nix has a lot of built-in functions, some of which are accessible in +the global scope, and some of which are only accessible through the +global `builtins` attribute set. + +This document is an attempt to track all of these builtins, but +without documenting their functionality. + +See also https://nixos.org/manual/nix/stable/expressions/builtins.html + +The `impl` column indicates implementation status in tvix: +- implemented: "" (empty cell) +- not yet implemented, but not blocked: `todo` +- not yet implemented, but blocked by other prerequisites: + - `store`: awaiting eval<->store api(s) + - `context`: awaiting support for string contexts + +| name | global | arity | pure | impl | +|-------------------------------|--------|-------|-------|---------| +| abort | true | 1 | | | +| add | false | 2 | true | | +| addErrorContext | false | ? | | context | +| all | false | 2 | true | | +| any | false | 2 | true | | +| appendContext | false | ? | | | +| attrNames | false | 1 | true | | +| attrValues | false | | true | | +| baseNameOf | true | | | | +| bitAnd | false | | | | +| bitOr | false | | | | +| bitXor | false | | | | +| builtins | true | | | | +| catAttrs | false | | | | +| compareVersions | false | | | | +| concatLists | false | | | | +| concatMap | false | | | | +| concatStringsSep | false | | | | +| currentSystem | false | | | | +| currentTime | false | | false | | +| deepSeq | false | | | | +| derivation | true | | | store | +| derivationStrict | true | | | store | +| dirOf | true | | | | +| div | false | | | | +| elem | false | | | | +| elemAt | false | | | | +| false | true | | | | +| fetchGit | true | | | store | +| fetchMercurial | true | | | store | +| fetchTarball | true | | | store | +| fetchurl | false | | | store | +| filter | false | | | | +| filterSource | false | | | store | +| findFile | false | | false | todo | +| foldl' | false | | | | +| fromJSON | false | | | | +| fromTOML | true | | | | +| functionArgs | false | | | | +| genList | false | | | | +| genericClosure | false | | | todo | +| getAttr | false | | | | +| getContext | false | | | | +| getEnv | false | | false | | +| hasAttr | false | | | | +| hasContext | false | | | | +| hashFile | false | | false | | +| hashString | false | | | | +| head | false | | | | +| import | true | | | | +| intersectAttrs | false | | | | +| isAttrs | false | | | | +| isBool | false | | | | +| isFloat | false | | | | +| isFunction | false | | | | +| isInt | false | | | | +| isList | false | | | | +| isNull | true | | | | +| isPath | false | | | | +| isString | false | | | | +| langVersion | false | | | | +| length | false | | | | +| lessThan | false | | | | +| listToAttrs | false | | | | +| map | true | | | | +| mapAttrs | false | | | | +| match | false | | | | +| mul | false | | | | +| nixPath | false | | | todo | +| nixVersion | false | | | todo | +| null | true | | | | +| parseDrvName | false | | | | +| partition | false | | | | +| path | false | | sometimes | store | +| pathExists | false | | false | | +| placeholder | true | | | context | +| readDir | false | | false | | +| readFile | false | | false | | +| removeAttrs | true | | | | +| replaceStrings | false | | | | +| scopedImport | true | | | | +| seq | false | | | | +| sort | false | | | | +| split | false | | | | +| splitVersion | false | | | | +| storeDir | false | | | store | +| storePath | false | | | store | +| stringLength | false | | | | +| sub | false | | | | +| substring | false | | | | +| tail | false | | | | +| throw | true | | | | +| toFile | false | | | store | +| toJSON | false | | | | +| toPath | false | | | | +| toString | true | | | | +| toXML | true | | | | +| trace | false | | | | +| true | true | | | | +| tryEval | false | | | | +| typeOf | false | | | | +| unsafeDiscardOutputDependency | false | | | | +| unsafeDiscardStringContext | false | | | | +| unsafeGetAttrPos | false | | | todo | +| valueSize | false | | | todo | + +## Added after C++ Nix 2.3 (without Flakes enabled) + +| name | global | arity | pure | impl | +|---------------|--------|-------|-------|-------| +| break | false | 1 | | todo | +| ceil | false | 1 | true | | +| fetchTree | true | 1 | | todo | +| floor | false | 1 | true | | +| groupBy | false | 2 | true | | +| traceVerbose | false | 2 | | todo | +| zipAttrsWith | false | 2 | true | todo | diff --git a/tvix/docs/src/eval/catchable-errors.md b/tvix/docs/src/eval/catchable-errors.md new file mode 100644 index 000000000000..ce320a921777 --- /dev/null +++ b/tvix/docs/src/eval/catchable-errors.md @@ -0,0 +1,131 @@ +# (Possible) Implementation(s) of Catchable Errors for `builtins.tryEval` + +## Terminology + +Talking about “catchable errors” in Nix in general is a bit precarious since +there is no properly established terminology. Also, the existing terms are less +than apt. The reason for this lies in the fact that catchable errors (or +whatever you want to call them) don't properly _exist_ in the language: While +Nix's `builtins.tryEval` is (originally) based on the C++ exception system, +it specifically lacks the ability of such systems to have an exception _value_ +whilst handling it. Consequently, these errors don't have an obvious name +as they never appear _in_ the Nix language. They just have to be named in the +respective Nix implementation: + +- In C++ Nix the only term for such errors is `AssertionError` which is the + name of the (C++) exception used in the implementation internally. This + term isn't great, though, as `AssertionError`s can not only be generated + using `assert`, but also using `throw` and failed `NIX_PATH` resolutions. + Were this terminology to be used in documentation addressing Nix language + users, it would probably only serve confusion. + +- Tvix currently (as of r/7573) uses the term catchable errors. This term + relates to nothing in the language as such: Errors are not caught, we rather + try to evaluate an expression. Catching also sort of implies that a value + representation of the error is attainable (like in an exception system) which + is untrue. + +In light of this I (sterni) would like to suggest “tryable errors” as an +alternative term going forward which isn't inaccurate and relates to terms +already established by language internal naming. + +However, this document will continue using the term catchable error until the +naming is adjusted in Tvix itself. + +## Implementation + +Below we discuss different implementation approaches in Tvix in order to arrive +at a proposal for the new one. The historical discussion is intended as a basis +for discussing the proposal: Are we committing to an old or current mistake? Are +we solving all problems that cropped up or were solved at any given point in +time? + +### Original + +The original implementation of `tryEval` in cl/6924 was quite straightforward: +It would simply interrupt the propagation of a potential catchable error to the +top level (which usually happened using the `?` operator) in the builtin and +construct the appropriate representation of an unsuccessful evaluation if the +error was deemed catchable. It had, however, multiple problems: + +- The VM was originally written without `tryEval` in mind, i.e. it largely + assumed that an error would always cause execution to be terminated. This + problem was later solved (cl/6940). +- Thunks could not be `tryEval`-ed multiple times (b/281). This was another + consequence of VM architecture at the time: Thunks would be blackholed + before evaluation was started and the error could occur. Due to the + interaction of the generator-based VM code and `Value::force` the part + of the code altering the thunk state would never be informed about the + evaluation result in case of a failure, so the thunk would remain + blackholed leading to a crash if the same thunk was `tryEval`-ed or + forced again. To solve this issue, amjoseph completely overhauled + the implementation. + +One key point about this implementation is that it is based on the assumption +that catchable errors can only be generated in thunks, i.e. expressions causing +them are never evaluated strictly. This can be illustrated using C++ Nix: + +```console +> nix-instantiate --eval -E '[ (assert false; true) (builtins.throw "") <nixpkgs> ]' +[ <CODE> <CODE> <CODE> ] +``` + +If this wasn't the case, the VM could encounter the error in a situation where +the error would not have needed to pass through the `tryEval` builtin, causing +evaluation to abort. + +### Present + +The current system (mostly implemented in cl/9289) uses a very different +approach: Instead of relying on the thunk boundary, catchable errors are no +longer errors, but special values. They are created at the relevant points (e.g. +`builtins.throw`) and propagated whenever they are encountered by VM ops or +builtins. Finally, they either encounter `builtins.tryEval` (and are converted to +an ordinary value again) or the top level where they become a normal error again. + +The problems with this mostly stem from the confusion between values and errors +that it necessitates: + +- In most circumstances, catchable errors end up being errors again, as `tryEval` + is not used a lot. So `throw`s usually end up causing evaluation to abort. + Consequently, not only `Value::Catchable` is necessary, but also a corresponding + error variant that is _only_ created if a catchable value remains at the end of + evaluation. A requirement that was missed until cl/10991 (!) which illustrate + how strange that architecture is. A consequence of this is that catchable + errors have no location information at all. +- `Value::Catchable` is similar to other internal values in Tvix, but is much + more problematic. Aside from thunks, internal values only exist for a brief + amount of time on the stack and it is very clear what parts of the VM or + builtins need to handle them. This means that the rest of the implementation + need to consider them, keeping the complexity caused by the internal value + low. `Value::Catchable`, on the other hand, may exist anywhere and be passed + to any VM op or builtin, so it needs to be correctly propagated _everywhere_. + This causes a lot of noise in the code as well as a big potential for bugs. + Essentially, catchable errors require as much attention by the Tvix developer + as laziness. This doesn't really correlate to the importance of the two + features to the Nix language. + +### Future? + +The core assumption of the original solution does offer a path forward: After +cl/9289 we should be in a better position to introspect an error occurring from +within the VM code, but we need a better way of storing such an error to prevent +another b/281. If catchable errors can only be generated in thunks, we can just +use the thunk representation for this. This would mean that `Thunk::force_` +would need to check if evaluation was successful and (in case of failure) +change the thunk representation + +- either to the original `ThunkRepr::Suspended` which would be simple, but of + course mean duplicated evaluation work in some expressions. In fact, this + would probably leave a lot of easy performance on the table for use cases we + would like to support, e.g. tree walkers for nixpkgs. +- or to a new `ThunkRepr` variant that stores the kind of the error and all + necessary location info so stack traces can work properly. This of course + reintroduces some of the difficulty of having two kinds of errors, but it is + hopefully less problematic, as the thunk boundary (i.e. `Thunk::force`) is + where errors would usually occur. + +Besides the question whether this proposal can actually be implemented, another +consideration is whether the underlying assumption will hold in the future, i.e. +can we implement optimizations for thunk elimination in a way that thunks that +generate catchable errors are never eliminated? diff --git a/tvix/docs/src/eval/known-optimisation-potential.md b/tvix/docs/src/eval/known-optimisation-potential.md new file mode 100644 index 000000000000..0ab185fe1be2 --- /dev/null +++ b/tvix/docs/src/eval/known-optimisation-potential.md @@ -0,0 +1,162 @@ +Known Optimisation Potential +============================ + +There are several areas of the Tvix evaluator code base where +potentially large performance gains can be achieved through +optimisations that we are already aware of. + +The shape of most optimisations is that of moving more work into the +compiler to simplify the runtime execution of Nix code. This leads, in +some cases, to drastically higher complexity in both the compiler +itself and in invariants that need to be guaranteed between the +runtime and the compiler. + +For this reason, and because we lack the infrastructure to adequately +track their impact (WIP), we have not yet implemented these +optimisations, but note the most important ones here. + +* Use "open upvalues" [hard] + + Right now, Tvix will immediately close over all upvalues that are + created and clone them into the `Closure::upvalues` array. + + Instead of doing this, we can statically determine most locals that + are closed over *and escape their scope* (similar to how the + `compiler::scope::Scope` struct currently tracks whether locals are + used at all). + + If we implement the machinery to track this, we can implement some + upvalues at runtime by simply sticking stack indices in the upvalue + array and only copy the values where we know that they escape. + +* Avoid `with` value duplication [easy] + + If a `with` makes use of a local identifier in a scope that can not + close before the with (e.g. not across `LambdaCtx` boundaries), we + can avoid the allocation of the phantom value and duplication of the + `NixAttrs` value on the stack. In this case we simply push the stack + index of the known local. + +* Multiple attribute selection [medium] + + An instruction could be introduced that avoids repeatedly pushing an + attribute set to/from the stack if multiple keys are being selected + from it. This occurs, for example, when inheriting from an attribute + set or when binding function formals. + +* Split closure/function representation [easy] + + Functions have fewer fields that need to be populated at runtime and + can directly use the `value::function::Lambda` representation where + possible. + +* Apply `compiler::optimise_select` to other set operations [medium] + + In addition to selects, statically known attribute resolution could + also be used for things like `?` or `with`. The latter might be a + little more complicated but is worth investigating. + +* Inline fully applied builtins with equivalent operators [medium] + + Some `builtins` have equivalent operators, e.g. `builtins.sub` + corresponds to the `-` operator, `builtins.hasAttr` to the `?` + operator etc. These operators additionally compile to a primitive + VM opcode, so they should be just as cheap (if not cheaper) as + a builtin application. + + In case the compiler encounters a fully applied builtin (i.e. + no currying is occurring) and the `builtins` global is unshadowed, + it could compile the equivalent operator bytecode instead: For + example, `builtins.sub 20 22` would be compiled as `20 - 22`. + This would ensure that equivalent `builtins` can also benefit + from special optimisations we may implement for certain operators + (in the absence of currying). E.g. we could optimise access + to the `builtins` attribute set which a call to + `builtins.getAttr "foo" builtins` should also profit from. + +* Avoid nested `VM::run` calls [hard] + + Currently when encountering Nix-native callables (thunks, closures) + the VM's run loop will nest and return the value of the nested call + frame one level up. This makes the Rust call stack almost mirror the + Nix call stack, which is usually undesirable. + + It is possible to detect situations where this is avoidable and + instead set up the VM in such a way that it continues and produces + the desired result in the same run loop, but this is kind of tricky + to get right - especially while other parts are still in flux. + + For details consult the commit with Gerrit change ID + `I96828ab6a628136e0bac1bf03555faa4e6b74ece`, in which the initial + attempt at doing this was reverted. + +* Avoid thunks if only identifier closing is required [medium] + + Some constructs, like `with`, mostly do not change runtime behaviour + if thunked. However, they are wrapped in thunks to ensure that + deferred identifiers are resolved correctly. + + This can be avoided, as we statically analyse the scope and should + be able to tell whether any such logic was required. + +* Intern literals [easy] + + Currently, the compiler emits a separate entry in the constant + table for each literal. So the program `1 + 1 + 1` will have + three entries in its `Chunk::constants` instead of only one. + +* Do some list and attribute set operations in place [hard] + + Algorithms that can not do a lot of work inside `builtins` like `map`, + `filter` or `foldl'` usually perform terribly if they use data structures like + lists and attribute sets. + + `builtins` can do work in place on a copy of a `Value`, but naïvely expressed + recursive algorithms will usually use `//` and `++` to do a single change to a + `Value` at a time, requiring a full copy of the data structure each time. + It would be a big improvement if we could do some of these operations in place + without requiring a new copy. + + There are probably two approaches: We could determine statically if a value is + reachable from elsewhere and emit a special in place instruction if not. An + easier alternative is probably to rely on reference counting at runtime: If no + other reference to a value exists, we can extend the list or update the + attribute set in place. + + An **alternative** to this is using [persistent data + structures](https://en.wikipedia.org/wiki/Persistent_data_structure) or at the + very least [immutable data structures](https://docs.rs/im/latest/im/) that can + be copied more efficiently than the stock structures we are using at the + moment. + +* Skip finalising unfinalised thunks or non-thunks instead of crashing [easy] + + Currently `OpFinalise` crashes the VM if it is called on values that don't + need to be finalised. This helps catching miscompilations where `OpFinalise` + operates on the wrong `StackIdx`. In the case of function argument patterns, + however, this means extra VM stack and instruction overhead for dynamically + determining if finalisation is necessary or not. This wouldn't be necessary + if `OpFinalise` would just noop on any values that don't need to be finalised + (anymore). + +* Phantom binding for from expression of inherits [easy] + + The from expression of an inherit is reevaluated for each inherit. This can + be demonstrated using the following Nix expression which, counter-intuitively, + will print “plonk” twice. + + ```nix + let + inherit (builtins.trace "plonk" { a = null; b = null; }) a b; + in + builtins.seq a (builtins.seq b null) + ``` + + In most Nix code, the from expression is just an identifier, so it is not + terribly inefficient, but in some cases a more expensive expression may + be used. We should create a phantom binding for the from expression that + is reused in the inherits, so only a single thunk is created for the from + expression. + + Since we discovered this, C++ Nix has implemented a similar optimization: + <https://github.com/NixOS/nix/pull/9847>. diff --git a/tvix/docs/src/eval/language-issues.md b/tvix/docs/src/eval/language-issues.md new file mode 100644 index 000000000000..152e6594a1d0 --- /dev/null +++ b/tvix/docs/src/eval/language-issues.md @@ -0,0 +1,46 @@ +# Nix language issues + +In the absence of a language standard, what Nix (the language) is, is prescribed +by the behavior of the C++ Nix implementation. Still, there are reasons not to +accept some behavior: + +* Tvix aims for nixpkgs compatibility only. This means we can ignore behavior in + edge cases nixpkgs doesn't trigger as well as obscure features it doesn't use + (e.g. `__overrides`). +* Some behavior of the Nix evaluator seems to be unintentional or an + implementation detail leaking out into language behavior. + +Especially in the latter case, it makes sense to raise the respective issue and +maybe to get rid of the behavior in all implementations for good. Below is an +(incomplete) list of such issues: + +* [Behaviour of nested attribute sets depends on definition order][i7111] +* [Partially constructed attribute sets are observable during dynamic attr names construction][i7012] +* [Nix parsers merges multiple attribute set literals for the same key incorrectly depending on definition order][i7115] + +On the other hand, there is behavior that seems to violate one's expectation +about the language at first, but has good enough reasons from an implementor's +perspective to keep them: + +* Dynamic keys are forbidden in `let` and `inherit`. This makes sure that we + only need to do runtime identifier lookups for `with`. More dynamic (i.e. + runtime) lookups would make the scoping system even more complicated as well + as hurt performance. +* Dynamic attributes of `rec` sets are not added to its scope. This makes sense + for the same reason. +* Dynamic and nested attributes in attribute sets don't get merged. This is a + tricky one, but avoids doing runtime (recursive) merges of attribute sets. + Instead all necessary merging can be inferred statically, i.e. the C++ Nix + implementation already merges at parse time, making nested attribute keys + syntactic sugar effectively. + +Other behavior is just odd, surprising or underdocumented: + +* `builtins.foldl'` doesn't force the initial accumulator (but all other + intermediate accumulator values), differing from e.g. Haskell, see + the [relevant PR discussion][p7158]. + +[i7111]: https://github.com/NixOS/nix/issues/7111 +[i7012]: https://github.com/NixOS/nix/issues/7012 +[i7115]: https://github.com/NixOS/nix/issues/7115 +[p7158]: https://github.com/NixOS/nix/pull/7158 diff --git a/tvix/docs/src/eval/opcodes-attrsets.md b/tvix/docs/src/eval/opcodes-attrsets.md new file mode 100644 index 000000000000..7026f3319dda --- /dev/null +++ b/tvix/docs/src/eval/opcodes-attrsets.md @@ -0,0 +1,122 @@ +# attrset-opcodes + +The problem with attrset literals is twofold: + +1. The keys of attribute sets may be dynamically evaluated. + + Access: + + ```nix + let + k = "foo"; + attrs = { /* etc. */ }; + in attrs."${k}" + ``` + + Literal: + ```nix + let + k = "foo"; + in { + "${k}" = 42; + } + ``` + + The problem with this is that the attribute set key is not known at + compile time, and needs to be dynamically evaluated by the VM as an + expression. + + For the most part this should be pretty simple, assuming a + theoretical instruction set: + + ``` + 0000 OP_CONSTANT(0) # key "foo" + 0001 OP_CONSTANT(1) # value 42 + 0002 OP_ATTR_SET(1) # construct attrset from 2 stack values + ``` + + The operation pushing the key needs to be replaced with one that + leaves a single value (the key) on the stack, i.e. the code for the + expression, e.g.: + + ``` + 0000..000n <operations leaving a string value on the stack> + 000n+1 OP_CONSTANT(1) # value 42 + 000n+2 OP_ATTR_SET(1) # construct attrset from 2 stack values + ``` + + This is fairly easy to do by simply recursing in the compiler when + the key expression is encountered. + +2. The keys of attribute sets may be nested. + + This is the non-trivial part of dealing with attribute set + literals. Specifically, the nesting can be arbitrarily deep and the + AST does not guarantee that related set keys are located + adjacently. + + Furthermore, this frequently occurs in practice in Nix. We need a + bytecode representation that makes it possible to construct nested + attribute sets at runtime. + + Proposal: AttrPath values + + If we can leave a value representing an attribute path on the + stack, we can offload the construction of nested attribute sets to + the `OpAttrSet` operation. + + Under the hood, OpAttrSet in practice constructs a `Map<NixString, + Value>` attribute set in most cases. This means it expects to pop + the value of the key of the stack, but is otherwise free to do + whatever it wants with the underlying map. + + In a simple example, we could have code like this: + + ```nix + { + a.b = 15; + } + ``` + + This would be compiled to a new `OpAttrPath` instruction that + constructs and pushes an attribute path from a given number of + fragments (which are popped off the stack). + + For example, + + ``` + 0000 OP_CONSTANT(0) # key "a" + 0001 OP_CONSTANT(1) # key "b" + 0002 OP_ATTR_PATH(2) # construct attrpath from 2 fragments + 0003 OP_CONSTANT(2) # value 42 + 0004 OP_ATTRS(1) # construct attrset from one pair + ``` + + Right before `0004` the stack would be left like this: + + [ AttrPath[a,b], 42 ] + + Inside of the `OP_ATTRS` instruction we could then begin + construction of the map and insert the nested attribute sets as + required, as well as validate that there are no duplicate keys. + +3. Both of these cases can occur simultaneously, but this is not a + problem as the opcodes combine perfectly fine, e.g.: + + ```nix + let + k = "a"; + in { + "${k}".b = 42; + } + ``` + + results in + + ``` + 0000..000n <operations leaving a string value on the stack> + 000n+1 OP_CONSTANT(1) # key "b" + 000n+2 OP_ATTR_PATH(2) # construct attrpath from 2 fragments + 000n+3 OP_CONSTANT(2) # value 42 + 000n+4 OP_ATTR_SET(1) # construct attrset from 2 stack values + ``` diff --git a/tvix/docs/src/eval/recursive-attrs.md b/tvix/docs/src/eval/recursive-attrs.md new file mode 100644 index 000000000000..c30cfd33e6c7 --- /dev/null +++ b/tvix/docs/src/eval/recursive-attrs.md @@ -0,0 +1,68 @@ +Recursive attribute sets +======================== + +The construction behaviour of recursive attribute sets is very +specific, and a bit peculiar. + +In essence, there are multiple "phases" of scoping that take place +during attribute set construction: + +1. Every inherited value without an explicit source is inherited only + from the **outer** scope in which the attribute set is enclosed. + +2. A new scope is opened in which all recursive keys are evaluated. + This only considers **statically known keys**, attributes can + **not** recurse into dynamic keys in `self`! + + For example, this code is invalid in C++ Nix: + + ``` + nix-repl> rec { ${"a"+""} = 2; b = a * 10; } + error: undefined variable 'a' at (string):1:26 + ``` + +3. Finally, a third scope is opened in which dynamic keys are + evaluated. + +This behaviour, while possibly a bit strange and unexpected, actually +simplifies the implementation of recursive attribute sets in Tvix as +well. + +Essentially, a recursive attribute set like this: + +```nix +rec { + inherit a; + b = a * 10; + ${"c" + ""} = b * 2; +} +``` + +Can be compiled like the following expression: + +```nix +let + inherit a; +in let + b = a * 10; + in { + inherit a b; + ${"c" + ""} = b * 2; + } +``` + +Completely deferring the resolution of recursive identifiers to the +existing handling of recursive scopes (i.e. deferred access) in let +bindings. + +In practice, we can further specialise this and compile each scope +directly into the form expected by `OpAttrs` (that is, leaving +attribute names on the stack) before each value's position. + +C++ Nix's Implementation +------------------------ + +* [`ExprAttrs`](https://github.com/NixOS/nix/blob/2097c30b08af19a9b42705fbc07463bea60dfb5b/src/libexpr/nixexpr.hh#L241-L268) + (AST representation of attribute sets) +* [`ExprAttrs::eval`](https://github.com/NixOS/nix/blob/075bf6e5565aff9fba0ea02f3333c82adf4dccee/src/libexpr/eval.cc#L1333-L1414) +* [`addAttr`](https://github.com/NixOS/nix/blob/master/src/libexpr/parser.y#L98-L156) (`ExprAttrs` construction in the parser) diff --git a/tvix/docs/src/eval/vm-loop.md b/tvix/docs/src/eval/vm-loop.md new file mode 100644 index 000000000000..6266d34709cb --- /dev/null +++ b/tvix/docs/src/eval/vm-loop.md @@ -0,0 +1,315 @@ +tvix-eval VM loop +================= + +This document describes the new tvix-eval VM execution loop implemented in the +chain focusing around cl/8104. + +## Background + +The VM loop implemented in Tvix prior to cl/8104 had several functions: + +1. Advancing the instruction pointer for a chunk of Tvix bytecode and + executing instructions in a loop until a result was yielded. + +2. Tracking Nix call frames as functions/thunks were entered/exited. + +3. Catching trampoline requests returned from instructions to force suspended + thunks without increasing stack size *where possible*. + +4. Handling trampolines through an inner trampoline loop, switching between a + code execution mode and execution of subsequent trampolines. + +This implementation of the trampoline logic was added on to the existing VM, +which previously always recursed for thunk forcing. There are some cases (for +example values that need to be forced *inside* of the execution of a builtin) +where trampolines could not previously be used, and the VM recursed anyways. + +As a result of this trampoline logic being added "on top" of the existing VM +loop the code became quite difficult to understand. This led to several bugs, +for example: b/251, b/246, b/245, and b/238. + +These bugs were tricky to deal with, as we had to try and make the VM do +things that are somewhat difficult to fit into its model. We could of course +keep extending the trampoline logic to accommodate all sorts of concepts (such +as finalisers), but that seems like it does not solve the root problem. + +## New VM loop + +In cl/8104, a unified new solution is implemented with which the VM is capable +of evaluating everything without increasing the call stack size. + +This is done by introducing a new frame stack in the VM, on which execution +frames are enqueued that are either: + +1. A bytecode frame, consisting of Tvix bytecode that evaluates compiled Nix + code. +2. A generator frame, consisting of some VM logic implemented in pure Rust + code that can be *suspended* when it hits a point where the VM would + previously need to recurse. + +We do this by making use of the `async` *keyword* in Rust, but notably +*without* introducing asynchronous I/O or concurrency in tvix-eval (the +complexity of which is currently undesirable for us). + +Specifically, when writing a Rust function that uses the `async` keyword, such +as: + +```rust +async fn some_builtin(input: Value) -> Result<Value, ErrorKind> { + let mut out = NixList::new(); + + for element in input.to_list()? { + let result = do_something_that_requires_the_vm(element).await; + out.push(result); + } + + Ok(out) +} +``` + +The compiler actually generates a state-machine under-the-hood which allows +the execution of that function to be *suspended* whenever it hits an `await`. + +We use the [`genawaiter`][] crate that gives us a data structure and simple +interface for getting instances of these state machines that can be stored in +a struct (in our case, a *generator frame*). + +The execution of the VM then becomes the execution of an *outer loop*, which +is responsible for selecting the next generator frame to execute, and two +*inner loops*, which drive the execution of a bytecode frame or generator +frame forward until it either yields a value or asks to be suspended in favour +of another frame. + +All "communication" between frames happens solely through values left on the +stack: Whenever a frame of either type runs to completion, it is expected to +leave a *single* value on the stack. It follows that the whole VM, upon +completion of the last (or initial, depending on your perspective) frame +yields its result as the return value. + +The core of the VM restructuring is cl/8104, unfortunately one of the largest +single commit changes we've had to make yet, as it touches pretty much all +areas of tvix-eval. The introduction of the generators and the +message/response system we built to request something from the VM, suspend a +generator, and wait for the return is in cl/8148. + +The next sections describe in detail how the three different loops work. + +### Outer VM loop + +The outer VM loop is responsible for selecting the next frame to run, and +dispatching it correctly to inner loops, as well as determining when to shut +down the VM and return the final result. + +``` + ╭──────────────────╮ + ╭────────┤ match frame kind ├──────╮ + │ ╰──────────────────╯ │ + │ │ + ┏━━━━━━━━━━━━┷━━━━━┓ ╭───────────┴───────────╮ +───►┃ frame_stack.pop()┃ ▼ ▼ + ┗━━━━━━━━━━━━━━━━━━┛ ┏━━━━━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━━━━━━┓ + ▲ ┃ bytecode frame ┃ ┃ generator frame ┃ + │ ┗━━━━━━━━┯━━━━━━━┛ ┗━━━━━━━━┯━━━━━━━━┛ + │[yes, cont.] │ │ + │ ▼ ▼ + ┏━━━━━━━━┓ │ ╔════════════════╗ ╔═════════════════╗ +◄───┨ return ┃ │ ║ inner bytecode ║ ║ inner generator ║ + ┗━━━━━━━━┛ │ ║ loop ║ ║ loop ║ + ▲ │ ╚════════╤═══════╝ ╚════════╤════════╝ + │ ╭────┴─────╮ │ │ + │ │ has next │ ╰───────────┬───────────╯ + [no] ╰───┤ frame? │ │ + ╰────┬─────╯ ▼ + │ ┏━━━━━━━━━━━━━━━━━┓ + │ ┃ frame completed ┃ + ╰─────────────────────────┨ or suspended ┃ + ┗━━━━━━━━━━━━━━━━━┛ +``` + +Initially, the VM always pops a frame from the frame stack and then inspects +the type of frame it found. As a consequence the next frame to execute is +always the frame at the top of the stack, and setting up a VM initially for +code execution is done by leaving a bytecode frame with the code to execute on +the stack and passing control to the outer loop. + +Control is dispatched to either of the inner loops (depending on the type of +frame) and the cycle continues once they return. + +When an inner loop returns, it has either finished its execution (and left its +result value on the *value stack*), or its frame has requested to be +suspended. + +Frames request suspension by re-enqueueing *themselves* through VM helper +methods, and then leaving the frame they want to run *on top* of themselves in +the frame stack before yielding control back to the outer loop. + +The inner control loops inform the outer loops about whether the frame has +been *completed* or *suspended* by returning a boolean. + +### Inner bytecode loop + +The inner bytecode loop drives the execution of some Tvix bytecode by +continously looking at the next instruction to execute, and dispatching to the +instruction handler. + +``` + ┏━━━━━━━━━━━━━┓ +◄──┨ return true ┃ + ┗━━━━━━━━━━━━━┛ + ▲ + ╔════╧═════╗ + ║ OpReturn ║ + ╚══════════╝ + ▲ + ╰──┬────────────────────────────╮ + │ ▼ + │ ╔═════════════════════╗ + ┏━━━━━━━━┷━━━━━┓ ║ execute instruction ║ +───►┃ inspect next ┃ ╚══════════╤══════════╝ + ┃ instruction ┃ │ + ┗━━━━━━━━━━━━━━┛ │ + ▲ ╭─────┴─────╮ + ╰──────────────────────┤ suspends? │ + [no] ╰─────┬─────╯ + │ + │ + ┏━━━━━━━━━━━━━━┓ │ +◄──┨ return false ┃───────────────────────╯ + ┗━━━━━━━━━━━━━━┛ [yes] +``` + +With this refactoring, the compiler now emits a special `OpReturn` instruction +at the end of bytecode chunks. This is a signal to the runtime that the chunk +has completed and that its current value should be returned, without having to +perform instruction pointer arithmetic. + +When `OpReturn` is encountered, the inner bytecode loop returns control to the +outer loop and informs it (by returning `true`) that the bytecode frame has +completed. + +Any other instruction may also request a suspension of the bytecode frame (for +example, instructions that need to force a value). In this case the inner loop +is responsible for setting up the frame stack correctly, and returning `false` +to inform the outer loop of the suspension + +### Inner generator loop + +The inner generator loop is responsible for driving the execution of a +generator frame by continously calling [`Gen::resume`][] until it requests a +suspension (as a result of which control is returned to the outer loop), or +until the generator is done and yields a value. + +``` + ┏━━━━━━━━━━━━━┓ +◄──┨ return true ┃ ◄───────────────────╮ + ┗━━━━━━━━━━━━━┛ │ + │ + [Done] │ + ╭──────────────────┴─────────╮ + │ inspect generator response │◄────────────╮ + ╰──────────────────┬─────────╯ │ + [yielded] │ ┏━━━━━━━━┷━━━━━━━━┓ + │ ┃ gen.resume(msg) ┃◄── + ▼ ┗━━━━━━━━━━━━━━━━━┛ + ╭────────────╮ ▲ + │ same-frame │ │ + │ request? ├────────────────╯ + ╰─────┬──────╯ [yes] + ┏━━━━━━━━━━━━━━┓ │ +◄──┨ return false ┃ ◄──────────────────╯ + ┗━━━━━━━━━━━━━━┛ [no] +``` + +On each execution of a generator frame, `resume_with` is called with a +[`VMResponse`][] (i.e. a message *from* the VM *to* the generator). For a newly +created generator, the initial message is just `Empty`. + +A generator may then respond by signaling that it has finished execution +(`Done`), in which case the inner generator loop returns control to the outer +loop and informs it that this generator is done (by returning `true`). + +A generator may also respond by signaling that it needs some data from the VM. +This is implemented through a request-response pattern, in which the generator +returns a `Yielded` message containing a [`VMRequest`][]. These requests can be +very simple ("Tell me the current store path") or more complex ("Call this Nix +function with these values"). + +Requests are divided into two classes: Same-frame requests (requests that can be +responded to *without* returning control to the outer loop, i.e. without +executing a *different* frame), and multi-frame generator requests. Based on the +type of request, the inner generator loop will either handle it right away and +send the response in a new `resume_with` call, or return `false` to the outer +generator loop after setting up the frame stack. + +Most of this logic is implemented in cl/8148. + +[`Gen::resume`]: https://docs.rs/genawaiter/0.99.1/genawaiter/rc/struct.Gen.html#method.resume_with +[`VMRequest`]: https://cs.tvl.fyi/depot@2696839770c1ccb62929ff2575a633c07f5c9593/-/blob/tvix/eval/src/vm/generators.rs?L44 +[`VMResponse`]: https://cs.tvl.fyi/depot@2696839770c1ccb62929ff2575a633c07f5c9593/-/blob/tvix/eval/src/vm/generators.rs?L169 + +## Advantages & Disadvantages of the approach + +This approach has several advantages: + +* The execution model is much simpler than before, making it fairly + straightforward to build up a mental model of what the VM does. + +* All "out of band requests" inside the VM are handled through the same + abstraction (generators). + +* Implementation is not difficult, albeit a little verbose in some cases (we + can argue about whether or not to introduce macros for simplifying it). + +* Several parts of the VM execution are now much easier to document, + potentially letting us onboard tvix-eval contributors faster. + +* The linear VM execution itself is much easier to trace now, with for example + the `RuntimeObserver` (and by extension `tvixbolt`) giving much clearer + output now. + +But it also comes with some disadvantages: + +* Even though we "only" use the `async` keyword without a full async-I/O + runtime, we still encounter many of the drawbacks of the fragmented Rust + async ecosystem. + + The biggest issue with this is that parts of the standard library become + unavailable to us, for example the built-in `Vec::sort_by` can no longer be + used for sorting in Nix because our comparators themselves are `async`. + + This led us to having to implement some logic on our own, as the design of + `async` in Rust even makes it difficult to provide usecase-generic + implementations of concepts like sorting. + +* We need to allocate quite a few new structures on the heap in order to drive + generators, as generators involve storing `Future` types (with unknown + sizes) inside of structs. + + In initial testing this seems to make no significant difference in + performance (our performance in an actual nixpkgs-eval is still bottlenecked + by I/O concerns and reference scanning), but is something to keep in mind + later on when we start optimising more after the low-hanging fruits have + been reaped. + +## Alternatives considered + +1. Tacking on more functionality onto the existing VM loop + implementation to accomodate problems as they show up. This is not + preferred as the code is already getting messy. + +2. Making tvix-eval a fully `async` project, pulling in something like Tokio + or `async-std` as a runtime. This is not preferred due to the massively + increased complexity of those solutions, and all the known issues of fully + buying in to the async ecosystem. + + tvix-eval fundamentally should work for use-cases besides building Nix + packages (e.g. for `//tvix/serde`), and its profile should be as slim as + possible. + +3. Convincing the Rust developers that Rust needs a way to guarantee + constant-stack-depth tail calls through something like a `tailcall` + keyword. + +4. ... ? + +[`genawaiter`]: https://docs.rs/genawaiter/ |