diff options
author | Florian Klink <flokli@flokli.de> | 2024-06-13T19·17+0300 |
---|---|---|
committer | clbot <clbot@tvl.fyi> | 2024-06-14T08·00+0000 |
commit | 5077ca70deb8ca8e84abb9608e08bf4485d3ec4b (patch) | |
tree | 7396411393a5b88b25d3adb08adcd3a5478e8e21 /tvix/eval/docs/build-references.md | |
parent | 6947dc4349fa85cb702f46acfe3255c907096b12 (diff) |
chore(tvix/eval): move eval docs to tvix/docs r/8270
Change-Id: I75b33c43456389de6e521b4f0ad46d68bc9e98f6 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11809 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: tazjin <tazjin@tvl.su> Tested-by: BuildkiteCI
Diffstat (limited to 'tvix/eval/docs/build-references.md')
-rw-r--r-- | tvix/eval/docs/build-references.md | 254 |
1 files changed, 0 insertions, 254 deletions
diff --git a/tvix/eval/docs/build-references.md b/tvix/eval/docs/build-references.md deleted file mode 100644 index badcea11550e..000000000000 --- a/tvix/eval/docs/build-references.md +++ /dev/null @@ -1,254 +0,0 @@ -Build references in derivations -=============================== - -This document describes how build references are calculated in Tvix. Build -references are used to determine which store paths should be available to a -builder during the execution of a build (i.e. the full build closure of a -derivation). - -## String contexts in C++ Nix - -In C++ Nix, each string value in the evaluator carries an optional so-called -"string context". - -These contexts are themselves a list of strings that take one of the following -formats: - -1. `!<output_name>!<drv_path>` - - This format describes a build reference to a specific output of a derivation. - -2. `=<drv_path>` - - This format is used for a special case where a derivation attribute directly - refers to a derivation path (e.g. by accessing `.drvPath` on a derivation). - - Note: In C++ Nix this case is quite special and actually requires a - store-database query during evaluation. - -3. `<path>` - a non-descript store path input, usually a plain source file (e.g. - from something like `src = ./.` or `src = ./foo.txt`). - - In the case of `unsafeDiscardOutputDependency` this is used to pass a raw - derivation file, but *not* pull in its outputs. - -Lets introduce names for these (in the same order) to make them easier to -reference below: - -```rust -enum BuildReference { - /// !<output_name>!<drv_path> - SingleOutput(OutputName, DrvPath), - - /// =<drv_path> - DrvClosure(DrvPath), - - /// <path> - Path(StorePath), -} -``` - -String contexts are, broadly speaking, created whenever a string is the result -of a computation (e.g. string interpolation) that used a *computed* path or -derivation in any way. - -Note: This explicitly does *not* include simply writing a literal string -containing a store path (whether valid or not). That is only permitted through -the `storePath` builtin. - -## Derivation inputs - -Based on the data above, the fields `inputDrvs` and `inputSrcs` of derivations -are populated in `builtins.derivationStrict` (the function which -`builtins.derivation`, which isn't actually a builtin, wraps). - -`inputDrvs` is represented by a map of derivation paths to the set of their -outputs that were referenced by the context. - -TODO: What happens if the set is empty? Somebody claimed this means all outputs. - -`inputSrcs` is represented by a set of paths. - -These are populated by the above references as follows: - -* `SingleOutput` entries are merged into `inputDrvs` -* `Path` entries are inserted into `inputSrcs` -* `DrvClosure` leads to a special store computation (`computeFSClosure`), which - finds all paths referenced by the derivation and then inserts all of them into - the fields as above (derivations with _all_ their outputs) - -This is then serialised in the derivation and passed down the pipe. - -## Builtins interfacing with contexts - -C++ Nix has several builtins that interface directly with string contexts: - -* `unsafeDiscardStringContext`: throws away a string's string context (if - present) -* `hasContext`: returns `true`/`false` depending on whether the string has - context -* `unsafeDiscardOutputDependency`: drops dependencies on the *outputs* of a - `.drv` in the context, passing only the literal `.drv` itself - - Note: This is only used for special test-cases in nixpkgs, and deprecated Nix - commands like `nix-push`. -* `getContext`: returns the string context in serialised form as a Nix attribute - set -* `appendContext`: adds a given string context to the string in the same format - as returned by `getContext` - -Most of the string manipulation operations will propagate the context to the -result based on their parameters' contexts. - -## Placeholders - -C++ Nix has `builtins.placeholder`, which given the name of an output (e.g. -`out`) creates a hashed string representation of that output name. If that -string is used anywhere in input attributes, the builder will replace it with -the actual name of the corresponding output of the current derivation. - -C++ Nix does not use contexts for this, it blindly creates a rewrite map of -these placeholder strings to the names of all outputs, and runs the output -replacement logic on all environment variables it creates, attribute files it -passes etc. - -## Tvix & string contexts - -In the past, Tvix did not track string contexts in its evaluator at all, see -the historical section for more information about that. - -Tvix tracks string contexts in every `NixString` structure via a -`HashSet<BuildReference>` and offers an API to combine the references while -keeping the exact internal structure of that data private. - -## Historical attempt: Persistent reference tracking - -We were investigating implementing a system which allows us to drop string -contexts in favour of reference scanning derivation attributes. - -This means that instead of maintaining and passing around a string context data -structure in eval, we maintain a data structure of *known paths* from the same -evaluation elsewhere in Tvix, and scan each derivation attribute against this -set of known paths when instantiating derivations. - -We believed we could take the stance that the system of string contexts as -implemented in C++ Nix is likely an implementation detail that should not be -leaking to the language surface as it does now. - -### Tracking "known paths" - -Every time a Tvix evaluation does something that causes a store interaction, a -"known path" is created. On the language surface, this is the result of one of: - -1. Path literals (e.g. `src = ./.`). -2. Calls to `builtins.derivationStrict` yielding a derivation and its output - paths. -3. Calls to `builtins.path`. - -Whenever one of these occurs, some metadata that persists for the duration of -one evaluation should be created in Nix. This metadata needs to be available in -`builtins.derivationStrict`, and should be able to respond to these queries: - -1. What is the set of all known paths? (used for e.g. instantiating an - Aho-Corasick type string searcher) -2. What is the _type_ of a path? (derivation path, derivation output, source - file) -3. What are the outputs of a derivation? -4. What is the derivation of an output? - -These queries will need to be asked of the metadata when populating the -derivation fields. - -Note: Depending on how we implement `builtins.placeholder`, it might be useful -to track created placeholders in this metadata, too. - -### Context builtins - -Context-reading builtins can be implemented in Tvix by adding `hasContext` and -`getContext` with the appropriate reference-scanning logic. However, we should -evaluate how these are used in nixpkgs and whether their uses can be removed. - -Context-mutating builtins can be implemented by tracking their effects in the -value representation of Tvix, however we should consider not doing this at all. - -`unsafeDiscardOutputDependency` should probably never be used and we should warn -or error on it. - -`unsafeDiscardStringContext` is often used as a workaround for avoiding IFD in -inconvenient places (e.g. in the TVL depot pipeline generation). This is -unnecessary in Tvix. We should evaluate which other uses exist, and act on them -appropriately. - -The initial danger with diverging here is that we might cause derivation hash -discrepancies between Tvix and C++ Nix, which can make initial comparisons of -derivations generated by the two systems difficult. If this occurs we need to -discuss how to approach it, but initially we will implement the mutating -builtins as no-ops. - -### Why this did not work for us? - -Nix has a feature to perform environmental checks of your derivation, e.g. -"these derivation outputs should not be referenced in this derivation", this was -introduced in Nix 2.2 by -https://github.com/NixOS/nix/commit/3cd15c5b1f5a8e6de87d5b7e8cc2f1326b420c88. - -Unfortunately, this feature introduced a very unfortunate and critical bug: all -usage of this feature with contextful strings will actually force the -derivation to depend at least at build time on those specific paths, see -https://github.com/NixOS/nix/issues/4629. - -For example, if you wanted to `disallowedReferences` to a package and you used a -derivation as a path, you would actually register that derivation as a input -derivation of that derivation. - -This bug is still unfixed in Nix and it seems that fixing it would require -introducing different ways to evaluate Nix derivations to preserve the -output path calculation for Nix expressions so far. - -All of this would be fine if the bug behavior was uniform in the sense that no -one tried to force-workaround it. Since Nixpkgs 23.05, due to -https://github.com/NixOS/nixpkgs/pull/211783 this is not true anymore. - -If you let nixpkgs be the disjoint union of bootstrapping derivations $A$ and -`stdenv.mkDerivation`-built derivations $B$. - -$A$ suffers from the bug and $B$ doesn't by the forced usage of -`unsafeDiscardStringContext` on those special checking fields. - -This means that to build hash-compatible $A$ **and** $B$, we need to -distinguish $A$ and $B$. A lot of hacks could be imagined to support this -problem. - -Let's assume we have a solution to that problem, it means that we are able to -detect implicitly when a set of specific fields are -`unsafeDiscardStringContext`-ed. - -Thus, we could use that same trick to implement `unsafeDiscardStringContext` -entirely for all fields actually. - -Now, to implement `unsafeDiscardStringContext` in the persistent reference -tracking model, you will need to store a disallowed list of strings that should -not trigger a reference when we are scanning a derivation parameters. - -But assume you have something like: - -```nix -derivation { - buildInputs = [ - stdenv.cc - ]; - - disallowedReferences = [ stdenv.cc ]; -} -``` - -If you unregister naively the `stdenv.cc` reference, it will silence the fact -that it is part of the `buildInputs`, so you will observe that Nix will fail -the derivation during environmental check, but Tvix would silently force remove -that reference. - -Until proven otherwise, it seems highly difficult to have the fine-grained -information to prevent reference tracking of those specific fields. It is not a -failure of the persistent reference tracking, it is an unresolved critical bug -of Nix that only nixpkgs really workarounded for `stdenv.mkDerivation`-based -derivations. |