diff options
Diffstat (limited to 'tvix/docs/src/eval/known-optimisation-potential.md')
-rw-r--r-- | tvix/docs/src/eval/known-optimisation-potential.md | 162 |
1 files changed, 162 insertions, 0 deletions
diff --git a/tvix/docs/src/eval/known-optimisation-potential.md b/tvix/docs/src/eval/known-optimisation-potential.md new file mode 100644 index 000000000000..0ab185fe1be2 --- /dev/null +++ b/tvix/docs/src/eval/known-optimisation-potential.md @@ -0,0 +1,162 @@ +Known Optimisation Potential +============================ + +There are several areas of the Tvix evaluator code base where +potentially large performance gains can be achieved through +optimisations that we are already aware of. + +The shape of most optimisations is that of moving more work into the +compiler to simplify the runtime execution of Nix code. This leads, in +some cases, to drastically higher complexity in both the compiler +itself and in invariants that need to be guaranteed between the +runtime and the compiler. + +For this reason, and because we lack the infrastructure to adequately +track their impact (WIP), we have not yet implemented these +optimisations, but note the most important ones here. + +* Use "open upvalues" [hard] + + Right now, Tvix will immediately close over all upvalues that are + created and clone them into the `Closure::upvalues` array. + + Instead of doing this, we can statically determine most locals that + are closed over *and escape their scope* (similar to how the + `compiler::scope::Scope` struct currently tracks whether locals are + used at all). + + If we implement the machinery to track this, we can implement some + upvalues at runtime by simply sticking stack indices in the upvalue + array and only copy the values where we know that they escape. + +* Avoid `with` value duplication [easy] + + If a `with` makes use of a local identifier in a scope that can not + close before the with (e.g. not across `LambdaCtx` boundaries), we + can avoid the allocation of the phantom value and duplication of the + `NixAttrs` value on the stack. In this case we simply push the stack + index of the known local. + +* Multiple attribute selection [medium] + + An instruction could be introduced that avoids repeatedly pushing an + attribute set to/from the stack if multiple keys are being selected + from it. This occurs, for example, when inheriting from an attribute + set or when binding function formals. + +* Split closure/function representation [easy] + + Functions have fewer fields that need to be populated at runtime and + can directly use the `value::function::Lambda` representation where + possible. + +* Apply `compiler::optimise_select` to other set operations [medium] + + In addition to selects, statically known attribute resolution could + also be used for things like `?` or `with`. The latter might be a + little more complicated but is worth investigating. + +* Inline fully applied builtins with equivalent operators [medium] + + Some `builtins` have equivalent operators, e.g. `builtins.sub` + corresponds to the `-` operator, `builtins.hasAttr` to the `?` + operator etc. These operators additionally compile to a primitive + VM opcode, so they should be just as cheap (if not cheaper) as + a builtin application. + + In case the compiler encounters a fully applied builtin (i.e. + no currying is occurring) and the `builtins` global is unshadowed, + it could compile the equivalent operator bytecode instead: For + example, `builtins.sub 20 22` would be compiled as `20 - 22`. + This would ensure that equivalent `builtins` can also benefit + from special optimisations we may implement for certain operators + (in the absence of currying). E.g. we could optimise access + to the `builtins` attribute set which a call to + `builtins.getAttr "foo" builtins` should also profit from. + +* Avoid nested `VM::run` calls [hard] + + Currently when encountering Nix-native callables (thunks, closures) + the VM's run loop will nest and return the value of the nested call + frame one level up. This makes the Rust call stack almost mirror the + Nix call stack, which is usually undesirable. + + It is possible to detect situations where this is avoidable and + instead set up the VM in such a way that it continues and produces + the desired result in the same run loop, but this is kind of tricky + to get right - especially while other parts are still in flux. + + For details consult the commit with Gerrit change ID + `I96828ab6a628136e0bac1bf03555faa4e6b74ece`, in which the initial + attempt at doing this was reverted. + +* Avoid thunks if only identifier closing is required [medium] + + Some constructs, like `with`, mostly do not change runtime behaviour + if thunked. However, they are wrapped in thunks to ensure that + deferred identifiers are resolved correctly. + + This can be avoided, as we statically analyse the scope and should + be able to tell whether any such logic was required. + +* Intern literals [easy] + + Currently, the compiler emits a separate entry in the constant + table for each literal. So the program `1 + 1 + 1` will have + three entries in its `Chunk::constants` instead of only one. + +* Do some list and attribute set operations in place [hard] + + Algorithms that can not do a lot of work inside `builtins` like `map`, + `filter` or `foldl'` usually perform terribly if they use data structures like + lists and attribute sets. + + `builtins` can do work in place on a copy of a `Value`, but naïvely expressed + recursive algorithms will usually use `//` and `++` to do a single change to a + `Value` at a time, requiring a full copy of the data structure each time. + It would be a big improvement if we could do some of these operations in place + without requiring a new copy. + + There are probably two approaches: We could determine statically if a value is + reachable from elsewhere and emit a special in place instruction if not. An + easier alternative is probably to rely on reference counting at runtime: If no + other reference to a value exists, we can extend the list or update the + attribute set in place. + + An **alternative** to this is using [persistent data + structures](https://en.wikipedia.org/wiki/Persistent_data_structure) or at the + very least [immutable data structures](https://docs.rs/im/latest/im/) that can + be copied more efficiently than the stock structures we are using at the + moment. + +* Skip finalising unfinalised thunks or non-thunks instead of crashing [easy] + + Currently `OpFinalise` crashes the VM if it is called on values that don't + need to be finalised. This helps catching miscompilations where `OpFinalise` + operates on the wrong `StackIdx`. In the case of function argument patterns, + however, this means extra VM stack and instruction overhead for dynamically + determining if finalisation is necessary or not. This wouldn't be necessary + if `OpFinalise` would just noop on any values that don't need to be finalised + (anymore). + +* Phantom binding for from expression of inherits [easy] + + The from expression of an inherit is reevaluated for each inherit. This can + be demonstrated using the following Nix expression which, counter-intuitively, + will print “plonk” twice. + + ```nix + let + inherit (builtins.trace "plonk" { a = null; b = null; }) a b; + in + builtins.seq a (builtins.seq b null) + ``` + + In most Nix code, the from expression is just an identifier, so it is not + terribly inefficient, but in some cases a more expensive expression may + be used. We should create a phantom binding for the from expression that + is reused in the inherits, so only a single thunk is created for the from + expression. + + Since we discovered this, C++ Nix has implemented a similar optimization: + <https://github.com/NixOS/nix/pull/9847>. |