From 6f13c16f28de01209c92e9b9415b68b00b913894 Mon Sep 17 00:00:00 2001 From: Vincent Ambo Date: Tue, 9 Aug 2022 16:56:59 +0300 Subject: docs(tvix/eval): add design documentation for attrset opcodes Change-Id: I303b57e035543f4597c6247983d1d533e4014638 Reviewed-on: https://cl.tvl.fyi/c/depot/+/6092 Tested-by: BuildkiteCI Reviewed-by: grfn --- tvix/eval/docs/opcodes-attrsets.md | 122 +++++++++++++++++++++++++++++++++++++ 1 file changed, 122 insertions(+) create mode 100644 tvix/eval/docs/opcodes-attrsets.md diff --git a/tvix/eval/docs/opcodes-attrsets.md b/tvix/eval/docs/opcodes-attrsets.md new file mode 100644 index 0000000000..7026f3319d --- /dev/null +++ b/tvix/eval/docs/opcodes-attrsets.md @@ -0,0 +1,122 @@ +# attrset-opcodes + +The problem with attrset literals is twofold: + +1. The keys of attribute sets may be dynamically evaluated. + + Access: + + ```nix + let + k = "foo"; + attrs = { /* etc. */ }; + in attrs."${k}" + ``` + + Literal: + ```nix + let + k = "foo"; + in { + "${k}" = 42; + } + ``` + + The problem with this is that the attribute set key is not known at + compile time, and needs to be dynamically evaluated by the VM as an + expression. + + For the most part this should be pretty simple, assuming a + theoretical instruction set: + + ``` + 0000 OP_CONSTANT(0) # key "foo" + 0001 OP_CONSTANT(1) # value 42 + 0002 OP_ATTR_SET(1) # construct attrset from 2 stack values + ``` + + The operation pushing the key needs to be replaced with one that + leaves a single value (the key) on the stack, i.e. the code for the + expression, e.g.: + + ``` + 0000..000n + 000n+1 OP_CONSTANT(1) # value 42 + 000n+2 OP_ATTR_SET(1) # construct attrset from 2 stack values + ``` + + This is fairly easy to do by simply recursing in the compiler when + the key expression is encountered. + +2. The keys of attribute sets may be nested. + + This is the non-trivial part of dealing with attribute set + literals. Specifically, the nesting can be arbitrarily deep and the + AST does not guarantee that related set keys are located + adjacently. + + Furthermore, this frequently occurs in practice in Nix. We need a + bytecode representation that makes it possible to construct nested + attribute sets at runtime. + + Proposal: AttrPath values + + If we can leave a value representing an attribute path on the + stack, we can offload the construction of nested attribute sets to + the `OpAttrSet` operation. + + Under the hood, OpAttrSet in practice constructs a `Map` attribute set in most cases. This means it expects to pop + the value of the key of the stack, but is otherwise free to do + whatever it wants with the underlying map. + + In a simple example, we could have code like this: + + ```nix + { + a.b = 15; + } + ``` + + This would be compiled to a new `OpAttrPath` instruction that + constructs and pushes an attribute path from a given number of + fragments (which are popped off the stack). + + For example, + + ``` + 0000 OP_CONSTANT(0) # key "a" + 0001 OP_CONSTANT(1) # key "b" + 0002 OP_ATTR_PATH(2) # construct attrpath from 2 fragments + 0003 OP_CONSTANT(2) # value 42 + 0004 OP_ATTRS(1) # construct attrset from one pair + ``` + + Right before `0004` the stack would be left like this: + + [ AttrPath[a,b], 42 ] + + Inside of the `OP_ATTRS` instruction we could then begin + construction of the map and insert the nested attribute sets as + required, as well as validate that there are no duplicate keys. + +3. Both of these cases can occur simultaneously, but this is not a + problem as the opcodes combine perfectly fine, e.g.: + + ```nix + let + k = "a"; + in { + "${k}".b = 42; + } + ``` + + results in + + ``` + 0000..000n + 000n+1 OP_CONSTANT(1) # key "b" + 000n+2 OP_ATTR_PATH(2) # construct attrpath from 2 fragments + 000n+3 OP_CONSTANT(2) # value 42 + 000n+4 OP_ATTR_SET(1) # construct attrset from 2 stack values + ``` -- cgit 1.4.1