about summary refs log tree commit diff
path: root/tvix/docs/src/eval/abandoned/thread-local-vm.md
diff options
context:
space:
mode:
authorFlorian Klink <flokli@flokli.de>2024-06-13T19·17+0300
committerclbot <clbot@tvl.fyi>2024-06-14T08·00+0000
commit5077ca70deb8ca8e84abb9608e08bf4485d3ec4b (patch)
tree7396411393a5b88b25d3adb08adcd3a5478e8e21 /tvix/docs/src/eval/abandoned/thread-local-vm.md
parent6947dc4349fa85cb702f46acfe3255c907096b12 (diff)
chore(tvix/eval): move eval docs to tvix/docs r/8270
Change-Id: I75b33c43456389de6e521b4f0ad46d68bc9e98f6
Reviewed-on: https://cl.tvl.fyi/c/depot/+/11809
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Diffstat (limited to 'tvix/docs/src/eval/abandoned/thread-local-vm.md')
-rw-r--r--tvix/docs/src/eval/abandoned/thread-local-vm.md233
1 files changed, 233 insertions, 0 deletions
diff --git a/tvix/docs/src/eval/abandoned/thread-local-vm.md b/tvix/docs/src/eval/abandoned/thread-local-vm.md
new file mode 100644
index 000000000000..c6a2d5e07e5c
--- /dev/null
+++ b/tvix/docs/src/eval/abandoned/thread-local-vm.md
@@ -0,0 +1,233 @@
+# We can't have nice things because IFD
+
+The thread-local VM work below was ultimately not merged because it
+was decided that it would be harmful for `tvix::eval::Value` to
+implement `Eq`, `Hash`, or any of the other `std` traits.
+
+Implementing `std` traits on `Value` was deemed harmful because IFD
+can cause arbitrary amounts of compilation to occur, including
+network transactions with builders.  Obviously it would be
+unexpected and error-prone to have a `PartialEq::eq()` which does
+something like this.  This problem does not manifest within the
+"nixpkgs compatibility only" scope, or in any undeprecated language
+feature other than IFD.  Although IFD is outside the "nixpkgs
+compatibility scope", it [has been added to the TVL compatibility
+scope](https://cl.tvl.fyi/c/depot/+/7193/comment/3418997b_0dbd0b65/).
+
+This was the sole reason for not merging.
+
+The explanation below may be useful in case future circumstances
+affect the relevance of the reasoning above.
+
+The implementation can be found in these CLs:
+
+- [refactor(tvix/eval): remove lifetime parameter from VM<'o>](https://cl.tvl.fyi/c/depot/+/7194)
+- [feat(tvix/eval): [FOUNDLING] thread-local VM](https://cl.tvl.fyi/c/depot/+/7195)
+- [feat(tvix/eval): [FOUNDLING] VM::vm_xxx convenience methods](https://cl.tvl.fyi/c/depot/+/7196)
+- [refactor(tvix/eval): [FOUNDLING]: drop explicit `&mut vm` parameter](https://cl.tvl.fyi/c/depot/+/7197)
+
+# Thread-local storage for tvix::eval::vm::VM
+
+## The problem
+
+`Value::force()` takes a `&mut VM` argument, since forcing a value
+requires executing opcodes.  This means that `Value::nix_eq()` too
+must take a `&mut VM`, since any sensible definition of equality
+will have to force thunks.
+
+Unfortunately Rust's `PartialEq::eq()` function does not accept any
+additional arguments like this, so `Value` cannot implement
+`PartialEq`.  Worse, structs which *contain* `Value`s can't
+implement `PartialEq` either.  This means `Value`, and anything
+containing it, cannot be the key for a `BTreeMap` or `HashMap`.  We
+can't even insert `Value`s into a `HashSet`!
+
+There are other situations like this that don't involve `PartialEq`,
+but it's the most glaring one.  The main problem is that you need a
+`VM` in order to force thunks, and thunks can be anywhere in a
+`Value`.
+
+## Solving the problem with thread-locals
+
+We could avoid threading the `&mut VM` through the entire codebase
+by making it a thread-local.
+
+To do this without a performance hit, we need to use LLVM
+thread-locals, which are the same cost as references to `static`s
+but load relative to
+[`llvm.threadlocal.address`][threadlocal-intrinsic] instead of
+relative to the data segment.  Unfortunately `#[thread_local]` [is
+unstable][thread-local-unstable] and [unsafe in
+general][thread-local-unsafe] for most of the cases where we would
+want to use it.  There is one [exception][tls-const-init], however:
+if a `!thread_local()` has a `const` initializer, the compiler will
+insert a `#[thread_local]`; this special case is both safe and
+stable.
+
+The difficult decision is what the type of the thread-local should
+be.  Since you can't get a mutable reference to a `thread_local!()`
+it will have to be some interior-mutability-bestowing wrapper around
+our current `struct VM`.  Here are the choices:
+
+### `RefCell<VM>`
+
+This is the obvious first choice, since it lets you borrow a
+`RefMut<Target=VM>`.  The problem here is that we want to keep the
+codebase written such that all the functions in `impl VM` still take
+a `&mut self`.  This means that there will be an active mutable
+borrow for the duration of `VM::call_builtin()`.  So if we implement
+`PartialEq` by having `eq()` attempt a second mutable borrow from
+the thread-local storage, it will fail since there is already an
+active borrow.
+
+The problem here is that you can't "unborrow" a `RefMut` except by
+dropping it.  There's no way around this.
+
+#### Problem: Uglification
+
+The only solution here is to rewrite all the functions in `impl VM`
+so they don't take any kind of `self` argument, and then have them
+do a short-lived `.borrow_mut()` from the thread-local `RefCell`
+*separately, each time* they want to modify one of the fields of
+`VM` (currently `frames`, `stack`, `with_stack`, `warnings`).  This
+means that if you had a code sequence like this:
+
+```
+impl VM {
+  fn foo(&mut self, ...) {
+    ...
+    self.frame().ip += 1;
+    self.some_other_method();
+    self.frame().ip += 1;
+```
+
+You would need to add *two separate `borrow_mut()`s*, one for each
+of the `self.frame().ip+=1` statements.  You can't just do one big
+`borrow_mut()` because `some_other_method()` will call
+`borrow_mut()` and panic.
+
+#### Problem: Performance
+
+The `RefCell<VM>` approach also has a fairly huge performance hit,
+because every single modification to any part of `VM` will require a
+reference count increment/decrement, and a conditional branch based
+on the check (which will never fail) that the `RefCell` isn't
+already mutably borrowed.  It will also impede a lot of rustc's
+optimizations.
+
+### `Cell<VM>`
+
+This is a non-starter because it means that in order to mutate any
+field of `VM`, you have to move the entire `struct VM` out of the
+`Cell`, mutate it, and move it back in.
+
+### `Cell<Box<VM>>`
+
+Now we're getting warmer.  Here, we can move the `Box<VM>` out of
+the cell with a single pointer-sized memory access.
+
+We don't want to do the "uglification" described in the previous
+section.  We are very fortunate that, sometime in mid-2019, the Rust
+dieties [decreed by fiat][fiat-decree] that `&Cell<T>` and `&mut T`
+are bit-for-bit identical, and even gave us mortals safe wrappers
+[`from_mut()`][from_mut] and [`get_mut()`][get_mut] around
+`mem::transmute()`.
+
+So now, when a `VM` method (which takes `&mut self`) calls out to
+some external code (like a builtin), instead of passing the `&mut
+self` to the external code it can call `Cell::from_mut(&mut self)`,
+and then `Cell::swap()` that into the thread-local storage cell for
+the duration of the external code.  After the external code returns,
+it can `Cell::swap()` it back.  This whole dance gets wrapped in a
+lexical block, and the borrow checker sees that the `&Cell<Box<VM>>`
+returned by `Cell::from_mut()` lives only until the end of the
+lexical block, *so we get the `&mut self` back after the close-brace
+for that block*.  NLL FTW.  This sounds like a lot of work, but it
+should compile down to two pointer-sized loads and two pointer-sized
+stores, and it is incurred basically only for `OpBuiltin`.
+
+This all works, with only two issues:
+
+1. `vm.rs` needs to be very careful to do the thread-local cell swap
+   dance before calling anything that might call `PartialEq::eq()`
+   (or any other method that expects to be able to pull the `VM` out
+   of thread-local storage).  There is no compile-time check that we
+   did the dance in all the right places.  If we forget to do the
+   dance somewhere we'll get a runtime panic from `Option::expect()`
+   (see next section).
+
+2. Since we need to call `Cell::from_mut()` on a `Box<VM>` rather
+   than a bare `VM`, we still need to rewrite all of `vm.rs` so that
+   every function takes a `&mut Box<VM>` instead of a `&mut self`.
+   This creates a huge amount of "noise" in the code.
+
+Fortunately, it turns out that nearly all the "noise" that arises
+from the second point can be eliminated by taking advantage of
+[deref coercions][deref-coercions]!  This was the last "shoe to
+drop".
+
+There is still the issue of having to be careful about calls from
+`vm.rs` to things outside that file, but it's manageable.
+
+### `Cell<Option<Box<VM>>>`
+
+In order to get the "safe and stable `#[thread_local]`"
+[exception][tls-const-init] we need a `const` initializer, which
+means we need to be able to put something into the `Cell` that isn't
+a `VM`.  So the type needs to be `Cell<Option<Box<VM>>>`.
+
+Recall that you can't turn an `Option<&T>` into an `&Option<T>`.
+The latter type has the "is this a `Some` or `None`" bit immediately
+adjacent to the bits representing `T`.  So if I hand you a `t:&T`
+and you wrap it as `Some(t)`, those bits aren't adjacent in memory.
+This means that all the VM methods need to operate on an
+`Option<Box<VM>>` -- we can't just wrap a `Some()` around `&mut
+self` "at the last minute" before inserting it into the thread-local
+storage cell.  Fortunately deref coercions save the day here too --
+the coercion is inferred through both layers (`Box` and `Option`) of
+wrapper, so there is no additional noise in the code.
+
+Note that Rust is clever and can find some sequence of bits that
+aren't a valid `T`, so `sizeof(Option<T>)==sizeof(T)`.  And in fact,
+`Box<T>` is one of these cases (and this is guaranteed).  So the
+`Option` has no overhead.
+
+# Closing thoughts, language-level support
+
+This would have been easier with language-level support.
+
+## What wouldn't help
+
+Although it [it was decreed][fiat-decree] that `Cell<T>` and `&mut
+T` are interchangeable, a `LocalKey<Cell<T>>` isn't quite the same
+thing as a `Cell<T>`, so it wouldn't be safe for the standard
+library to contain something like this:
+
+```
+impl<T> LocalKey<Cell<T>> {
+  fn get_mut(&self) -> &mut T {
+    unsafe {
+      // ... mem::transmute() voodoo goes here ...
+```
+
+The problem here is that you can call `LocalKey<Cell<T>>::get_mut()` twice and
+end up with two `&mut T`s that point to the same thing (mutable aliasing) which
+results in undefined behavior.
+
+## What would help
+
+The ideal solution is for Rust to let you call arbitrary methods
+`T::foo(&mut self...)` on a `LocalKey<Cell<T>>`.  This way you can
+have one (and only one) `&mut T` at any syntactical point in your
+program -- the `&mut self`.
+
+
+[tls-const-init]: https://github.com/rust-lang/rust/pull/90774
+[thread-local-unstable]: https://github.com/rust-lang/rust/issues/29594
+[thread-local-unsafe-generally]: https://github.com/rust-lang/rust/issues/54366
+[fiat-decree]: https://github.com/rust-lang/rust/issues/43038
+[from_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.from_mut
+[get_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.get_mut
+[thread-local-unsafe]: [https://github.com/rust-lang/rust/issues/54366]
+[deref-coercions]: https://doc.rust-lang.org/book/ch15-02-deref.html#implicit-deref-coercions-with-functions-and-methods
+[threadlocal-intrinsic]: https://llvm.org/docs/LangRef.html#llvm-threadlocal-address-intrinsic