# We can't have nice things because IFD The thread-local VM work below was ultimately not merged because it was decided that it would be harmful for `tvix::eval::Value` to implement `Eq`, `Hash`, or any of the other `std` traits. Implementing `std` traits on `Value` was deemed harmful because IFD can cause arbitrary amounts of compilation to occur, including network transactions with builders. Obviously it would be unexpected and error-prone to have a `PartialEq::eq()` which does something like this. This problem does not manifest within the "nixpkgs compatibility only" scope, or in any undeprecated language feature other than IFD. Although IFD is outside the "nixpkgs compatibility scope", it [has been added to the TVL compatibility scope](https://cl.tvl.fyi/c/depot/+/7193/comment/3418997b_0dbd0b65/). This was the sole reason for not merging. The explanation below may be useful in case future circumstances affect the relevance of the reasoning above. The implementation can be found in these CLs: - [refactor(tvix/eval): remove lifetime parameter from VM<'o>](https://cl.tvl.fyi/c/depot/+/7194) - [feat(tvix/eval): [FOUNDLING] thread-local VM](https://cl.tvl.fyi/c/depot/+/7195) - [feat(tvix/eval): [FOUNDLING] VM::vm_xxx convenience methods](https://cl.tvl.fyi/c/depot/+/7196) - [refactor(tvix/eval): [FOUNDLING]: drop explicit `&mut vm` parameter](https://cl.tvl.fyi/c/depot/+/7197) # Thread-local storage for tvix::eval::vm::VM ## The problem `Value::force()` takes a `&mut VM` argument, since forcing a value requires executing opcodes. This means that `Value::nix_eq()` too must take a `&mut VM`, since any sensible definition of equality will have to force thunks. Unfortunately Rust's `PartialEq::eq()` function does not accept any additional arguments like this, so `Value` cannot implement `PartialEq`. Worse, structs which *contain* `Value`s can't implement `PartialEq` either. This means `Value`, and anything containing it, cannot be the key for a `BTreeMap` or `HashMap`. We can't even insert `Value`s into a `HashSet`! There are other situations like this that don't involve `PartialEq`, but it's the most glaring one. The main problem is that you need a `VM` in order to force thunks, and thunks can be anywhere in a `Value`. ## Solving the problem with thread-locals We could avoid threading the `&mut VM` through the entire codebase by making it a thread-local. To do this without a performance hit, we need to use LLVM thread-locals, which are the same cost as references to `static`s but load relative to [`llvm.threadlocal.address`][threadlocal-intrinsic] instead of relative to the data segment. Unfortunately `#[thread_local]` [is unstable][thread-local-unstable] and [unsafe in general][thread-local-unsafe] for most of the cases where we would want to use it. There is one [exception][tls-const-init], however: if a `!thread_local()` has a `const` initializer, the compiler will insert a `#[thread_local]`; this special case is both safe and stable. The difficult decision is what the type of the thread-local should be. Since you can't get a mutable reference to a `thread_local!()` it will have to be some interior-mutability-bestowing wrapper around our current `struct VM`. Here are the choices: ### `RefCell` This is the obvious first choice, since it lets you borrow a `RefMut`. The problem here is that we want to keep the codebase written such that all the functions in `impl VM` still take a `&mut self`. This means that there will be an active mutable borrow for the duration of `VM::call_builtin()`. So if we implement `PartialEq` by having `eq()` attempt a second mutable borrow from the thread-local storage, it will fail since there is already an active borrow. The problem here is that you can't "unborrow" a `RefMut` except by dropping it. There's no way around this. #### Problem: Uglification The only solution here is to rewrite all the functions in `impl VM` so they don't take any kind of `self` argument, and then have them do a short-lived `.borrow_mut()` from the thread-local `RefCell` *separately, each time* they want to modify one of the fields of `VM` (currently `frames`, `stack`, `with_stack`, `warnings`). This means that if you had a code sequence like this: ``` impl VM { fn foo(&mut self, ...) { ... self.frame().ip += 1; self.some_other_method(); self.frame().ip += 1; ``` You would need to add *two separate `borrow_mut()`s*, one for each of the `self.frame().ip+=1` statements. You can't just do one big `borrow_mut()` because `some_other_method()` will call `borrow_mut()` and panic. #### Problem: Performance The `RefCell` approach also has a fairly huge performance hit, because every single modification to any part of `VM` will require a reference count increment/decrement, and a conditional branch based on the check (which will never fail) that the `RefCell` isn't already mutably borrowed. It will also impede a lot of rustc's optimizations. ### `Cell` This is a non-starter because it means that in order to mutate any field of `VM`, you have to move the entire `struct VM` out of the `Cell`, mutate it, and move it back in. ### `Cell>` Now we're getting warmer. Here, we can move the `Box` out of the cell with a single pointer-sized memory access. We don't want to do the "uglification" described in the previous section. We are very fortunate that, sometime in mid-2019, the Rust dieties [decreed by fiat][fiat-decree] that `&Cell` and `&mut T` are bit-for-bit identical, and even gave us mortals safe wrappers [`from_mut()`][from_mut] and [`get_mut()`][get_mut] around `mem::transmute()`. So now, when a `VM` method (which takes `&mut self`) calls out to some external code (like a builtin), instead of passing the `&mut self` to the external code it can call `Cell::from_mut(&mut self)`, and then `Cell::swap()` that into the thread-local storage cell for the duration of the external code. After the external code returns, it can `Cell::swap()` it back. This whole dance gets wrapped in a lexical block, and the borrow checker sees that the `&Cell>` returned by `Cell::from_mut()` lives only until the end of the lexical block, *so we get the `&mut self` back after the close-brace for that block*. NLL FTW. This sounds like a lot of work, but it should compile down to two pointer-sized loads and two pointer-sized stores, and it is incurred basically only for `OpBuiltin`. This all works, with only two issues: 1. `vm.rs` needs to be very careful to do the thread-local cell swap dance before calling anything that might call `PartialEq::eq()` (or any other method that expects to be able to pull the `VM` out of thread-local storage). There is no compile-time check that we did the dance in all the right places. If we forget to do the dance somewhere we'll get a runtime panic from `Option::expect()` (see next section). 2. Since we need to call `Cell::from_mut()` on a `Box` rather than a bare `VM`, we still need to rewrite all of `vm.rs` so that every function takes a `&mut Box` instead of a `&mut self`. This creates a huge amount of "noise" in the code. Fortunately, it turns out that nearly all the "noise" that arises from the second point can be eliminated by taking advantage of [deref coercions][deref-coercions]! This was the last "shoe to drop". There is still the issue of having to be careful about calls from `vm.rs` to things outside that file, but it's manageable. ### `Cell>>` In order to get the "safe and stable `#[thread_local]`" [exception][tls-const-init] we need a `const` initializer, which means we need to be able to put something into the `Cell` that isn't a `VM`. So the type needs to be `Cell>>`. Recall that you can't turn an `Option<&T>` into an `&Option`. The latter type has the "is this a `Some` or `None`" bit immediately adjacent to the bits representing `T`. So if I hand you a `t:&T` and you wrap it as `Some(t)`, those bits aren't adjacent in memory. This means that all the VM methods need to operate on an `Option>` -- we can't just wrap a `Some()` around `&mut self` "at the last minute" before inserting it into the thread-local storage cell. Fortunately deref coercions save the day here too -- the coercion is inferred through both layers (`Box` and `Option`) of wrapper, so there is no additional noise in the code. Note that Rust is clever and can find some sequence of bits that aren't a valid `T`, so `sizeof(Option)==sizeof(T)`. And in fact, `Box` is one of these cases (and this is guaranteed). So the `Option` has no overhead. # Closing thoughts, language-level support This would have been easier with language-level support. ## What wouldn't help Although it [it was decreed][fiat-decree] that `Cell` and `&mut T` are interchangeable, a `LocalKey>` isn't quite the same thing as a `Cell`, so it wouldn't be safe for the standard library to contain something like this: ``` impl LocalKey> { fn get_mut(&self) -> &mut T { unsafe { // ... mem::transmute() voodoo goes here ... ``` The problem here is that you can call `LocalKey>::get_mut()` twice and end up with two `&mut T`s that point to the same thing (mutable aliasing) which results in undefined behavior. ## What would help The ideal solution is for Rust to let you call arbitrary methods `T::foo(&mut self...)` on a `LocalKey>`. This way you can have one (and only one) `&mut T` at any syntactical point in your program -- the `&mut self`. [tls-const-init]: https://github.com/rust-lang/rust/pull/90774 [thread-local-unstable]: https://github.com/rust-lang/rust/issues/29594 [thread-local-unsafe-generally]: https://github.com/rust-lang/rust/issues/54366 [fiat-decree]: https://github.com/rust-lang/rust/issues/43038 [from_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.from_mut [get_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.get_mut [thread-local-unsafe]: [https://github.com/rust-lang/rust/issues/54366] [deref-coercions]: https://doc.rust-lang.org/book/ch15-02-deref.html#implicit-deref-coercions-with-functions-and-methods [threadlocal-intrinsic]: https://llvm.org/docs/LangRef.html#llvm-threadlocal-address-intrinsic