about summary refs log tree commit diff
path: root/tvix/docs/src/eval/abandoned/thread-local-vm.md
blob: c6a2d5e07e5ca61ae857b67ac697697f47b7d8d4 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
# We can't have nice things because IFD

The thread-local VM work below was ultimately not merged because it
was decided that it would be harmful for `tvix::eval::Value` to
implement `Eq`, `Hash`, or any of the other `std` traits.

Implementing `std` traits on `Value` was deemed harmful because IFD
can cause arbitrary amounts of compilation to occur, including
network transactions with builders.  Obviously it would be
unexpected and error-prone to have a `PartialEq::eq()` which does
something like this.  This problem does not manifest within the
"nixpkgs compatibility only" scope, or in any undeprecated language
feature other than IFD.  Although IFD is outside the "nixpkgs
compatibility scope", it [has been added to the TVL compatibility
scope](https://cl.tvl.fyi/c/depot/+/7193/comment/3418997b_0dbd0b65/).

This was the sole reason for not merging.

The explanation below may be useful in case future circumstances
affect the relevance of the reasoning above.

The implementation can be found in these CLs:

- [refactor(tvix/eval): remove lifetime parameter from VM<'o>](https://cl.tvl.fyi/c/depot/+/7194)
- [feat(tvix/eval): [FOUNDLING] thread-local VM](https://cl.tvl.fyi/c/depot/+/7195)
- [feat(tvix/eval): [FOUNDLING] VM::vm_xxx convenience methods](https://cl.tvl.fyi/c/depot/+/7196)
- [refactor(tvix/eval): [FOUNDLING]: drop explicit `&mut vm` parameter](https://cl.tvl.fyi/c/depot/+/7197)

# Thread-local storage for tvix::eval::vm::VM

## The problem

`Value::force()` takes a `&mut VM` argument, since forcing a value
requires executing opcodes.  This means that `Value::nix_eq()` too
must take a `&mut VM`, since any sensible definition of equality
will have to force thunks.

Unfortunately Rust's `PartialEq::eq()` function does not accept any
additional arguments like this, so `Value` cannot implement
`PartialEq`.  Worse, structs which *contain* `Value`s can't
implement `PartialEq` either.  This means `Value`, and anything
containing it, cannot be the key for a `BTreeMap` or `HashMap`.  We
can't even insert `Value`s into a `HashSet`!

There are other situations like this that don't involve `PartialEq`,
but it's the most glaring one.  The main problem is that you need a
`VM` in order to force thunks, and thunks can be anywhere in a
`Value`.

## Solving the problem with thread-locals

We could avoid threading the `&mut VM` through the entire codebase
by making it a thread-local.

To do this without a performance hit, we need to use LLVM
thread-locals, which are the same cost as references to `static`s
but load relative to
[`llvm.threadlocal.address`][threadlocal-intrinsic] instead of
relative to the data segment.  Unfortunately `#[thread_local]` [is
unstable][thread-local-unstable] and [unsafe in
general][thread-local-unsafe] for most of the cases where we would
want to use it.  There is one [exception][tls-const-init], however:
if a `!thread_local()` has a `const` initializer, the compiler will
insert a `#[thread_local]`; this special case is both safe and
stable.

The difficult decision is what the type of the thread-local should
be.  Since you can't get a mutable reference to a `thread_local!()`
it will have to be some interior-mutability-bestowing wrapper around
our current `struct VM`.  Here are the choices:

### `RefCell<VM>`

This is the obvious first choice, since it lets you borrow a
`RefMut<Target=VM>`.  The problem here is that we want to keep the
codebase written such that all the functions in `impl VM` still take
a `&mut self`.  This means that there will be an active mutable
borrow for the duration of `VM::call_builtin()`.  So if we implement
`PartialEq` by having `eq()` attempt a second mutable borrow from
the thread-local storage, it will fail since there is already an
active borrow.

The problem here is that you can't "unborrow" a `RefMut` except by
dropping it.  There's no way around this.

#### Problem: Uglification

The only solution here is to rewrite all the functions in `impl VM`
so they don't take any kind of `self` argument, and then have them
do a short-lived `.borrow_mut()` from the thread-local `RefCell`
*separately, each time* they want to modify one of the fields of
`VM` (currently `frames`, `stack`, `with_stack`, `warnings`).  This
means that if you had a code sequence like this:

```
impl VM {
  fn foo(&mut self, ...) {
    ...
    self.frame().ip += 1;
    self.some_other_method();
    self.frame().ip += 1;
```

You would need to add *two separate `borrow_mut()`s*, one for each
of the `self.frame().ip+=1` statements.  You can't just do one big
`borrow_mut()` because `some_other_method()` will call
`borrow_mut()` and panic.

#### Problem: Performance

The `RefCell<VM>` approach also has a fairly huge performance hit,
because every single modification to any part of `VM` will require a
reference count increment/decrement, and a conditional branch based
on the check (which will never fail) that the `RefCell` isn't
already mutably borrowed.  It will also impede a lot of rustc's
optimizations.

### `Cell<VM>`

This is a non-starter because it means that in order to mutate any
field of `VM`, you have to move the entire `struct VM` out of the
`Cell`, mutate it, and move it back in.

### `Cell<Box<VM>>`

Now we're getting warmer.  Here, we can move the `Box<VM>` out of
the cell with a single pointer-sized memory access.

We don't want to do the "uglification" described in the previous
section.  We are very fortunate that, sometime in mid-2019, the Rust
dieties [decreed by fiat][fiat-decree] that `&Cell<T>` and `&mut T`
are bit-for-bit identical, and even gave us mortals safe wrappers
[`from_mut()`][from_mut] and [`get_mut()`][get_mut] around
`mem::transmute()`.

So now, when a `VM` method (which takes `&mut self`) calls out to
some external code (like a builtin), instead of passing the `&mut
self` to the external code it can call `Cell::from_mut(&mut self)`,
and then `Cell::swap()` that into the thread-local storage cell for
the duration of the external code.  After the external code returns,
it can `Cell::swap()` it back.  This whole dance gets wrapped in a
lexical block, and the borrow checker sees that the `&Cell<Box<VM>>`
returned by `Cell::from_mut()` lives only until the end of the
lexical block, *so we get the `&mut self` back after the close-brace
for that block*.  NLL FTW.  This sounds like a lot of work, but it
should compile down to two pointer-sized loads and two pointer-sized
stores, and it is incurred basically only for `OpBuiltin`.

This all works, with only two issues:

1. `vm.rs` needs to be very careful to do the thread-local cell swap
   dance before calling anything that might call `PartialEq::eq()`
   (or any other method that expects to be able to pull the `VM` out
   of thread-local storage).  There is no compile-time check that we
   did the dance in all the right places.  If we forget to do the
   dance somewhere we'll get a runtime panic from `Option::expect()`
   (see next section).

2. Since we need to call `Cell::from_mut()` on a `Box<VM>` rather
   than a bare `VM`, we still need to rewrite all of `vm.rs` so that
   every function takes a `&mut Box<VM>` instead of a `&mut self`.
   This creates a huge amount of "noise" in the code.

Fortunately, it turns out that nearly all the "noise" that arises
from the second point can be eliminated by taking advantage of
[deref coercions][deref-coercions]!  This was the last "shoe to
drop".

There is still the issue of having to be careful about calls from
`vm.rs` to things outside that file, but it's manageable.

### `Cell<Option<Box<VM>>>`

In order to get the "safe and stable `#[thread_local]`"
[exception][tls-const-init] we need a `const` initializer, which
means we need to be able to put something into the `Cell` that isn't
a `VM`.  So the type needs to be `Cell<Option<Box<VM>>>`.

Recall that you can't turn an `Option<&T>` into an `&Option<T>`.
The latter type has the "is this a `Some` or `None`" bit immediately
adjacent to the bits representing `T`.  So if I hand you a `t:&T`
and you wrap it as `Some(t)`, those bits aren't adjacent in memory.
This means that all the VM methods need to operate on an
`Option<Box<VM>>` -- we can't just wrap a `Some()` around `&mut
self` "at the last minute" before inserting it into the thread-local
storage cell.  Fortunately deref coercions save the day here too --
the coercion is inferred through both layers (`Box` and `Option`) of
wrapper, so there is no additional noise in the code.

Note that Rust is clever and can find some sequence of bits that
aren't a valid `T`, so `sizeof(Option<T>)==sizeof(T)`.  And in fact,
`Box<T>` is one of these cases (and this is guaranteed).  So the
`Option` has no overhead.

# Closing thoughts, language-level support

This would have been easier with language-level support.

## What wouldn't help

Although it [it was decreed][fiat-decree] that `Cell<T>` and `&mut
T` are interchangeable, a `LocalKey<Cell<T>>` isn't quite the same
thing as a `Cell<T>`, so it wouldn't be safe for the standard
library to contain something like this:

```
impl<T> LocalKey<Cell<T>> {
  fn get_mut(&self) -> &mut T {
    unsafe {
      // ... mem::transmute() voodoo goes here ...
```

The problem here is that you can call `LocalKey<Cell<T>>::get_mut()` twice and
end up with two `&mut T`s that point to the same thing (mutable aliasing) which
results in undefined behavior.

## What would help

The ideal solution is for Rust to let you call arbitrary methods
`T::foo(&mut self...)` on a `LocalKey<Cell<T>>`.  This way you can
have one (and only one) `&mut T` at any syntactical point in your
program -- the `&mut self`.


[tls-const-init]: https://github.com/rust-lang/rust/pull/90774
[thread-local-unstable]: https://github.com/rust-lang/rust/issues/29594
[thread-local-unsafe-generally]: https://github.com/rust-lang/rust/issues/54366
[fiat-decree]: https://github.com/rust-lang/rust/issues/43038
[from_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.from_mut
[get_mut]: https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#method.get_mut
[thread-local-unsafe]: [https://github.com/rust-lang/rust/issues/54366]
[deref-coercions]: https://doc.rust-lang.org/book/ch15-02-deref.html#implicit-deref-coercions-with-functions-and-methods
[threadlocal-intrinsic]: https://llvm.org/docs/LangRef.html#llvm-threadlocal-address-intrinsic