It's already been around half a year since
[the last Tvix update][2024-02-tvix-update], so time for another one!

Note: This blog post is intended for a technical audience that is already
intimately familiar with Nix, and knows what things like derivations or store
paths are. If you're new to Nix, this will not make a lot of sense to you!

## Builds
A long-term goal is obviously to be able to use the expressions in nixpkgs to
build things with Tvix. We made progress in many places towards that goal:

### Drive builds on IO
As already explained in our [first blog post][blog-rewriting-nix], in Tvix, we
want to make IFD a first-class citizen without significant perf cost.

Nix tries hard to split Evaluation and Building into two phases, visible in
the `nix-instantiate` command which produces `.drv` files in `/nix/store` and
the `nix-build` command which can be invoked on such `.drv` files without
evaluation.
Scheduling (like in Hydra) usually happens by walking the graph of `.drv` files
produced in the first phase.

As soon as there's some IFD along the path, everything until then gets built in
the Evaluator (which is why IFD is prohibited in nixpkgs).

Tvix does not have two separate "phases" in a build, only a graph of unfinished
Derivations/Builds and their associated store paths. This graph does not need
to be written to disk, and can grow during runtime, as new Derivations with new
output paths are discovered.

Build scheduling happens continuously with that graph, for everything that's
really needed, when it's needed.

We do this by only "forcing" the realization of a specific store path if the
user ultimately wants that specific result to be available on their system, and
transitively, if something else wants it. This includes IFD in a very elegant
way.
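
To make the idea of demand-driven realization a bit more concrete, here is a
minimal sketch in Rust. It is not Tvix's actual code: `BuildGraph`, `describe`
and `force` are hypothetical names, store paths are plain strings, "building"
only records the order of work, and the graph is assumed to be acyclic.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory build graph: each store path maps to the store
/// paths it needs as inputs. Nothing here is written to disk.
#[derive(Default)]
pub struct BuildGraph {
    inputs: HashMap<String, Vec<String>>,
    realized: HashSet<String>,
    /// Order in which paths were actually realized, kept for illustration.
    pub log: Vec<String>,
}

impl BuildGraph {
    /// Describe a build without doing any work yet.
    /// The graph may grow at any time, as new paths are discovered.
    pub fn describe(&mut self, path: &str, inputs: &[&str]) {
        self.inputs.insert(
            path.to_string(),
            inputs.iter().map(|s| s.to_string()).collect(),
        );
    }

    /// Force a store path: realize its inputs first, then the path itself.
    /// Paths that nobody asks for are never touched.
    /// Assumption: the graph has no cycles.
    pub fn force(&mut self, path: &str) {
        if self.realized.contains(path) {
            return;
        }
        let inputs = self.inputs.get(path).cloned().unwrap_or_default();
        for input in &inputs {
            self.force(input);
        }
        // A real implementation would run the build (or fetch) here.
        self.realized.insert(path.to_string());
        self.log.push(path.to_string());
    }
}
```

In this model, an IFD is nothing special: the evaluator forces a path while the
graph is still growing, and only the inputs of that path get realized.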

We want to play with this approach as we continue bringing up our build
infrastructure.

### Fetchers
There are a few Nix builtins that allow describing a fetch (be it the download
of a file from the internet or the clone of a git repo). These needed to be
implemented for completeness. We implemented pretty much all downloads of
tarballs, NARs and plain files, except git repositories, which are left for later.

Instead of doing these fetches immediately, we added a generic `Fetch` type
that allows describing such fetches *before actually doing them*, similar to
being able to describe builds, and use the same "Drive builds on IO" machinery
to delay these fetches to the point where they're needed. We also show progress
bars when doing fetches.
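
As a rough illustration of "describe first, fetch later", consider the
following sketch. The variants and fields of this `Fetch` enum, as well as the
`Fetcher` helper, are assumptions made for the example and do not mirror the
real type in Tvix. The download itself is injected as a closure, so the sketch
works without network access.

```rust
use std::collections::HashMap;

/// Hypothetical description of a fetch. Constructing a value does no IO.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Fetch {
    /// A plain file download.
    File { url: String },
    /// A tarball that would be downloaded and unpacked.
    Tarball { url: String },
    /// A NAR file that would be downloaded and unpacked.
    Nar { url: String },
}

impl Fetch {
    pub fn url(&self) -> &str {
        match self {
            Fetch::File { url } | Fetch::Tarball { url } | Fetch::Nar { url } => url.as_str(),
        }
    }
}

/// Runs described fetches on demand and remembers their results.
pub struct Fetcher<D: FnMut(&str) -> Vec<u8>> {
    download: D,
    done: HashMap<Fetch, Vec<u8>>,
    /// Number of downloads that actually happened.
    pub downloads: usize,
}

impl<D: FnMut(&str) -> Vec<u8>> Fetcher<D> {
    pub fn new(download: D) -> Self {
        Fetcher { download, done: HashMap::new(), downloads: 0 }
    }

    /// Perform the fetch only when its contents are needed, and only once.
    pub fn realize(&mut self, fetch: &Fetch) -> &[u8] {
        if !self.done.contains_key(fetch) {
            let data = (self.download)(fetch.url());
            self.downloads += 1;
            self.done.insert(fetch.clone(), data);
        }
        self.done[fetch].as_slice()
    }
}
```

Creating a `Fetch` value is free. Only a call to `realize` triggers the
download, and repeated calls for the same description reuse the first result.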

Very early, during bootstrapping, nixpkgs relies on a `builtin:fetchurl`
"fake" Derivation, which has some special handling logic in Nix. We implemented
these quirks by converting such Derivations to instances of our `Fetch` type and
dealing with them there in a consistent fashion.

### More fixes, Refscan
With the above work done, and after fixing some small bugs [^3], we were already
able to build some first few store paths with Tvix and our `runc`-based builder
🎉!

We didn't get too far though, as we still need to implement reference scanning,
so that's next on our TODO list here. Stay tuned for further updates there!

## Eval correctness & Performance
As already written in the previous update, we've been evaluating parts of
`nixpkgs` and ensuring we produce the same derivations. We managed to find and
fix some correctness issues there.

Even though we don't want to focus too much on performance improvements
until all features of Nix are properly understood and representable with our
architecture, there's been some work on removing some obvious and low-risk
performance bottlenecks. Expect a detailed blog post around that soon after
this one!

## Tracing / O11Y Support
Tvix got support for Tracing, and is able to emit spans in
[OpenTelemetry][opentelemetry]-compatible format.

This means, if the necessary tooling is set up to collect such spans [^1], it's
possible to see what's happening inside the different components of Tvix across
process (and machine) boundaries.

Tvix now also propagates trace IDs via gRPC and HTTP requests [^2], and
continues an incoming trace when a request carries such an ID.

As an example, this allows us to get "callgraphs" on how a tvix-store operation
is processed through a multi-node deployment, and find bottlenecks and places to
optimize.
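
The propagation mentioned above relies on the W3C `traceparent` header, which
in version `00` has the shape `00-<trace-id>-<parent-id>-<flags>`, all in
lowercase hex. The following sketch shows how such a header can be parsed and
continued with a new span. The `TraceParent` type and its methods are
hypothetical helpers written for this post, not what Tvix or the tracing
crates use internally.

```rust
/// Fields of a W3C Trace Context `traceparent` header (version 00).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceParent {
    pub trace_id: u128,
    pub parent_id: u64,
    pub flags: u8,
}

/// The header only allows lowercase hex digits, with fixed field lengths.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

impl TraceParent {
    /// Parse a version-00 header, returning `None` if it is malformed.
    pub fn parse(header: &str) -> Option<Self> {
        let mut parts = header.trim().split('-');
        let version = parts.next()?;
        let trace_id = parts.next()?;
        let parent_id = parts.next()?;
        let flags = parts.next()?;
        if version != "00" || parts.next().is_some() {
            return None;
        }
        if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
            return None;
        }
        let trace_id = u128::from_str_radix(trace_id, 16).ok()?;
        let parent_id = u64::from_str_radix(parent_id, 16).ok()?;
        let flags = u8::from_str_radix(flags, 16).ok()?;
        // All-zero trace and parent IDs are invalid.
        if trace_id == 0 || parent_id == 0 {
            return None;
        }
        Some(TraceParent { trace_id, parent_id, flags })
    }

    /// Continue the trace: keep the trace ID, use our own span as the parent.
    pub fn child(&self, span_id: u64) -> Self {
        TraceParent { trace_id: self.trace_id, parent_id: span_id, flags: self.flags }
    }

    /// Serialize for an outgoing request.
    pub fn to_header(&self) -> String {
        format!("00-{:032x}-{:016x}-{:02x}", self.trace_id, self.parent_id, self.flags)
    }
}
```

A service that receives a request parses the header, creates its own span and
sends `child(span_id).to_header()` along with any requests it makes itself, so
all spans end up in the same trace.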

Currently, this is compiled in by default, trying to send traces to an endpoint
at `localhost` (as per the official [SDK defaults][otlp-sdk]). It can
be disabled by building without the `otlp` feature, or running with the
`--otlp=false` CLI flag.

This piggy-backs on the excellent [tracing][tracing-rs] crate, which we already
use for structured logging, so while at it, we improved some log messages and
fields to make it easier to filter for certain types of events.

We also added support for sending out [Tracy][tracy] traces, though these are
disabled by default.

Additionally, some CLI entrypoints can now report progress to the user!
For example, when we're fetching something during evaluation
(via `builtins.fetchurl`), or uploading store path contents, we can report on
this. See [here][asciinema-import] for an example.

We still consider these outputs early prototypes, and will refine them as
we go.

## tvix-castore ingestion generalization
We spent some time refactoring and generalizing the tvix-castore importer code.

It's now generalized on a stream of "ingestion entries" produced in a certain
order, and there are various producers of this stream (reading through the local
filesystem, reading through a NAR, reading through a tarball, soon: traversing
contents of a git repo, …).

This avoids a lot of code duplication for these various formats, and allows
pulling out helper code for concurrent blob uploading.
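
The shape of such a stream can be sketched as follows. Everything in this
example is an assumption made for illustration: the `IngestionEntry` variants,
the in-memory `Node` tree standing in for a real producer, and in particular
the ordering rule that a directory is emitted only after all of its children.

```rust
use std::collections::HashSet;

/// Hypothetical entry in the ingestion stream.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IngestionEntry {
    Regular { path: String, size: u64 },
    Symlink { path: String, target: String },
    Dir { path: String },
}

impl IngestionEntry {
    pub fn path(&self) -> &str {
        match self {
            IngestionEntry::Regular { path, .. }
            | IngestionEntry::Symlink { path, .. }
            | IngestionEntry::Dir { path } => path.as_str(),
        }
    }
}

/// A tiny in-memory tree, standing in for a filesystem, NAR or tarball.
pub enum Node {
    File(u64),
    Symlink(String),
    Dir(Vec<(String, Node)>),
}

/// Producer: emit all children of a directory before the directory itself.
pub fn produce(path: &str, node: &Node, out: &mut Vec<IngestionEntry>) {
    match node {
        Node::File(size) => out.push(IngestionEntry::Regular {
            path: path.to_string(),
            size: *size,
        }),
        Node::Symlink(target) => out.push(IngestionEntry::Symlink {
            path: path.to_string(),
            target: target.clone(),
        }),
        Node::Dir(children) => {
            for (name, child) in children {
                produce(&format!("{}/{}", path, name), child, out);
            }
            out.push(IngestionEntry::Dir { path: path.to_string() });
        }
    }
}

/// Consumer-side check: no entry may arrive below an already finished directory.
pub fn children_before_parents(entries: &[IngestionEntry]) -> bool {
    let mut finished_dirs: HashSet<&str> = HashSet::new();
    for entry in entries {
        let mut ancestor = entry.path();
        while let Some(idx) = ancestor.rfind('/') {
            ancestor = &ancestor[..idx];
            if finished_dirs.contains(ancestor) {
                return false;
            }
        }
        if let IngestionEntry::Dir { path } = entry {
            finished_dirs.insert(path.as_str());
        }
    }
    true
}
```

With an ordering like this, a consumer can finalize a directory as soon as its
entry arrives, no matter which producer generated the stream.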

## More tvix-[ca]store backends
We added some more store backends to Tvix:

 - There's a [redb][redb] `PathInfoService` and `DirectoryService`, which
   also replaced the previous `sled` default backend.
 - There's a [bigtable][bigtable] `PathInfoService` and `DirectoryService`
   backend.
 - The "simplefs" `BlobService` has been removed, as it can be expressed using
   the "objectstore" backend with a `file://` URI.
 - There's been some work on feature-flagging certain backends.

## Documentation reconciliation
Various bits and pieces of documentation have previously been scattered
throughout the Tvix codebase, which wasn't very accessible and quite confusing.

These have been consolidated into an mdbook (at `//tvix/docs`).

We plan to properly host these as a website, hopefully providing a better introduction
and overview of Tvix, while adding more content over time.

## `nar-bridge` RIIR
While the golang implementation of `nar-bridge` did serve us well for a while,
it being the only remaining non-Rust part was a bit annoying.

Adding some features there meant they would not be accessible in the rest of
Tvix - and the other way round.
Also, we could not open data stores directly from there, but always had to start
a separate `tvix-store daemon`.

The initial plans for the Rust rewrite were already made quite a while ago,
but we finally managed to finish implementing the remaining bits. `nar-bridge`
is now fully written in Rust, providing the same CLI experience, features and
store backends as the rest of Tvix.

## `crate2nix` and overall Rust Nix improvements
We landed some fixes in [crate2nix][crate2nix], the tool we're using for
per-crate incremental builds of Tvix.

It now supports the corner cases needed to build WASM - so now
[Tvixbolt][tvixbolt] is built with it, too.

We also fixed some bugs in how test directories are prepared, which unlocked
running some more tests for filesystem-related builtins such as `readDir` in our test suite.

Additionally, there have been some general improvements around ensuring various
combinations of Tvix feature flags build (now continuously checked by CI), and
reducing the number of unnecessary rebuilds by filtering out non-source-code
files before building.

These should all improve DX while working on Tvix.

## Store Composition
Another big missing feature that landed was Store Composition. We briefly spoke
about the Tvix Store Model in the last update, but we didn't go into much
detail on how it would work when there are multiple potential sources for a
store path or for some more granular contents. That is pretty much always the
case in practice: think about using things from your local store, and falling
back to a remote place otherwise.

Nix has the default model of using `/nix/store` with a sqlite database for
metadata as a local store, and one or multiple "substituters" using the Nix HTTP
Binary Cache protocol.

In Tvix, things need to be a bit more flexible:
 - You might be in a setting where you don't have a local `/nix/store` at all.
 - You might want to have a view of different substituters/binary caches for
   different users.
 - You might want to explicitly specify caches in between some of these layers,
   and control their config.

The idea in Tvix is that you'll be able to combine "hierarchies of stores" through
runtime configuration to express all this.

It's currently behind a `xp-store-composition` feature flag, which adds the
optional `--experimental-store-composition` CLI arg, pointing to a TOML file
specifying the composition configuration. If set, this has priority over the old
CLI args for the three (single) stores.
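
To illustrate what combining stores into a hierarchy can look like, here is a
small sketch of a single combinator that asks a "near" store first and falls
back to a "far" one. The `PathInfoStore` trait, `MemoryStore` and `Cache` are
hypothetical and heavily simplified. They are not the actual Tvix service
traits, and the sketch does not cover the TOML configuration.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified store interface.
pub trait PathInfoStore {
    fn get(&mut self, store_path: &str) -> Option<String>;
    fn put(&mut self, store_path: &str, info: &str);
}

/// A leaf store keeping everything in memory.
#[derive(Default)]
pub struct MemoryStore {
    entries: HashMap<String, String>,
    /// Number of lookups this store has served, kept for illustration.
    pub lookups: usize,
}

impl PathInfoStore for MemoryStore {
    fn get(&mut self, store_path: &str) -> Option<String> {
        self.lookups += 1;
        self.entries.get(store_path).cloned()
    }

    fn put(&mut self, store_path: &str, info: &str) {
        self.entries.insert(store_path.to_string(), info.to_string());
    }
}

/// Combinator: ask `near` first, fall back to `far`, and remember hits
/// from `far` in `near`.
pub struct Cache<N, F> {
    pub near: N,
    pub far: F,
}

impl<N: PathInfoStore, F: PathInfoStore> PathInfoStore for Cache<N, F> {
    fn get(&mut self, store_path: &str) -> Option<String> {
        if let Some(info) = self.near.get(store_path) {
            return Some(info);
        }
        let info = self.far.get(store_path)?;
        self.near.put(store_path, &info);
        Some(info)
    }

    fn put(&mut self, store_path: &str, info: &str) {
        // In this sketch, writes only go to the near layer.
        self.near.put(store_path, info);
    }
}
```

Because `Cache` implements the same trait as the stores it wraps, combinators
can be nested, which is what makes "hierarchies of stores" possible.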

We're still not 100% sure how to best expose this functionality, in terms of the
appropriate level of granularity, in a user-friendly format.

There's also some more combinators and refactors missing, but please let us
know your thoughts!

## Contributors
There's been a lot of progress, which would not have been possible without our
contributors! Be it small drive-by contributions or large efforts, thank
you all!

 - Adam Joseph
 - Alice Carroll
 - Aspen Smith
 - Ben Webb
 - binarycat
 - Brian Olsen
 - Connor Brewster
 - Daniel Mendler
 - edef
 - Edwin Mackenzie-Owen
 - espes
 - Farid Zakaria
 - Florian Klink
 - Ilan Joselevich
 - Luke Granger-Brown
 - Markus Rudy
 - Matthew Tromp
 - Moritz Sanft
 - Padraic-O-Mhuiris
 - Peter Kolloch
 - Picnoir
 - Profpatsch
 - Ryan Lahfa
 - Simon Hauser
 - sinavir
 - sterni
 - Steven Allen
 - tcmal
 - toastal
 - Vincent Ambo
 - Yureka

---

That's it again, try out Tvix and hit us up on IRC or on our mailing list if you
run into any snags, or have any questions.


[^1]: Essentially, deploying a collecting agent on your machines, accepting
      these traces.
[^2]: Using the `traceparent` header field from https://www.w3.org/TR/trace-context/#trace-context-http-headers-format
[^3]: Like `builtins.toFile` not adding files yet, or `inputSources` being missed initially, duh!

[2024-02-tvix-update]:        https://tvl.fyi/blog/tvix-update-february-24
[opentelemetry]:              https://opentelemetry.io/
[otlp-sdk]:                   https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/
[tracing-rs]:                 https://tracing.rs/
[tracy]:                      https://github.com/wolfpld/tracy
[asciinema-import]:           https://asciinema.org/a/Fs4gKTFFpPGYVSna0xjTPGaNp
[blog-rewriting-nix]:         https://tvl.fyi/blog/rewriting-nix
[crate2nix]:                  https://github.com/nix-community/crate2nix
[redb]:                       https://github.com/cberner/redb
[bigtable]:                   https://cloud.google.com/bigtable
[tvixbolt]:                   https://bolt.tvix.dev/