1 file changed, 105 insertions, 17 deletions
diff --git a/tvix/docs/src/TODO.md b/tvix/docs/src/TODO.md
index 8fb22ea822..127fb6f4d0 100644
--- a/tvix/docs/src/TODO.md
+++ b/tvix/docs/src/TODO.md
@@ -25,17 +25,69 @@ sure noone is working on this, or has some specific design in mind already.
    with a different level of `--strict`, but the toplevel doc-comment suggests
    its generic?
 
+### crate2nix for WASM (@kranzes)
+Most of Tvix lives inside a `//tvix` Cargo workspace, and we use `crate2nix`
+as a build system, to get crate-level build granularity (and caching), keeping
+compile times somewhat manageable.
+
+In the future, for Store/Build, we want to build some more web frontends,
+exposing some data by calling the API. Being able to write these in Rust,
+reusing most of our existing code dealing with the data structures, would
+be preferred.
+
+However, using the crate2nix tooling in combination with compiling for WASM is
+a bumpy ride (and `//web.tvixbolt` works around this by using
+`rustPlatform.buildRustPackage` instead, which invokes cargo inside a FOD):
+
+`buildRustCrate` in nixpkgs (which is used by `crate2nix` under the hood)
+doesn't allow specifying another `--target` explicitly, but relies on the cross
+machinery in nixpkgs exclusively.
+
+`doc/languages-frameworks/rust.section.md` suggests it should be a matter of
+re-instantiating nixpkgs for `wasm32-unknown-unknown`, but that's not recognized
+as a valid architecture.
+The suggested alternative, setting only `rustc.config` to it, seems to get us
+further, but the `Crate.nix` logic for detecting arch-conditional crates doesn't
+seem to cover that case, and tries to build crates (`cpufeatures` for `sha{1,2}`)
+which are supposed to be skipped.
+
+## Perf
+ - String Contexts currently do a lot of indirections (edef)
+   (NixString -> NixStringInner -> HashSet[element] -> NixContextElement -> String -> data)
+   to get to the actual data. We should improve this. There are various ideas,
+   one of them being to globally intern all Nix context elements, and only keep
+   indices into that. We might need different representations for small and
+   large numbers of context elements, and need tooling to reason about the
+   number of contexts we have.
+ - To calculate NAR size and digest (used for output path calculation of FODs),
+   our current `SimpleRenderer` `NarCalculationService` sequentially asks for
+   one blob after another (and internally these might consist of multiple
+   chunks too).
+   That's a lot of roundtrips, adding up to a lot of useless waiting.
+   While we cannot avoid having to feed all bytes sequentially through sha256,
+   we already know what blobs to fetch and in which order.
+   There should be a way to buffer some "amount of upcoming bytes" in memory,
+   rather than requesting these sequentially.
+   This is somewhat the "spiritual counterpart" to our sequential ingestion
+   code (`ConcurrentBlobUploader`, used by `ingest_nar`), which keeps
+   "some amount of outgoing bytes" in memory.
+
 ### Error cleanup
  - Currently, all services use tvix_castore::Error, which only has two kinds
    (invalid request, storage error), containing an (owned) string.
    This is quite primitive. We should have individual error types for BS, DS, PS.
    Maybe these should have some generics to still be able to carry errors from
    the underlying backend, similar to `IngestionError`.
+   There was an attempt to give PS separate error types (cl/11695), but this
+   ended up very verbose.
+   Every error had to be boxed, and an optional additional message added. Some
+   errors that didn't wrap another underlying error were hard to construct, too
+   (requiring the addition of errors). All of this without even having added
+   proper backtrace support, which would be quite helpful in store hierarchies.
+   `anyhow`'s `.context()` gives us most of this out of the box. Maybe we can
+   use that, using enums rather than `&'static str` as context in some cases?
 
 ## Fixes towards correctness
- - `builtins.toXML` is missing string context. See b/398.
- - `builtins.toXML` self-closing tags need to be configurable in a more granular
-   fashion, requires third-party crate support. See b/399.
  - `rnix` only supports string source files, but `NixString` uses bytes (and Nix
    source code might be no valid UTF-8).
 
@@ -75,10 +127,6 @@ Some more fetcher-related builtins need work:
  - `fetchTree` (hairy, seems there's no proper spec and the URL syntax seems
    subject to change/underdocumented)
 
-### Convert builtins:fetchurl to Fetches
-We need to convert `builtins:fetchurl`-style calls to `builtins.derivation` to
-fetches, not Derivations (tracked in `KnownPaths`).
-
 ### Derivation -> Build
 While we have some support for `structuredAttrs` and `fetchClosure` (at least
 enough to calculate output hashes, aka produce identical ATerm), the code
@@ -101,9 +149,33 @@ logs etc, but this is something requiring a lot of designing.
 
 ### Store composition
  - Combinators: list-by-priority, first-come-first-serve, cache
- - How do describe hierarchies. URL format too one-dimensional, but we might get
-   quite far with a similar "substituters" concept that Nix uses, to construct
-   the composed stores.
+ - Store composition hierarchies (@yuka).
+   - The URL format is too one-dimensional.
+   - We want to have nice and simple user-facing substituter config, including
+     sensible default wrappers for caching, retries, fallbacks, as well as
+     granular control for power-users.
+   - Current design idea:
+     - Have a concept similar to rclone config (map with store aliases as
+       keys, allowing stores to be referred to by their alias from other parts
+       of the config).
+       It allows both referring to a remote by name and defining one ad hoc:
+       https://rclone.org/docs/#syntax-of-remote-paths
+     - Each store needs to be aware of its "instance name", so it can be
+       included in logs, metrics, …
+     - Have an "instantiation function" traversing such a config data structure,
+       creating store instances and plugging them together, ultimately returning
+       a dyn …Service interface.
+     - No reconfiguration/reconciliation for now.
+     - Making URLs the primary data format would get ugly quite easily (hello
+       multiple layers of escaping!), so it's best to convert the existing URL
+       syntax to our new config format on the fly and then use one codepath
+       to instantiate/assemble. Similarly, something like the "user-facing
+       substituter config" mentioned above could also be converted to such a
+       config format under the hood.
+     - Maybe add support for a `?cache=$other_url` parameter to the URL syntax,
+       to easily wrap a store with a caching frontend, using `$other_url` as the
+       "near" store URL.
+
 ### Store Config
    There's already serde for some store options (bigtable uses `serde_qs`).
    We might also have common options global over all backends, like chunking
@@ -114,7 +186,22 @@ logs etc, but this is something requiring a lot of designing.
 ### BlobService
  - On the trait side, currently there's no way to distinguish reading a
    known-chunk vs blob, so we might be calling `.chunks()` unnecessarily often.
-   At least for the `object_store` backend, this might be a problem.
+   At least for the `object_store` backend, this might be a problem, causing a
+   lot of round-trips. It also doesn't compose well: every implementation of
+   `BlobService` needs to solve both the "holding metadata about chunking info"
+   and the "storing chunks" questions.
+   Design idea (@flokli): split these two concerns into two separate traits:
+    - a `ChunkService` dealing with retrieving individual chunks, by their
+      content digests. Chunks are small enough to keep around in contiguous
+      memory.
+    - a `BlobService` storing metadata about blobs.
+
+   Individual stores would not need to implement `BlobReader` anymore; instead,
+   that could be a global thing with access to the whole store composition layer,
+   which should make it easier to reuse chunks from other backends. It's unclear
+   whether the write path should be structured the same way. At least for some
+   backends, we want the remote end to be able to decide on chunking.
+
  - While `object_store` recently got support for `Content-Type`
    (https://github.com/apache/arrow-rs/pull/5650), there's no support on the
    local filesystem yet. We'd need to add support to this (through xattrs).
@@ -134,9 +221,10 @@ logs etc, but this is something requiring a lot of designing.
 - Some work ongoing on the worker operation parsing (griff, picnoir)
 
 ### O11Y
- - gRPC trace propagation (cl/10532)
- - `tracing-tracy` (cl/10952)
- - `[tracing-]indicatif` for progress/log reporting (floklis stash)
- - unification into `tvix-tracing` crate, currently a lot of boilerplate
-   in `tvix-store` CLI entrypoint, and half of the boilerplate copied over to
-   `tvix-cli`.
+ - Maybe drop `--log-level` entirely, and use the `RUST_LOG` env var exclusively?
+   `debug`/`trace` levels across all crates are a bit useless, and `RUST_LOG` can
+   be much more granular…
+ - Trace propagation for HTTP clients too, using
+   https://www.w3.org/TR/trace-context/ or https://www.w3.org/TR/baggage/,
+   whichever makes more sense.
+   Candidates: nix+http(s) protocol, object_store crates.