From beae3a4bf163c34b90530ab4601ce4dc753ed9e4 Mon Sep 17 00:00:00 2001
From: Florian Klink
Date: Thu, 19 Oct 2023 15:01:53 +0100
Subject: chore(tvix/castore): move data model docs to here

These describe the castore data model, so it should live in the castore
crate. Also, some minor edits to //tvix/store/docs/api.md, to honor the
move of the castore bits to tvix-castore.

Change-Id: I1836556b652ac0592336eac95a8d0647599f4aec
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9893
Autosubmit: flokli
Reviewed-by: tazjin
Tested-by: BuildkiteCI
---
 tvix/castore/docs/data-model.md        | 50 +++++++++++++++++++++++++++++
 tvix/castore/docs/why-not-git-trees.md | 57 ++++++++++++++++++++++++++++++++++
 tvix/store/docs/api.md                 | 31 ++++++++++--------
 tvix/store/docs/castore.md             | 50 -----------------------------
 tvix/store/docs/why-not-git-trees.md   | 57 ----------------------------------
 5 files changed, 125 insertions(+), 120 deletions(-)
 create mode 100644 tvix/castore/docs/data-model.md
 create mode 100644 tvix/castore/docs/why-not-git-trees.md
 delete mode 100644 tvix/store/docs/castore.md
 delete mode 100644 tvix/store/docs/why-not-git-trees.md

diff --git a/tvix/castore/docs/data-model.md b/tvix/castore/docs/data-model.md
new file mode 100644
index 000000000000..2df6761aae8f
--- /dev/null
+++ b/tvix/castore/docs/data-model.md
@@ -0,0 +1,50 @@
+# Data model
+
+This provides some more notes on the fields used in castore.proto.
+
+See `//tvix/store/docs/api.md` for the full context.
+
+## Directory message
+`Directory` messages use the blake3 hash of their canonical protobuf
+serialization as its identifier.
+
+A `Directory` message contains three lists, `directories`, `files` and
+`symlinks`, holding `DirectoryNode`, `FileNode` and `SymlinkNode` messages
+respectively. They describe all the direct child elements that are contained in
+a directory.
+
+All three message types have a `name` field, specifying the (base)name of the
+element (which MUST not contain slashes or null bytes, and MUST not be '.' or '..').
+For reproducibility reasons, the lists MUST be sorted by that name and also
+MUST be unique across all three lists.
+
+In addition to the `name` field, the various *Node messages have the following
+fields:
+
+## DirectoryNode
+A `DirectoryNode` message represents a child directory.
+
+It has a `digest` field, which points to the identifier of another `Directory`
+message, making a `Directory` a merkle tree (or strictly speaking, a graph, as
+two elements pointing to a child directory with the same contents would point
+to the same `Directory` message.
+
+There's also a `size` field, containing the (total) number of all child
+elements in the referenced `Directory`, which helps for inode calculation.
+
+## FileNode
+A `FileNode` message represents a child (regular) file.
+
+Its `digest` field contains the blake3 hash of the file contents. It can be
+looked up in the `BlobService`.
+
+The `size` field contains the size of the blob the `digest` field refers to.
+
+The `executable` field specifies whether the file should be marked as
+executable or not.
+
+## SymlinkNode
+A `SymlinkNode` message represents a child symlink.
+
+In addition to the `name` field, the only additional field is the `target`,
+which is a string containing the target of the symlink.
diff --git a/tvix/castore/docs/why-not-git-trees.md b/tvix/castore/docs/why-not-git-trees.md
new file mode 100644
index 000000000000..fd46252cf55c
--- /dev/null
+++ b/tvix/castore/docs/why-not-git-trees.md
@@ -0,0 +1,57 @@
+## Why not git tree objects?
+
+We've been experimenting with (some variations of) the git tree and object
+format, and ultimately decided against using it as an internal format, and
+instead adapted the one documented in the other documents here.
+
+While the tvix-store API protocol shares some similarities with the format used
+in git for trees and objects, the git one has shown some significant
+disadvantages:
+
+### The binary encoding itself
+
+#### trees
+The git tree object format is a very binary, error-prone and
+"made-to-be-read-and-written-from-C" format.
+
+Tree objects are a combination of null-terminated strings, and fields of known
+length. References to other tree objects use the literal sha1 hash of another
+tree object in this encoding.
+Extensions of the format/changes are very hard to do right, because parsers are
+not aware they might be parsing something different.
+
+The tvix-store protocol uses a canonical protobuf serialization, and uses
+the [blake3][blake3] hash of that serialization to point to other `Directory`
+messages.
+It's both compact and with a wide range of libraries for encoders and decoders
+in many programming languages.
+The choice of protobuf makes it easy to add new fields, and make old clients
+aware of some unknown fields being detected [^adding-fields].
+
+#### blob
+On disk, git blob objects start with a "blob" prefix, then the size of the
+payload, and then the data itself. The hash of a blob is the literal sha1sum
+over all of this - which makes it something very git specific to request for.
+
+tvix-store simply uses the [blake3][blake3] hash of the literal contents
+when referring to a file/blob, which makes it very easy to ask other data
+sources for the same data, as no git-specific payload is included in the hash.
+This also plays very well together with things like [iroh][iroh-discussion],
+which plans to provide a way to substitute (large)blobs by their blake3 hash
+over the IPFS network.
+
+In addition to that, [blake3][blake3] makes it possible to do
+[verified streaming][bao], as already described in other parts of the
+documentation.
+
+The git tree object format uses sha1 both for references to other trees and
+hashes of blobs, which isn't really a hash function to fundamentally base
+everything on in 2023.
+The [migration to sha256][git-sha256] also has been dead for some years now,
+and it's unclear how a "blake3" version of this would even look like.
+
+[bao]: https://github.com/oconnor663/bao
+[blake3]: https://github.com/BLAKE3-team/BLAKE3
+[git-sha256]: https://git-scm.com/docs/hash-function-transition/
+[iroh-discussion]: https://github.com/n0-computer/iroh/discussions/707#discussioncomment-5070197
+[^adding-fields]: Obviously, adding new fields will change hashes, but it's something that's easy to detect.
\ No newline at end of file
diff --git a/tvix/store/docs/api.md b/tvix/store/docs/api.md
index 6a4b98911c2c..c1dacc89a598 100644
--- a/tvix/store/docs/api.md
+++ b/tvix/store/docs/api.md
@@ -1,10 +1,12 @@
-tvix-store API
+tvix-[ca]store API
 ==============
 
-This document outlines the design of the API exposed by tvix-store, as
-well as other implementations of this store protocol.
+This document outlines the design of the API exposed by tvix-castore and tvix-
+store, as well as other implementations of this store protocol.
 
-This document is meant to be read side-by-side with [castore.md](./castore.md) which describes the data model in more detail.
+This document is meant to be read side-by-side with
+[castore.md](../../tvix-castore/docs/castore.md) which describes the data model
+in more detail.
 
 The store API has four main consumers:
 
@@ -115,8 +117,9 @@ content-addressed world to a physical path.
 
 ### PathInfo
 As most paths in the Nix store currently are input-addressed [^input-addressed],
-we need something mapping from an input-addressed "output path hash" to the
-contents in the content- addressed world.
+and the `tvix-castore` data model is also not intrinsically using NAR hashes,
+we need something mapping from an input-addressed "output path hash" (or a Nix-
+specific content-addressed path) to the contents in the `tvix-castore` world.
 
 That's what `PathInfo` provides. It embeds the root node (Directory, File or
 Symlink) at a given store path.
@@ -215,13 +218,15 @@ This is useful for people running a Tvix-only system, or running builds on a
 In a system with Nix installed, we can't simply manually "extract" things to
 `/nix/store`, as Nix assumes to own all writes to this location.
 In these use cases, we're probably better off exposing a tvix-store as a local
-binary cache (that's what nar-bridge does).
+binary cache (that's what `//tvix/nar-bridge` does).
 
 Assuming we are in an environment where we control `/nix/store` exclusively, a
-"realize to disk" would either "extract" things from the tvix-store to a
-filesystem, or expose a FUSE filesystem. The latter would be particularly
-interesting for remote build workloads, as build inputs can be realized on-
-demand, which saves copying around a lot of never-accessed files.
+"realize to disk" would either "extract" things from the `tvix-store` to a
+filesystem, or expose a `FUSE`/`virtio-fs` filesystem.
+
+The latter is already implemented, and particularly interesting for (remote)
+build workloads, as build inputs can be realized on-demand, which saves copying
+around a lot of never- accessed files.
 
 In both cases, the API interactions are similar.
 * The *PathInfoService* is asked for the `PathInfo` of the requested store path.
@@ -253,7 +258,7 @@ As already described above, the only non-content-addressed service is the
 This means, all other messages (such as `Blob` and `Directory` messages) can be
 substituted from many different, untrusted sources/mirrors, which will make
 plugging in additional substitution strategies like IPFS, local network
-neighbors super simple.
+neighbors super simple. That's also why it's living in the `tvix-castore` crate.
 
 As for `PathInfo`, we don't specify an additional signature mechanism yet, but
 carry the NAR-based signatures from Nix along.
@@ -268,7 +273,7 @@ rather than a whole NAR file.
 A future signature mechanism, that is only signing (parts of) the `PathInfo`
 message, which only points to content-addressed data will enable verified
 partial access into a store path, opening up opportunities for lazy filesystem
-access, which is very useful in remote builder scenarios.
+access etc.
 
 
 
diff --git a/tvix/store/docs/castore.md b/tvix/store/docs/castore.md
deleted file mode 100644
index f555ba5a861b..000000000000
--- a/tvix/store/docs/castore.md
+++ /dev/null
@@ -1,50 +0,0 @@
-# //tvix/store/docs/castore.md
-
-This provides some more notes on the fields used in castore.proto.
-
-It's meant to supplement `//tvix/store/docs/api.md`.
-
-## Directory message
-`Directory` messages use the blake3 hash of their canonical protobuf
-serialization as its identifier.
-
-A `Directory` message contains three lists, `directories`, `files` and
-`symlinks`, holding `DirectoryNode`, `FileNode` and `SymlinkNode` messages
-respectively. They describe all the direct child elements that are contained in
-a directory.
-
-All three message types have a `name` field, specifying the (base)name of the
-element (which MUST not contain slashes or null bytes, and MUST not be '.' or '..').
-For reproducibility reasons, the lists MUST be sorted by that name and also
-MUST be unique across all three lists.
-
-In addition to the `name` field, the various *Node messages have the following
-fields:
-
-## DirectoryNode
-A `DirectoryNode` message represents a child directory.
-
-It has a `digest` field, which points to the identifier of another `Directory`
-message, making a `Directory` a merkle tree (or strictly speaking, a graph, as
-two elements pointing to a child directory with the same contents would point
-to the same `Directory` message.
-
-There's also a `size` field, containing the (total) number of all child
-elements in the referenced `Directory`, which helps for inode calculation.
-
-## FileNode
-A `FileNode` message represents a child (regular) file.
-
-Its `digest` field contains the blake3 hash of the file contents. It can be
-looked up in the `BlobService`.
-
-The `size` field contains the size of the blob the `digest` field refers to.
-
-The `executable` field specifies whether the file should be marked as
-executable or not.
-
-## SymlinkNode
-A `SymlinkNode` message represents a child symlink.
-
-In addition to the `name` field, the only additional field is the `target`,
-which is a string containing the target of the symlink.
diff --git a/tvix/store/docs/why-not-git-trees.md b/tvix/store/docs/why-not-git-trees.md
deleted file mode 100644
index fd46252cf55c..000000000000
--- a/tvix/store/docs/why-not-git-trees.md
+++ /dev/null
@@ -1,57 +0,0 @@
-## Why not git tree objects?
-
-We've been experimenting with (some variations of) the git tree and object
-format, and ultimately decided against using it as an internal format, and
-instead adapted the one documented in the other documents here.
-
-While the tvix-store API protocol shares some similarities with the format used
-in git for trees and objects, the git one has shown some significant
-disadvantages:
-
-### The binary encoding itself
-
-#### trees
-The git tree object format is a very binary, error-prone and
-"made-to-be-read-and-written-from-C" format.
-
-Tree objects are a combination of null-terminated strings, and fields of known
-length. References to other tree objects use the literal sha1 hash of another
-tree object in this encoding.
-Extensions of the format/changes are very hard to do right, because parsers are
-not aware they might be parsing something different.
-
-The tvix-store protocol uses a canonical protobuf serialization, and uses
-the [blake3][blake3] hash of that serialization to point to other `Directory`
-messages.
-It's both compact and with a wide range of libraries for encoders and decoders
-in many programming languages.
-The choice of protobuf makes it easy to add new fields, and make old clients
-aware of some unknown fields being detected [^adding-fields].
-
-#### blob
-On disk, git blob objects start with a "blob" prefix, then the size of the
-payload, and then the data itself. The hash of a blob is the literal sha1sum
-over all of this - which makes it something very git specific to request for.
-
-tvix-store simply uses the [blake3][blake3] hash of the literal contents
-when referring to a file/blob, which makes it very easy to ask other data
-sources for the same data, as no git-specific payload is included in the hash.
-This also plays very well together with things like [iroh][iroh-discussion],
-which plans to provide a way to substitute (large)blobs by their blake3 hash
-over the IPFS network.
-
-In addition to that, [blake3][blake3] makes it possible to do
-[verified streaming][bao], as already described in other parts of the
-documentation.
-
-The git tree object format uses sha1 both for references to other trees and
-hashes of blobs, which isn't really a hash function to fundamentally base
-everything on in 2023.
-The [migration to sha256][git-sha256] also has been dead for some years now,
-and it's unclear how a "blake3" version of this would even look like.
-
-[bao]: https://github.com/oconnor663/bao
-[blake3]: https://github.com/BLAKE3-team/BLAKE3
-[git-sha256]: https://git-scm.com/docs/hash-function-transition/
-[iroh-discussion]: https://github.com/n0-computer/iroh/discussions/707#discussioncomment-5070197
-[^adding-fields]: Obviously, adding new fields will change hashes, but it's something that's easy to detect.
\ No newline at end of file
-- 
cgit 1.4.1