diff options
Diffstat (limited to 'tvix/castore/docs/blobstore-protocol.md')
-rw-r--r-- | tvix/castore/docs/blobstore-protocol.md | 104 |
1 files changed, 0 insertions, 104 deletions
diff --git a/tvix/castore/docs/blobstore-protocol.md b/tvix/castore/docs/blobstore-protocol.md deleted file mode 100644 index 048cafc3d877..000000000000 --- a/tvix/castore/docs/blobstore-protocol.md +++ /dev/null @@ -1,104 +0,0 @@ -# BlobStore: Protocol / Composition - -This documents describes the protocol that BlobStore uses to substitute blobs -other ("remote") BlobStores. - -How to come up with the blake3 digest of the blob to fetch is left to another -layer in the stack. - -To put this into the context of Tvix as a Nix alternative, a blob represents an -individual file inside a StorePath. -In the Tvix Data Model, this is accomplished by having a `FileNode` (either the -`root_node` in a `PathInfo` message, or a individual file inside a `Directory` -message) encode a BLAKE3 digest. - -However, the whole infrastructure can be applied for other usecases requiring -exchange/storage or access into data of which the blake3 digest is known. - -## Protocol and Interfaces -As an RPC protocol, BlobStore currently uses gRPC. - -On the Rust side of things, every blob service implements the -[`BlobService`](../src/blobservice/mod.rs) async trait, which isn't -gRPC-specific. - -This `BlobService` trait provides functionality to check for existence of Blobs, -read from blobs, and write new blobs. -It also provides a method to ask for more granular chunks if they are available. - -In addition to some in-memory, on-disk and (soon) object-storage-based -implementations, we also have a `BlobService` implementation that talks to a -gRPC server, as well as a gRPC server wrapper component, which provides a gRPC -service for anything implementing the `BlobService` trait. - -This makes it very easy to talk to a remote `BlobService`, which does not even -need to be written in the same language, as long it speaks the same gRPC -protocol. - -It also puts very little requirements on someone implementing a new -`BlobService`, and how its internal storage or chunking algorithm looks like. - -The gRPC protocol is documented in `../protos/rpc_blobstore.proto`. -Contrary to the `BlobService` trait, it does not have any options for seeking/ -ranging, as it's more desirable to provide this through chunking (see also -`./blobstore-chunking.md`). - -## Composition -Different `BlobStore` are supposed to be "composed"/"layered" to express -caching, multiple local and remote sources. - -The fronting interface can be the same, it'd just be multiple "tiers" that can -respond to requests, depending on where the data resides. [^1] - -This makes it very simple for consumers, as they don't need to be aware of the -entire substitutor config. - -The flexibility of this doesn't need to be exposed to the user in the default -case; in most cases we should be fine with some form of on-disk storage and a -bunch of substituters with different priorities. - -### gRPC Clients -Clients are encouraged to always read blobs in a chunked fashion (asking for a -list of chunks for a blob via `BlobService.Stat()`, then fetching chunks via -`BlobService.Read()` as needed), instead of directly reading the entire blob via -`BlobService.Read()`. - -In a composition setting, this provides opportunity for caching, and avoids -downloading some chunks if they're already present locally (for example, because -they were already downloaded by reading from a similar blob earlier). - -It also removes the need for seeking to be a part of the gRPC protocol -alltogether, as chunks are supposed to be "reasonably small" [^2]. - -There's some further optimization potential, a `BlobService.Stat()` request -could tell the server it's happy with very small blobs just being inlined in -an additional additional field in the response, which would allow clients to -populate their local chunk store in a single roundtrip. - -## Verified Streaming -As already described in `./docs/blobstore-chunking.md`, the physical chunk -information sent in a `BlobService.Stat()` response is still sufficient to fetch -in an authenticated fashion. - -The exact protocol and formats are still a bit in flux, but here's some notes: - - - `BlobService.Stat()` request gets a `send_bao` field (bool), signalling a - [BAO][bao-spec] should be sent. Could also be `bao_shift` integer, signalling - how detailed (down to the leaf chunks) it should go. - The exact format (and request fields) still need to be defined, edef has some - ideas around omitting some intermediate hash nodes over the wire and - recomputing them, reducing size by another ~50% over [bao-tree]. - - `BlobService.Stat()` response gets some bao-related fields (`bao_shift` - field, signalling the actual format/shift level the server replies with, the - actual bao, and maybe some format specifier). - It would be nice to also be compatible with the baos used by [iroh], so we - can provide an implementation using it too. - ---- - -[^1]: We might want to have some backchannel, so it becomes possible to provide - feedback to the user that something is downloaded. -[^2]: Something between 512K-4M, TBD. -[bao-spec]: https://github.com/oconnor663/bao/blob/master/docs/spec.md -[bao-tree]: https://github.com/n0-computer/bao-tree -[iroh]: https://github.com/n0-computer/iroh |