BlobStore: Protocol / Composition
This documents describes the protocol that BlobStore uses to substitute blobs other ("remote") BlobStores.
How to come up with the blake3 digest of the blob to fetch is left to another layer in the stack.
To put this into the context of Tvix as a Nix alternative, a blob represents an
individual file inside a StorePath.
In the Tvix Data Model, this is accomplished by having a FileNode
(either the
root_node
in a PathInfo
message, or a individual file inside a Directory
message) encode a BLAKE3 digest.
However, the whole infrastructure can be applied for other usecases requiring exchange/storage or access into data of which the blake3 digest is known.
Protocol and Interfaces
As an RPC protocol, BlobStore currently uses gRPC.
On the Rust side of things, every blob service implements the
BlobService
async trait, which isn't
gRPC-specific.
This BlobService
trait provides functionality to check for existence of Blobs,
read from blobs, and write new blobs.
It also provides a method to ask for more granular chunks if they are available.
In addition to some in-memory, on-disk and (soon) object-storage-based
implementations, we also have a BlobService
implementation that talks to a
gRPC server, as well as a gRPC server wrapper component, which provides a gRPC
service for anything implementing the BlobService
trait.
This makes it very easy to talk to a remote BlobService
, which does not even
need to be written in the same language, as long it speaks the same gRPC
protocol.
It also puts very little requirements on someone implementing a new
BlobService
, and how its internal storage or chunking algorithm looks like.
The gRPC protocol is documented in ../protos/rpc_blobstore.proto
.
Contrary to the BlobService
trait, it does not have any options for seeking/
ranging, as it's more desirable to provide this through chunking (see also
BlobStore Chunking.
Composition
Different BlobStore
are supposed to be "composed"/"layered" to express
caching, multiple local and remote sources.
The fronting interface can be the same, it'd just be multiple "tiers" that can respond to requests, depending on where the data resides. 1
This makes it very simple for consumers, as they don't need to be aware of the entire substitutor config.
The flexibility of this doesn't need to be exposed to the user in the default case; in most cases we should be fine with some form of on-disk storage and a bunch of substituters with different priorities.
gRPC Clients
Clients are encouraged to always read blobs in a chunked fashion (asking for a
list of chunks for a blob via BlobService.Stat()
, then fetching chunks via
BlobService.Read()
as needed), instead of directly reading the entire blob via
BlobService.Read()
.
In a composition setting, this provides opportunity for caching, and avoids downloading some chunks if they're already present locally (for example, because they were already downloaded by reading from a similar blob earlier).
It also removes the need for seeking to be a part of the gRPC protocol alltogether, as chunks are supposed to be "reasonably small" 2.
There's some further optimization potential, a BlobService.Stat()
request
could tell the server it's happy with very small blobs just being inlined in
an additional additional field in the response, which would allow clients to
populate their local chunk store in a single roundtrip.
Verified Streaming
As already described in BlobStore Chunking, the physical chunk
information sent in a BlobService.Stat()
response is still sufficient to fetch
in an authenticated fashion.
The exact protocol and formats are still a bit in flux, but here's some notes:
BlobService.Stat()
request gets asend_bao
field (bool), signalling a [BAO][bao-spec] should be sent. Could also bebao_shift
integer, signalling how detailed (down to the leaf chunks) it should go. The exact format (and request fields) still need to be defined, edef has some ideas around omitting some intermediate hash nodes over the wire and recomputing them, reducing size by another ~50% over [bao-tree].BlobService.Stat()
response gets some bao-related fields (bao_shift
field, signalling the actual format/shift level the server replies with, the actual bao, and maybe some format specifier). It would be nice to also be compatible with the baos used by [iroh], so we can provide an implementation using it too.
-
We might want to have some backchannel, so it becomes possible to provide feedback to the user that something is downloaded. ↩
-
Something between 512K-4M, TBD. [bao-spec]: https://github.com/oconnor663/bao/blob/master/docs/spec.md [bao-tree]: https://github.com/n0-computer/bao-tree [iroh]: https://github.com/n0-computer/iroh ↩