docs(tvix/castore/blobstore): reorganize docs r/7479

docs/verified-streaming.md explained how CDC and verified streaming can work together, but didn't really highlight enough how chunking in general also helps with seeking. In addition, a lot of the thoughts w.r.t. the BlobStore protocol, both gRPC and Rust traits, as well as why there's no support for seeking directly in gRPC, as well as how clients should behave w.r.t. chunked fetching was missing, or mixed together with the verified streaming bits. While there is no verified streaming version yet, a chunked one is coming soon, and documenting this a bit better is gonna make it easier to understand, as well as provide some lookout on where this is heading. Change-Id: Ib11b8ccf2ef82f9f3a43b36103df0ad64a9b68ce Reviewed-on: https://cl.tvl.fyi/c/depot/+/10733 Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: Connor Brewster <cbrewster@hey.com>
author: Florian Klink <flokli@flokli.de> 2024-02-02T13·51+0200
committer: flokli <flokli@flokli.de> 2024-02-06T18·28+0000
commit: 40d81d0c74bdad3d78a28a0fe63322ef9820d866 (patch)
tree: 601f6812c002f0d51e91d7612b7e216c07a10076 /tvix/castore/docs/verified-streaming.md
parent: 35e7b8a1a89c3f5c2670dabc603518bcbf82202c (diff)
1 files changed, 0 insertions, 121 deletions
diff --git a/tvix/castore/docs/verified-streaming.md b/tvix/castore/docs/verified-streaming.md
deleted file mode 100644
index fc766fff5d..0000000000
--- a/tvix/castore/docs/verified-streaming.md
+++ /dev/null
@@ -1,121 +0,0 @@
-# Verified streaming
-
-`//tvix/castore` is a content-addressed storage system, using [blake3] as hash
-function.
-
-This means returned data is fetched by using the digest as lookup key, and can
-be verified to be correct by feeding the received data through the hash function
-and ensuring it matches the digest initially used for the lookup.
-
-This means, data can be downloaded by any untrusted third-party as well, as the
-received data is validated to match the digest it was originally requested with.
-
-However, for larger blobs of data, having to download the entire blob to be able
-to determine whether it's correct before being able to return it to an upper
-layer takes a lot of time, and is wasteful, if we're only interested in a small
-portion of it.
-
-Especially when substituting from an untrusted third-party, we want to be able
-to detect quickly if that third-party is sending us wrong data, and terminate
-the connection early.
-
-## Chunking
-
-This problem has historically been solved by exchanging a list of smaller
-chunks, which can be fetched individually.
-
-BitTorrent for example breaks files up into smaller chunks, and maintains a list
-of sha1 digests for each of these chunks. After the list has been fetched, this
-allows fetching smaller parts of data selectively from untrusted third-parties.
-
-Similarly, IPFS uses its IPLD model to content-address a Merkle DAG of chunk
-nodes.
-
-While these approaches solve the problem of being able to fetch smaller chunks,
-they have a big disadvantage: the chunking parameters, and the topology of
-the graph structure itself "bleed" into the hash of the entire data structure
-itself.
-
-This comes with some disadvantages:
-
-Depending on the chunking parameters used, there's different representations for
-the same data, causing less data sharing/reuse in the overall content- addressed
-system, both when downloading data from third-parties, as well as benefiting
-from data already available locally.
-
-This can be workarounded by agreeing on only single way of chunking, but it's
-not pretty.
-
-## Chunking in tvix-castore
-
-tvix-castore uses BLAKE3 as a digest function, which internally uses a fixed
-chunksize of 1024 bytes.
-
-BLAKE3 is a tree hash where all left nodes fully populated, contrary to
-conventional serial hash functions. To be able to validate the hash of a node,
-one only needs the hash of the (2) children, if any.
-
-This means one only needs to the root digest to validate a construction, and
-lower levels of the tree can be omitted.
-
-This relieves us from the need of having to communicate more granular chunking
-upfront, and making it part of our data model.
-
-## Logical vs. physical chunking
-
-Due to the properties of the BLAKE3 hash function, we have logical blocks of
-1KiB, but this doesn't necessarily imply we need to restrict ourselves to these
-chunk sizes.
-
-The only thing we need to be able to read and verify an arbitrary byte range is
-having the covering range of aligned 1K blocks.
-
-## Actual implementation
-
- -> BlobService.Read() gets the capability to read chunks as well
- -> BlobService.Stat() can reply with a list of chunks.
-      rq params: send_bao bool
-         server should be able to offer bao all the way down to 1k
-         some open questions w.r.t sending the whole bao until there, or just
-         all the hashes on the "most granular" level
-         -> we can recreate everything above up to the root hash.
-         -> can we maybe add this to n0-computer/bao-tree as another outboard format?
-      resp:
-        - bao_shift: how many levels on the bottom were skipped.
-          0 means send all the leaf node hashes (1K block size)
-        - "our bao": blake3 digests for a given static chunk size + path down to the last leaf node and its data (length proof)
-        - list of (b3digest,size) of all physical chunks.
-          The server can do some CDC on ingestion, and communicate these chunks here.
-          Chunk sizes should be a "reasonable size", TBD, probably something between 512K-4M
-
-Depending on the bao depth received from the server, we end up with a logical
-size of chunks that can be fetched in an authenticated fashion.
-
-Assuming the bao chunk size received is 4(*1KiB bytes) (`bao_shift` of 2), and a
-total blob size of 14 (*1KiB bytes), we can fetch
-`[0..=3]`, `[4..=7]`, `[8..=11]` and `[12..=13]` in an authenticated fashion:
-
-`[ 0 1 2 3 ] [ 4 5 6 7 ] [ 8 9 10 11 ] [ 12 13 ]`
-
-Assuming the server now informs us about the following physical chunking:
-
-`[ 0 1 ] [ 2 3 4 5 ] [ 6 ] [ 7 8 ] [ 9 10 11 12 13 14 15 ]`
-
-To read from 0 until 4 (inclusive), we need to fetch physical chunks
-`[ 0 1 ]`, `[ 2 3 4 5 ]` and `[ 6 ] [ 7 8 ]`.
-
-`[ 0 1 ]` and `[ 2 3 4 5 ]` are obvious, they contain the data we're
-interested in.
-
-We however also need to fetch the physical chunks `[ 6 ]` and `[ 7 8 ]`, so we
-can assemble `[ 4 5 6 7 ]` to verify that logical chunk.
-
-Each physical chunk fetched can be validated to have the blake3 digest that was
-communicated upfront, and can be stored in a client-side cache/storage.
-
-If it's not there, the client can use the `BlobService.Read()` interface with
-the *physical chunk digest*.
-
----
-
-[blake3]: https://github.com/BLAKE3-team/BLAKE3
\ No newline at end of file
author	Florian Klink <flokli@flokli.de>	2024-02-02T13·51+0200
committer	flokli <flokli@flokli.de>	2024-02-06T18·28+0000
commit	40d81d0c74bdad3d78a28a0fe63322ef9820d866 (patch)
tree	601f6812c002f0d51e91d7612b7e216c07a10076 /tvix/castore/docs/verified-streaming.md
parent	35e7b8a1a89c3f5c2670dabc603518bcbf82202c (diff)