about summary refs log tree commit diff
path: root/tvix/store/docs/why-not-git-trees.md
diff options
context:
space:
mode:
authorFlorian Klink <flokli@flokli.de>2023-10-19T14·01+0100
committerclbot <clbot@tvl.fyi>2023-11-02T09·08+0000
commitbeae3a4bf163c34b90530ab4601ce4dc753ed9e4 (patch)
tree5da3dab268185ef80f17a36d8073ddcda02bd778 /tvix/store/docs/why-not-git-trees.md
parentd545f11819a34637ce016c31a0fc5ca17af0c475 (diff)
chore(tvix/castore): move data model docs to here r/6920
These describe the castore data model, so it should live in the castore
crate.
Also, some minor edits to //tvix/store/docs/api.md, to honor the move of
the castore bits to tvix-castore.

Change-Id: I1836556b652ac0592336eac95a8d0647599f4aec
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9893
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Diffstat (limited to 'tvix/store/docs/why-not-git-trees.md')
-rw-r--r--tvix/store/docs/why-not-git-trees.md57
1 files changed, 0 insertions, 57 deletions
diff --git a/tvix/store/docs/why-not-git-trees.md b/tvix/store/docs/why-not-git-trees.md
deleted file mode 100644
index fd46252cf55c..000000000000
--- a/tvix/store/docs/why-not-git-trees.md
+++ /dev/null
@@ -1,57 +0,0 @@
-## Why not git tree objects?
-
-We've been experimenting with (some variations of) the git tree and object
-format, and ultimately decided against using it as an internal format, and
-instead adapted the one documented in the other documents here.
-
-While the tvix-store API protocol shares some similarities with the format used
-in git for trees and objects, the git one has shown some significant
-disadvantages:
-
-### The binary encoding itself
-
-#### trees
-The git tree object format is a very binary, error-prone and
-"made-to-be-read-and-written-from-C" format.
-
-Tree objects are a combination of null-terminated strings, and fields of known
-length. References to other tree objects use the literal sha1 hash of another
-tree object in this encoding.
-Extensions of the format/changes are very hard to do right, because parsers are
-not aware they might be parsing something different.
-
-The tvix-store protocol uses a canonical protobuf serialization, and uses
-the [blake3][blake3] hash of that serialization to point to other `Directory`
-messages.
-It's both compact and with a wide range of libraries for encoders and decoders
-in many programming languages.
-The choice of protobuf makes it easy to add new fields, and make old clients
-aware of some unknown fields being detected [^adding-fields].
-
-#### blob
-On disk, git blob objects start with a "blob" prefix, then the size of the
-payload, and then the data itself. The hash of a blob is the literal sha1sum
-over all of this - which makes it something very git specific to request for.
-
-tvix-store simply uses the [blake3][blake3] hash of the literal contents
-when referring to a file/blob, which makes it very easy to ask other data
-sources for the same data, as no git-specific payload is included in the hash.
-This also plays very well together with things like [iroh][iroh-discussion],
-which plans to provide a way to substitute (large)blobs by their blake3 hash
-over the IPFS network.
-
-In addition to that, [blake3][blake3] makes it possible to do
-[verified streaming][bao], as already described in other parts of the
-documentation.
-
-The git tree object format uses sha1 both for references to other trees and
-hashes of blobs, which isn't really a hash function to fundamentally base
-everything on in 2023.
-The [migration to sha256][git-sha256] also has been dead for some years now,
-and it's unclear how a "blake3" version of this would even look like.
-
-[bao]: https://github.com/oconnor663/bao
-[blake3]: https://github.com/BLAKE3-team/BLAKE3
-[git-sha256]: https://git-scm.com/docs/hash-function-transition/
-[iroh-discussion]: https://github.com/n0-computer/iroh/discussions/707#discussioncomment-5070197
-[^adding-fields]: Obviously, adding new fields will change hashes, but it's something that's easy to detect.
\ No newline at end of file