author: Florian Klink <flokli@flokli.de> 2023-10-19T14·01+0100
committer: clbot <clbot@tvl.fyi> 2023-11-02T09·08+0000
commit: beae3a4bf163c34b90530ab4601ce4dc753ed9e4
tree: 5da3dab268185ef80f17a36d8073ddcda02bd778 /tvix/castore/docs/data-model.md
parent: d545f11819a34637ce016c31a0fc5ca17af0c475
chore(tvix/castore): move data model docs to here r/6920
These describe the castore data model, so they should live in the castore
crate.
Also, some minor edits to //tvix/store/docs/api.md, to honor the move of
the castore bits to tvix-castore.

Change-Id: I1836556b652ac0592336eac95a8d0647599f4aec
Reviewed-on: https://cl.tvl.fyi/c/depot/+/9893
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: tazjin <tazjin@tvl.su>
Tested-by: BuildkiteCI
Diffstat (limited to 'tvix/castore/docs/data-model.md')
-rw-r--r-- tvix/castore/docs/data-model.md 50
1 files changed, 50 insertions, 0 deletions
diff --git a/tvix/castore/docs/data-model.md b/tvix/castore/docs/data-model.md
new file mode 100644
index 000000000000..2df6761aae8f
--- /dev/null
+++ b/tvix/castore/docs/data-model.md
@@ -0,0 +1,50 @@
+# Data model
+
+This provides some more notes on the fields used in castore.proto.
+
+See `//tvix/store/docs/api.md` for the full context.
+
+## Directory message
+`Directory` messages use the blake3 hash of their canonical protobuf
+serialization as their identifier.
+
+A `Directory` message contains three lists, `directories`, `files` and
+`symlinks`, holding `DirectoryNode`, `FileNode` and `SymlinkNode` messages
+respectively. They describe all the direct child elements that are contained in
+a directory.
+
+All three message types have a `name` field, specifying the (base)name of the
+element (which MUST NOT contain slashes or null bytes, and MUST NOT be '.' or '..').
+For reproducibility reasons, each list MUST be sorted by that name, and names
+MUST be unique across all three lists.
+
+In addition to the `name` field, the various `*Node` messages have the following
+fields:
+
+## DirectoryNode
+A `DirectoryNode` message represents a child directory.
+
+It has a `digest` field, which points to the identifier of another `Directory`
+message, making a `Directory` a merkle tree (or, strictly speaking, a graph, as
+two elements pointing to a child directory with the same contents would point
+to the same `Directory` message).
+
+There's also a `size` field, containing the (total) number of all child
+elements in the referenced `Directory`, which helps with inode calculation.
+
+## FileNode
+A `FileNode` message represents a child (regular) file.
+
+Its `digest` field contains the blake3 hash of the file contents. It can be
+looked up in the `BlobService`.
+
+The `size` field contains the size of the blob the `digest` field refers to.
+
+The `executable` field specifies whether the file should be marked as
+executable or not.
+
+## SymlinkNode
+A `SymlinkNode` message represents a child symlink.
+
+Besides the `name` field, its only other field is `target`,
+a string containing the target of the symlink.
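
The naming rules from the "Directory message" section can be illustrated with a small Rust sketch. It is not the tvix-castore implementation: the structs below are hypothetical stand-ins for the messages described in the document (not the types generated from `castore.proto`), `ValidationError`, `check_list` and `validate` are made-up names, names are modelled as `String` for brevity, and "sorted" is assumed to mean plain byte-wise ordering.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for the messages described above; these are NOT the
// types generated from castore.proto.
#[allow(dead_code)]
#[derive(Debug)]
struct DirectoryNode {
    name: String,
    digest: [u8; 32], // identifier (blake3 hash) of the referenced Directory
    size: u64,        // total number of child elements in that Directory
}

#[allow(dead_code)]
#[derive(Debug)]
struct FileNode {
    name: String,
    digest: [u8; 32], // blake3 hash of the file contents
    size: u64,        // size of the blob the digest refers to
    executable: bool,
}

#[allow(dead_code)]
#[derive(Debug)]
struct SymlinkNode {
    name: String,
    target: String,
}

#[derive(Debug, Default)]
struct Directory {
    directories: Vec<DirectoryNode>,
    files: Vec<FileNode>,
    symlinks: Vec<SymlinkNode>,
}

#[derive(Debug, PartialEq)]
enum ValidationError {
    InvalidName(String),
    NotSorted(String),
    DuplicateName(String),
}

// Checks one list: every name must be valid, and the list must be sorted
// (assumption: byte-wise ordering) without repeated names.
fn check_list<'a>(names: impl Iterator<Item = &'a str>) -> Result<(), ValidationError> {
    let mut prev: Option<&'a str> = None;
    for name in names {
        if name == "." || name == ".." || name.contains('/') || name.contains('\0') {
            return Err(ValidationError::InvalidName(name.to_string()));
        }
        if let Some(p) = prev {
            if p == name {
                return Err(ValidationError::DuplicateName(name.to_string()));
            }
            if p > name {
                return Err(ValidationError::NotSorted(name.to_string()));
            }
        }
        prev = Some(name);
    }
    Ok(())
}

// Checks all three lists, then checks that no name appears in more than one.
fn validate(dir: &Directory) -> Result<(), ValidationError> {
    check_list(dir.directories.iter().map(|n| n.name.as_str()))?;
    check_list(dir.files.iter().map(|n| n.name.as_str()))?;
    check_list(dir.symlinks.iter().map(|n| n.name.as_str()))?;

    let all_names = dir
        .directories
        .iter()
        .map(|n| n.name.as_str())
        .chain(dir.files.iter().map(|n| n.name.as_str()))
        .chain(dir.symlinks.iter().map(|n| n.name.as_str()));

    let mut seen = HashSet::new();
    for name in all_names {
        if !seen.insert(name) {
            return Err(ValidationError::DuplicateName(name.to_string()));
        }
    }
    Ok(())
}
```

Calling `validate(&dir)` on a `Directory` whose `files` list contains `b` before `a` would return `ValidationError::NotSorted`, while a file and a symlink sharing the same name would be rejected as `DuplicateName` even though each list is valid on its own.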