 tvix/docs/src/TODO.md                        |   2 -
 tvix/docs/src/castore/blobstore-protocol.md  |   2 +
 tvix/docs/src/castore/store-configuration.md | 173 +
 3 files changed, 175 insertions(+), 2 deletions(-)
diff --git a/tvix/docs/src/TODO.md b/tvix/docs/src/TODO.md
index 57a685989a4c..b52d6616533a 100644
--- a/tvix/docs/src/TODO.md
+++ b/tvix/docs/src/TODO.md
@@ -120,8 +120,6 @@ Extend the other pages in here. Some ideas on what should be tackled:
   and trait-focused?
 - Restructure docs on castore vs store, this seems to be duplicated a bit and
   is probably still not too clear.
-- Describe store composition(s) in more detail. There's some notes on granular
-  fetching which probably can be repurposed.
 - Absorb the rest of //tvix/website into this.

 ## Features

diff --git a/tvix/docs/src/castore/blobstore-protocol.md b/tvix/docs/src/castore/blobstore-protocol.md
index 0dff787ccb00..215a8316803d 100644
--- a/tvix/docs/src/castore/blobstore-protocol.md
+++ b/tvix/docs/src/castore/blobstore-protocol.md
@@ -57,6 +57,8 @@ The flexibility of this doesn't need to be exposed to the user in the default
 case; in most cases we should be fine with some form of on-disk storage and a
 bunch of substituters with different priorities.

+Check [Store Configuration](./store-configuration.md) for more details.
+
 ### gRPC Clients
 Clients are encouraged to always read blobs in a chunked fashion (asking for a
 list of chunks for a blob via `BlobService.Stat()`, then fetching chunks via

diff --git a/tvix/docs/src/castore/store-configuration.md b/tvix/docs/src/castore/store-configuration.md
new file mode 100644
index 000000000000..af476dd47922
--- /dev/null
+++ b/tvix/docs/src/castore/store-configuration.md
@@ -0,0 +1,173 @@
# Store Configuration

Currently, tvix-store (and tvix-cli) expose three different `--*-service-addr`
CLI args, describing how to talk to the three different stores.

Depending on the CLI entrypoint, they have different defaults:

 - `tvix-cli` defaults to in-memory variants (`ServiceUrlsMemory`).
 - `tvix-store daemon` defaults to using a local filesystem-based backend for
   blobs, and redb backends for `DirectoryService` and `PathInfoService`
   (`ServiceUrls`).
 - other `tvix-store` entrypoints, as well as `nar-bridge`, default to talking
   to a `tvix-store` gRPC daemon (`ServiceUrlsGrpc`).

The exact config and paths can be inspected by invoking `--help` on each of
these entrypoints, and it's of course possible to change this config, for
example in case everything should be done from a single binary, without a
daemon in between.
There is currently no client-side caching wired up yet, and there are some
(known) unnecessary roundtrips (which can be removed after some refactoring),
so for everything except testing purposes you might want to connect directly
to the data stores, or use Store Composition to add caching (and to describe
more complicated fetch-through configs).

## Store Composition
Internally, `tvix-castore` supports composing multiple instances of
`BlobService`, `DirectoryService` (and `PathInfoService`) together.

It allows describing more complicated "hierarchies"/"tiers" of different
service types, combining different storage backends, substituters and caches
into a DAG of some sort, ultimately exposing the same (trait) interface as a
single store.

The three individual URLs exposed in the CLI are currently converted
internally to a composition with just one instance of each store (at the
"root" name).

Keep in mind the config format is very granular and low-level, and because of
this potentially subject to larger breaking and unannounced changes, which is
why it is not exposed by default yet.

In the long term, for "user-facing" configuration, we might want to expose a
more opinionated middle ground between only a single instance and the super
granular store composition.

For example, users could configure things like "a list of substituters" and
"caching args", and internally this could be transformed to a low-level
composition - potentially leaving the granular format for library/power users
only.
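As a rough illustration of the conversion described above, the three URL args
of a `tvix-store daemon` correspond to a trivial composition with a single
instance named "root" per service, along these lines (a sketch only; the
store types and fields mirror the examples further below, and the exact
default paths are assumptions, not authoritative):

```toml
# Hypothetical stand-alone equivalent of the daemon defaults: one "root"
# instance per service namespace, with no caching or substituters.
[blobservices.root]
type = "objectstore"
object_store_url = "file:///var/lib/tvix-store/blobs.object_store"
object_store_options = {}

[directoryservices.root]
type = "redb"
is_temporary = false
path = "/var/lib/tvix-store/directories.redb"

[pathinfoservices.root]
type = "redb"
is_temporary = false
path = "/var/lib/tvix-store/pathinfo.redb"
```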
### CLI usage
However, if you're okay with these caveats, and want to configure some caching
today using the existing CLI entrypoints, you can enable the
`xp-composition-cli` feature flag in the `tvix-store` crate.

With `cargo`, this can be enabled by passing
`--features tvix-store/xp-composition-cli` to a `cargo build` / `cargo run`
invocation.

If enabled, CLI entrypoints get a `--experimental-store-composition` arg,
which accepts a TOML file describing a composition for all three stores
(causing the other `--*-service-addr` args to be ignored if set).

It expects all `BlobService` instances to be inside a `blobservices`
namespace/attribute (`DirectoryService`s in `directoryservices`, and
`PathInfoService`s in `pathinfoservices`, respectively), and requires one
named "root".

### Library usage
The store composition code can be accessed via `tvix_castore::composition` and
`tvix_store::composition`.

A global "registry" can be used to make other (out-of-tree) "types" of stores
known to the composition machinery.

In terms of config format, you're also not required to use TOML; anything
`serde` can deserialize works.

Make sure to check the module-level docstrings and code examples for
`tvix_castore::composition`.

### Composition config format
The examples below are in the format accepted by the CLI, using the
`blobservices` / `directoryservices` / `pathinfoservices` namespaces/attributes
to describe all three services.

However, as mentioned above, for library users this doesn't need to be TOML
(anything serde can deserialize works), and the composition hierarchy needs to
be built separately for each `{Blob,Directory,PathInfo}Service`, dropping the
namespaces present in the TOML.

#### Example: combined remote/local blobservice
This fetches blobs from a local store.
If not found there, a remote store is queried, and results are returned to the
client and inserted into the local store, so subsequent lookups don't query
the remote again.

```toml
[blobservices.root]
type = "combined"
near = "near"
far = "far"

[blobservices.near]
type = "objectstore"
object_store_url = "file:///tmp/tvix/blobservice"
object_store_options = {}

[blobservices.far]
type = "grpc"
url = "grpc+http://[::1]:8000"

# […] directoryservices/pathinfoservices go here […]
```

#### Example: LRU cache wrapping pathinfoservice
This keeps the last 1000 requested `PathInfo`s around in a local cache.

```toml
[pathinfoservices.root]
type = "cache"
near = "near"
far = "far"

[pathinfoservices.near]
type = "lru"
capacity = 1000

[pathinfoservices.far]
type = "grpc"
url = "grpc+http://localhost:8000"

# […] blobservices/directoryservices go here […]
```

#### Example: Self-contained fetch-through tvix-store for `cache.nixos.org`
This provides a `PathInfoService` "containing" `PathInfo`s that are in
`cache.nixos.org`.

To construct a `PathInfo` initially, we need to ingest the NAR to add missing
castore contents to `BlobService` / `DirectoryService` and return the
resulting root node.

To avoid doing this every time, the resulting `PathInfo` is saved in a local
(`redb`) database.

This also showcases how PathInfo services can refer to other store types (blob
services, directory services).
```toml
[blobservices.root]
type = "objectstore"
object_store_url = "file:///var/lib/tvix-store/blobs.object_store"
object_store_options = {}

[directoryservices.root]
type = "redb"
is_temporary = false
path = "/var/lib/tvix-store/directories.redb"

[pathinfoservices.root]
type = "cache"
near = "redb"
far = "cache-nixos-org"

[pathinfoservices.redb]
type = "redb"
is_temporary = false
path = "/var/lib/tvix-store/pathinfo.redb"

[pathinfoservices.cache-nixos-org]
type = "nix"
base_url = "https://cache.nixos.org"
public_keys = ["cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY="]
blob_service = "root"
directory_service = "root"
```