about summary refs log tree commit diff
path: root/tvix/castore/src/import
AgeCommit message (Collapse)AuthorFilesLines
2024-10-01 r/8741 feat(castore/fs): optional refscanner for ingestYureka1-12/+40
Change-Id: Ieca06de4c2e2680d89fe05a380079fafa5454837 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12529 Autosubmit: yuka <yuka@yuka.dev> Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-09-19 r/8704 fix(tvix/castore/import): check small blobs firstFlorian Klink1-0/+13
ConcurrentBlobUploader buffers small blobs in memory, and then uploads them to the BlobService in the background. In these cases, we know the hash of the whole blob, so we could check if it exists first before, uploading it. We were however not, and this caused rate limiting issues in GCS, as it has an update limit of one write per second on the same key, which we ran into especially frequently with the empty blob. This reduces the amount of writes of the same blob considerably. In the future, we might be able to drop this, as our chunked blob uploading protocol gets smarter and covers these cases. Change-Id: Icf482df815812f80a0b65cec0426f8e686308abb Reviewed-on: https://cl.tvl.fyi/c/depot/+/12497 Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de> Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-08-18 r/8514 refactor(tvix/castore): have SymlinkTarget-specific errorsFlorian Klink1-3/+3
Don't use ValidateNodeError, but SymlinkTargetError. Also, add checks for too long symlink targets. Change-Id: I4b533325d494232ff9d0b3f4f695f5a1a0a36199 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12230 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: edef <edef@edef.eu> Reviewed-by: Ilan Joselevich <personal@ilanjoselevich.com> Tested-by: BuildkiteCI
2024-08-17 r/8506 refactor(tvix/castore): add PathComponent type for checked componentsFlorian Klink1-3/+2
This encodes a verified component on the type level. Internally, it contains a bytes::Bytes. The castore Path/PathBuf component() and file_name() methods now return this type, the old ones returning bytes were renamed to component_bytes() and component_file_name() respectively. We can drop the directory_reject_invalid_name test - it's not possible anymore to pass an invalid name to Directories::add. Invalid names in the Directory proto are still being tested to be rejected in the validate_invalid_names tests. Change-Id: Ide4d16415dfd50b7e2d7e0c36d42a3bbeeb9b6c5 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12217 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: Connor Brewster <cbrewster@hey.com> Tested-by: BuildkiteCI
2024-08-17 r/8505 refactor(tvix/castore): drop {Directory,File,Symlink}NodeFlorian Klink1-19/+28
Add a `SymlinkTarget` type to represent validated symlink targets. With this, no invalid states are representable, so we can make `Node` be just an enum of all three kind of types, and allow access to these fields directly. Change-Id: I20bdd480c8d5e64a827649f303c97023b7e390f2 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12216 Reviewed-by: benjaminedwardwebb <benjaminedwardwebb@gmail.com> Autosubmit: flokli <flokli@flokli.de> Reviewed-by: Connor Brewster <cbrewster@hey.com> Tested-by: BuildkiteCI
2024-08-17 r/8504 refactor(tvix/castore): remove `name` from NodesFlorian Klink1-34/+20
Nodes only have names if they're contained inside a Directory, or if they're a root node and have something else possibly giving them a name externally. This removes all `name` fields in the three different Nodes, and instead maintains it inside a BTreeMap inside the Directory. It also removes the NamedNode trait (they don't have a get_name()), as well as Node::rename(self, name), and all [Partial]Ord implementations for Node (as they don't have names to use for sorting). The `nodes()`, `directories()`, `files()` iterators inside a `Directory` now return a tuple of Name and Node, as does the RootNodesProvider. The different {Directory,File,Symlink}Node struct constructors got simpler, and the {Directory,File}Node ones became infallible - as there's no more possibility to represent invalid state. The proto structs stayed the same - there's now from_name_and_node and into_name_and_node to convert back and forth between the two `Node` structs. Some further cleanups: The error types for Node validation were renamed. Everything related to names is now in the DirectoryError (not yet happy about the naming) There's some leftover cleanups to do: - There should be a from_(sorted_)iter and into_iter in Directory, so we can construct and deconstruct in one go. That should also enable us to implement conversions from and to the proto representation that moves, rather than clones. - The BuildRequest and PathInfo structs are still proto-based, so we still do a bunch of conversions back and forth there (and have some ugly expect there). There's not much point for error handling here, this will be moved to stricter types in a followup CL. Change-Id: I7369a8e3a426f44419c349077cb4fcab2044ebb6 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12205 Tested-by: BuildkiteCI Reviewed-by: yuka <yuka@yuka.dev> Autosubmit: flokli <flokli@flokli.de> Reviewed-by: benjaminedwardwebb <benjaminedwardwebb@gmail.com> Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-08-13 r/8486 refactor(tvix/castore): move *Node and Directory to crate rootFlorian Klink3-12/+7
*Node and Directory are types of the tvix-castore model, not the tvix DirectoryService model. A DirectoryService only happens to send Directories. Move types into individual files in a nodes/ subdirectory, as it's gotten too cluttered in a single file, and (re-)export all types from the crate root. This has the effect that we now cannot poke at private fields directly from other files inside `crate::directoryservice` (as it's not all in the same file anymore), but that's a good thing, it now forces us to go through the proper accessors. For the same reasons, we currently also need to introduce the `rename` functions on each *Node directly. A followup is gonna move the names out of the individual enum kinds, so we can better represent "unnamed nodes". Change-Id: Icdb34dcfe454c41c94f2396e8e99973d27db8418 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12199 Reviewed-by: yuka <yuka@yuka.dev> Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI
2024-08-13 r/8484 refactor(tvix/castore): use Directory struct separate from proto oneYureka3-41/+49
This uses our own data type to deal with Directories in the castore model. It makes some undesired states unrepresentable, removing the need for conversions and checking in various places: - In the protobuf, blake3 digests could have a wrong length, as proto doesn't know fixed-size fields. We now use `B3Digest`, which makes cloning cheaper, and removes the need to do size-checking everywhere. - In the protobuf, we had three different lists for `files`, `symlinks` and `directories`. This was mostly a protobuf size optimization, but made interacting with them a bit awkward. This has now been replaced with a list of enums, and convenience iterators to get various nodes, and add new ones. Change-Id: I7b92691bb06d77ff3f58a5ccea94a22c16f84f04 Reviewed-on: https://cl.tvl.fyi/c/depot/+/12057 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-06-17 r/8289 feat(tvix/tvix-store): improve progress barsFlorian Klink2-22/+36
Don't show an empty spinner for daemon commands. Move the bar to the right, so the text is better aligned between spinner progress and bar progress styles. Generally, push progress bars a bit more down to the place where we can track progress. This includes adding one in the upload_blob span. Introduce another progress style template for transfers, which interprets the counter as bytes (not just a plain integer), and also a data rate. Use it for here and in the fetching code, and also make the progress bar itself a bit less wide. Change-Id: I15c2ea3d2b24b5186cec19cd3dbd706638497f40 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11845 Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de> Reviewed-by: Simon Hauser <simon.hauser@helsinki-systems.de>
2024-06-06 r/8220 feat(tvix/store/bin): add progress bar infrastructureFlorian Klink2-2/+19
This adds the tracing-indicatif crate, and configures it as a layer in our tracing_subscriber pipeline to emit progress for every span that's configured so. It also moves from using std::io::stderr to write logs to using their writer, to avoid clobbering output. Progress bar styles are defined in a lazy_static, moving this into a general tracing is left for later. This adds some usage of this to the `imports` and `copy` commands. The output can still be improved a bit - we should probably split each task up into a smaller (instrumented) helper functions, so we can create a progress bar for each task. Change-Id: I59a1915aa4e0caa89c911632dec59c4cbeba1b89 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11747 Reviewed-by: flokli <flokli@flokli.de> Reviewed-by: Simon Hauser <simon.hauser@helsinki-systems.de> Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de>
2024-05-20 r/8158 refactor(tvix/castore): extract concurrent blob uploaderConnor Brewster3-97/+190
The archive ingester has a mechanism for concurrently uploading small blobs to the blob service in order to hide round trip latency with the blob service when ingesting many small blobs. Other ingestion sources like NARs also need a similar mechanism, this extracts the concurrent blob uploading mechanism into its own struct to make it more reusable. Change-Id: I05020419ff4b9ad5829fbfb5cd08d36db983b8c0 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11693 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-05-06 r/8079 test(tvix-castore/import): add tests for ingest_entriesFlorian Klink1-3/+138
Change-Id: Ia7906533868fd948509419e0d64b64582575a7fa Reviewed-on: https://cl.tvl.fyi/c/depot/+/11591 Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de> Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-05-06 r/8078 fix(tvix/castore/import): assert end of streamFlorian Klink1-0/+5
Once we break out with the root node, there may be no more elements in the stream. Change-Id: I6f5fc5662095aa2b2a56bcad506d25520d9ad00c Reviewed-on: https://cl.tvl.fyi/c/depot/+/11592 Reviewed-by: Connor Brewster <cbrewster@hey.com> Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de>
2024-05-06 r/8077 fix(tvix/castore/import): deal with entry.path() not having a parentFlorian Klink1-7/+9
We got away with not properly dealing with this for the archive case, where everything is contained inside a toplevel dir, but NARs can encode a single file/symlink. Properly break if the IngestionEntry path has the ROOT as parent, and only create filling directories in the other case. Change-Id: Ib378d0d1040de7c3fe310912a0b0488c55afee83 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11590 Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de> Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-05-05 r/8076 feat(tvix-castore/import) have IngestionEntry.path() return &PathFlorian Klink2-3/+4
There's no need for this to be a &PathBuf. Change-Id: I2d4126d57cfd8ddaad5dd327943b70b83d45c749 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11589 Tested-by: BuildkiteCI Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-05-04 r/8073 refactor(tvix/*store): use DS: DirectoryServiceFlorian Klink3-4/+4
We implement DirectoryService for Arc<DirectoryService> and Box<DirectoryService>, this is sufficient. Change-Id: I0a5a81cbc4782764406b5bca57f908ace6090737 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11586 Tested-by: BuildkiteCI Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-05-02 r/8067 refactor(tvix/castore/import): use crate Path[Buf] in IngestionEntryFlorian Klink4-69/+74
This explicitly splits ingestion-method-specific path types from the castore types. Change-Id: Ia3b16105fadb8d52927a4ed79dc4b34efdf4311b Reviewed-on: https://cl.tvl.fyi/c/depot/+/11563 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: Connor Brewster <cbrewster@hey.com> Tested-by: BuildkiteCI
2024-04-30 r/8048 refactor(tvix/castore/import): restructure error typesFlorian Klink4-65/+119
Have ingest_entries return an Error type with only three kinds: - Error while uploading a specific Directory - Error while finalizing the directory upload - Error from the producer Move all ingestion method-specific errors to the individual implementations. Change-Id: I2a015cb7ebc96d084cbe2b809f40d1b53a15daf3 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11557 Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-04-30 r/8047 refactor(tvix/castore): remove IngestionEntry::UnknownFlorian Klink2-10/+1
We shouldn't try to represent non-representable things in the ingestion entries (only to throw an error). It's cleaner to throw the error directly in the part producing the stream. Change-Id: I6b6f6d8c2f677425210142a39f1829ddeefec812 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11556 Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: firefly <firefly@firefly.nu>
2024-04-30 r/8046 refactor(tvix/castore/import): move upload_blob_at_path into fs modFlorian Klink2-28/+27
This is only useful for when we have access to a filesystem, so it shouldn't be in the root. Change-Id: I9923aaed1aef9d3a1e8fad41f58821d51c2eb34b Reviewed-on: https://cl.tvl.fyi/c/depot/+/11555 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: firefly <firefly@firefly.nu> Tested-by: BuildkiteCI
2024-04-30 r/8045 fix(tvix/castore/import): symlink targets are Vec<u8>Florian Klink3-4/+9
These can be arbitrary bytes in theory. Some of our libraries might be more strict, or inconsistent w.r.t. their representation of path separators. Change-Id: I7981b74fc7d3dd79f5589cf2ef52ced7b71dd003 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11551 Tested-by: BuildkiteCI Reviewed-by: edef <edef@edef.eu>
2024-04-30 r/8044 docs(tvix/castore): fix tvix_castore::import sub-mod docstringsFlorian Klink2-2/+4
The one for `fs` was wrong, and ended up being attached to ingest_path, and the one for `archive` was missing entirely. Change-Id: I8a4c32fb5293badb1ea0764c278a88e4ca33c018 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11552 Tested-by: BuildkiteCI Reviewed-by: edef <edef@edef.eu>
2024-04-24 r/8002 refactor(tvix/castore): add separate Error enum for archivesConnor Brewster2-33/+34
The `Error` enum for the `imports` crate has both filesystem and archive specific errors and was starting to get messy. This adds a separate `Error` enum for archive-specific errors and then keeps a single `Archive` variant in the top-level import `Error` for all archive errors. Change-Id: I4cd0746c864e5ec50b1aa68c0630ef9cd05176c7 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11498 Tested-by: BuildkiteCI Autosubmit: Connor Brewster <cbrewster@hey.com> Reviewed-by: flokli <flokli@flokli.de>
2024-04-23 r/8001 feat(tvix/castore): upload blobs concurrently when ingesting archivesConnor Brewster1-10/+90
Ingesting tarballs with a lot of small files is very slow because of the round trip time to the `BlobService`. To mitigate this, small blobs can be buffered into memory and uploaded concurrently in the background. Change-Id: I3376d11bb941ae35377a089b96849294c9c139e6 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11497 Reviewed-by: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Autosubmit: Connor Brewster <cbrewster@hey.com>
2024-04-23 r/8000 refactor(tvix/castore): switch to `ingest_entries` for tarball ingestionConnor Brewster3-128/+239
With `ingest_entries` being more generalized, we can now use it for ingesting the directory entries generated from tarballs. Change-Id: Ie1f7a915c456045762e05fcc9af45771f121eb43 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11489 Reviewed-by: flokli <flokli@flokli.de> Autosubmit: Connor Brewster <cbrewster@hey.com> Tested-by: BuildkiteCI
2024-04-20 r/7987 refactor(tvix/castore): ingest filesystem entries in parallelFlorian Klink2-55/+41
Rather than carrying around an Future in the IngestionEntry::Regular, simply carry the plain B3Digest. Code reading through a non-seekable data stream has no choice but to read and upload blobs immediately, and code seeking through something seekable (like a filesystem) probably knows better what concurrency to pick when ingesting, rather than the consuming side. (Our only) one of these seekable source implementations is now doing exactly that. We produce a stream of futures, and then use [StreamExt::buffered] to process more than one, concurrently. We still keep the same order, to avoid shuffling things and violating the stream order. This also cleans up walk_path_for_ingestion in castore/import, as well as ingest_dir_entries in glue/tvix_store_io. Change-Id: I5eb70f3e1e372c74bcbfcf6b6e2653eba36e151d Reviewed-on: https://cl.tvl.fyi/c/depot/+/11491 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: Connor Brewster <cbrewster@hey.com> Tested-by: BuildkiteCI
2024-04-20 r/7985 feat(tvix/castore): Fix build warnings in release modeConnor Brewster1-0/+1
Fixes some build warnings that only happen when building in release mode which disables `debug_assertions`. Change-Id: I554d5fce7c869c23cf4aa93179f0ee9f7f7c834e Reviewed-on: https://cl.tvl.fyi/c/depot/+/11490 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Tested-by: BuildkiteCI Autosubmit: Connor Brewster <cbrewster@hey.com> Reviewed-by: flokli <flokli@flokli.de>
2024-04-20 r/7984 fix(tvix/castore): ensure all directories are present during ingestionConnor Brewster1-0/+8
`ingest_entries` requires that all directories referenced by entries in the ingestion stream have an explicit entry in the stream. For example, if the stream contains a file with path `foo/bar`, there must be an entry that comes later in the stream for the directory `foo`. Change-Id: I61b4fbbb73ea7278715e04271d8073b484e05e61 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11488 Autosubmit: Connor Brewster <cbrewster@hey.com> Reviewed-by: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2024-04-20 r/7983 feat(tvix/eval): Implement builtins.fetchTarballAspen Smith3-2/+212
Implement a first pass at the fetchTarball builtin. This uses much of the same machinery as fetchUrl, but has the extra complexity that tarballs have to be extracted and imported as store paths (into the directory- and blob-services) before hashing. That's reasonably involved due to the structure of those two services. This is (unfortunately) not easy to test in an automated way, but I've tested it manually for now and it seems to work: tvix-repl> (import ../. {}).third_party.nixpkgs.hello.outPath => "/nix/store/dbghhbq1x39yxgkv3vkgfwbxrmw9nfzi-hello-2.12.1" :: string Co-authored-by: Connor Brewster <cbrewster@hey.com> Change-Id: I57afc6b91bad617a608a35bb357861e782a864c8 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11020 Autosubmit: aspen <root@gws.fyi> Reviewed-by: flokli <flokli@flokli.de> Tested-by: BuildkiteCI
2024-04-20 r/7982 feat(tvix/castore/import): only allow normal components in entry pathsFlorian Klink1-1/+10
Explicitly document and add a debug assertion for that. It's up to callers to ensure this doesn't happen. Change-Id: Ib5d154809c2ad2920258e239993d0b790d846dc8 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11487 Reviewed-by: Connor Brewster <cbrewster@hey.com> Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI
2024-04-20 r/7981 refactor(tvix/castore/import): make module, split off fs and errorFlorian Klink3-0/+398
Move error types and filesystem-specific functions to a separate file, and keep the fs:: namespace in public exports. Change-Id: I5e9e83ad78d9aea38553fafc293d3e4f8c31a8c1 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11486 Tested-by: BuildkiteCI Reviewed-by: Connor Brewster <cbrewster@hey.com> Autosubmit: flokli <flokli@flokli.de>