about summary refs log tree commit diff
path: root/tvix/castore/src/import
AgeCommit message (Collapse)AuthorFilesLines
2024-06-06 r/8220 feat(tvix/store/bin): add progress bar infrastructureFlorian Klink2-2/+19
This adds the tracing-indicatif crate, and configures it as a layer in our tracing_subscriber pipeline to emit progress for every span that's configured so. It also moves from using std::io::stderr to write logs to using their writer, to avoid clobbering output. Progress bar styles are defined in a lazy_static, moving this into a general tracing is left for later. This adds some usage of this to the `imports` and `copy` commands. The output can still be improved a bit - we should probably split each task up into a smaller (instrumented) helper functions, so we can create a progress bar for each task. Change-Id: I59a1915aa4e0caa89c911632dec59c4cbeba1b89 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11747 Reviewed-by: flokli <flokli@flokli.de> Reviewed-by: Simon Hauser <simon.hauser@helsinki-systems.de> Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de>
2024-05-20 r/8158 refactor(tvix/castore): extract concurrent blob uploaderConnor Brewster3-97/+190
The archive ingester has a mechanism for concurrently uploading small blobs to the blob service in order to hide round trip latency with the blob service when ingesting many small blobs. Other ingestion sources like NARs also need a similar mechanism, this extracts the concurrent blob uploading mechanism into its own struct to make it more reusable. Change-Id: I05020419ff4b9ad5829fbfb5cd08d36db983b8c0 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11693 Tested-by: BuildkiteCI Reviewed-by: flokli <flokli@flokli.de>
2024-05-06 r/8079 test(tvix-castore/import): add tests for ingest_entriesFlorian Klink1-3/+138
Change-Id: Ia7906533868fd948509419e0d64b64582575a7fa Reviewed-on: https://cl.tvl.fyi/c/depot/+/11591 Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de> Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-05-06 r/8078 fix(tvix/castore/import): assert end of streamFlorian Klink1-0/+5
Once we break out with the root node, there may be no more elements in the stream. Change-Id: I6f5fc5662095aa2b2a56bcad506d25520d9ad00c Reviewed-on: https://cl.tvl.fyi/c/depot/+/11592 Reviewed-by: Connor Brewster <cbrewster@hey.com> Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de>
2024-05-06 r/8077 fix(tvix/castore/import): deal with entry.path() not having a parentFlorian Klink1-7/+9
We got away with not properly dealing with this for the archive case, where everything is contained inside a toplevel dir, but NARs can encode a single file/symlink. Properly break if the IngestionEntry path has the ROOT as parent, and only create filling directories in the other case. Change-Id: Ib378d0d1040de7c3fe310912a0b0488c55afee83 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11590 Tested-by: BuildkiteCI Autosubmit: flokli <flokli@flokli.de> Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-05-05 r/8076 feat(tvix-castore/import) have IngestionEntry.path() return &PathFlorian Klink2-3/+4
There's no need for this to be a &PathBuf. Change-Id: I2d4126d57cfd8ddaad5dd327943b70b83d45c749 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11589 Tested-by: BuildkiteCI Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-05-04 r/8073 refactor(tvix/*store): use DS: DirectoryServiceFlorian Klink3-4/+4
We implement DirectoryService for Arc<DirectoryService> and Box<DirectoryService>, this is sufficient. Change-Id: I0a5a81cbc4782764406b5bca57f908ace6090737 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11586 Tested-by: BuildkiteCI Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-05-02 r/8067 refactor(tvix/castore/import): use crate Path[Buf] in IngestionEntryFlorian Klink4-69/+74
This explicitly splits ingestion-method-specific path types from the castore types. Change-Id: Ia3b16105fadb8d52927a4ed79dc4b34efdf4311b Reviewed-on: https://cl.tvl.fyi/c/depot/+/11563 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: Connor Brewster <cbrewster@hey.com> Tested-by: BuildkiteCI
2024-04-30 r/8048 refactor(tvix/castore/import): restructure error typesFlorian Klink4-65/+119
Have ingest_entries return an Error type with only three kinds: - Error while uploading a specific Directory - Error while finalizing the directory upload - Error from the producer Move all ingestion method-specific errors to the individual implementations. Change-Id: I2a015cb7ebc96d084cbe2b809f40d1b53a15daf3 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11557 Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: Connor Brewster <cbrewster@hey.com>
2024-04-30 r/8047 refactor(tvix/castore): remove IngestionEntry::UnknownFlorian Klink2-10/+1
We shouldn't try to represent non-representable things in the ingestion entries (only to throw an error). It's cleaner to throw the error directly in the part producing the stream. Change-Id: I6b6f6d8c2f677425210142a39f1829ddeefec812 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11556 Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: firefly <firefly@firefly.nu>
2024-04-30 r/8046 refactor(tvix/castore/import): move upload_blob_at_path into fs modFlorian Klink2-28/+27
This is only useful for when we have access to a filesystem, so it shouldn't be in the root. Change-Id: I9923aaed1aef9d3a1e8fad41f58821d51c2eb34b Reviewed-on: https://cl.tvl.fyi/c/depot/+/11555 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: firefly <firefly@firefly.nu> Tested-by: BuildkiteCI
2024-04-30 r/8045 fix(tvix/castore/import): symlink targets are Vec<u8>Florian Klink3-4/+9
These can be arbitrary bytes in theory. Some of our libraries might be more strict, or inconsistent w.r.t. their representation of path separators. Change-Id: I7981b74fc7d3dd79f5589cf2ef52ced7b71dd003 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11551 Tested-by: BuildkiteCI Reviewed-by: edef <edef@edef.eu>
2024-04-30 r/8044 docs(tvix/castore): fix tvix_castore::import sub-mod docstringsFlorian Klink2-2/+4
The one for `fs` was wrong, and ended up being attached to ingest_path, and the one for `archive` was missing entirely. Change-Id: I8a4c32fb5293badb1ea0764c278a88e4ca33c018 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11552 Tested-by: BuildkiteCI Reviewed-by: edef <edef@edef.eu>
2024-04-24 r/8002 refactor(tvix/castore): add separate Error enum for archivesConnor Brewster2-33/+34
The `Error` enum for the `imports` crate has both filesystem and archive specific errors and was starting to get messy. This adds a separate `Error` enum for archive-specific errors and then keeps a single `Archive` variant in the top-level import `Error` for all archive errors. Change-Id: I4cd0746c864e5ec50b1aa68c0630ef9cd05176c7 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11498 Tested-by: BuildkiteCI Autosubmit: Connor Brewster <cbrewster@hey.com> Reviewed-by: flokli <flokli@flokli.de>
2024-04-23 r/8001 feat(tvix/castore): upload blobs concurrently when ingesting archivesConnor Brewster1-10/+90
Ingesting tarballs with a lot of small files is very slow because of the round trip time to the `BlobService`. To mitigate this, small blobs can be buffered into memory and uploaded concurrently in the background. Change-Id: I3376d11bb941ae35377a089b96849294c9c139e6 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11497 Reviewed-by: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Autosubmit: Connor Brewster <cbrewster@hey.com>
2024-04-23 r/8000 refactor(tvix/castore): switch to `ingest_entries` for tarball ingestionConnor Brewster3-128/+239
With `ingest_entries` being more generalized, we can now use it for ingesting the directory entries generated from tarballs. Change-Id: Ie1f7a915c456045762e05fcc9af45771f121eb43 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11489 Reviewed-by: flokli <flokli@flokli.de> Autosubmit: Connor Brewster <cbrewster@hey.com> Tested-by: BuildkiteCI
2024-04-20 r/7987 refactor(tvix/castore): ingest filesystem entries in parallelFlorian Klink2-55/+41
Rather than carrying around an Future in the IngestionEntry::Regular, simply carry the plain B3Digest. Code reading through a non-seekable data stream has no choice but to read and upload blobs immediately, and code seeking through something seekable (like a filesystem) probably knows better what concurrency to pick when ingesting, rather than the consuming side. (Our only) one of these seekable source implementations is now doing exactly that. We produce a stream of futures, and then use [StreamExt::buffered] to process more than one, concurrently. We still keep the same order, to avoid shuffling things and violating the stream order. This also cleans up walk_path_for_ingestion in castore/import, as well as ingest_dir_entries in glue/tvix_store_io. Change-Id: I5eb70f3e1e372c74bcbfcf6b6e2653eba36e151d Reviewed-on: https://cl.tvl.fyi/c/depot/+/11491 Autosubmit: flokli <flokli@flokli.de> Reviewed-by: Connor Brewster <cbrewster@hey.com> Tested-by: BuildkiteCI
2024-04-20 r/7985 feat(tvix/castore): Fix build warnings in release modeConnor Brewster1-0/+1
Fixes some build warnings that only happen when building in release mode which disables `debug_assertions`. Change-Id: I554d5fce7c869c23cf4aa93179f0ee9f7f7c834e Reviewed-on: https://cl.tvl.fyi/c/depot/+/11490 Reviewed-by: raitobezarius <tvl@lahfa.xyz> Tested-by: BuildkiteCI Autosubmit: Connor Brewster <cbrewster@hey.com> Reviewed-by: flokli <flokli@flokli.de>
2024-04-20 r/7984 fix(tvix/castore): ensure all directories are present during ingestionConnor Brewster1-0/+8
`ingest_entries` requires that all directories referenced by entries in the ingestion stream have an explicit entry in the stream. For example, if the stream contains a file with path `foo/bar`, there must be an entry that comes later in the stream for the directory `foo`. Change-Id: I61b4fbbb73ea7278715e04271d8073b484e05e61 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11488 Autosubmit: Connor Brewster <cbrewster@hey.com> Reviewed-by: flokli <flokli@flokli.de> Tested-by: BuildkiteCI Reviewed-by: raitobezarius <tvl@lahfa.xyz>
2024-04-20 r/7983 feat(tvix/eval): Implement builtins.fetchTarballAspen Smith3-2/+212
Implement a first pass at the fetchTarball builtin. This uses much of the same machinery as fetchUrl, but has the extra complexity that tarballs have to be extracted and imported as store paths (into the directory- and blob-services) before hashing. That's reasonably involved due to the structure of those two services. This is (unfortunately) not easy to test in an automated way, but I've tested it manually for now and it seems to work: tvix-repl> (import ../. {}).third_party.nixpkgs.hello.outPath => "/nix/store/dbghhbq1x39yxgkv3vkgfwbxrmw9nfzi-hello-2.12.1" :: string Co-authored-by: Connor Brewster <cbrewster@hey.com> Change-Id: I57afc6b91bad617a608a35bb357861e782a864c8 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11020 Autosubmit: aspen <root@gws.fyi> Reviewed-by: flokli <flokli@flokli.de> Tested-by: BuildkiteCI
2024-04-20 r/7982 feat(tvix/castore/import): only allow normal components in entry pathsFlorian Klink1-1/+10
Explicitly document and add a debug assertion for that. It's up to callers to ensure this doesn't happen. Change-Id: Ib5d154809c2ad2920258e239993d0b790d846dc8 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11487 Reviewed-by: Connor Brewster <cbrewster@hey.com> Autosubmit: flokli <flokli@flokli.de> Tested-by: BuildkiteCI
2024-04-20 r/7981 refactor(tvix/castore/import): make module, split off fs and errorFlorian Klink3-0/+398
Move error types and filesystem-specific functions to a separate file, and keep the fs:: namespace in public exports. Change-Id: I5e9e83ad78d9aea38553fafc293d3e4f8c31a8c1 Reviewed-on: https://cl.tvl.fyi/c/depot/+/11486 Tested-by: BuildkiteCI Reviewed-by: Connor Brewster <cbrewster@hey.com> Autosubmit: flokli <flokli@flokli.de>