| field | value | timestamp |
|---|---|---|
| author | Florian Klink <flokli@flokli.de> | 2024-09-19T08·27+0300 |
| committer | clbot <clbot@tvl.fyi> | 2024-09-19T12·51+0000 |
| commit | 21e5fc024d3ad275112c5bc88476ee38966d9fe1 | |
| tree | 94d1e12524ba5d9497f2904934850890a4087443 /tvix/castore/src/import/blobs.rs | |
| parent | 1f5a20736af58045a8e009d12c3b809e87afefcd | |
fix(tvix/castore/import): check small blobs first r/8704
ConcurrentBlobUploader buffers small blobs in memory and then uploads them to the BlobService in the background. In these cases we already know the hash of the whole blob, so we can check whether it exists before uploading it.

We were not doing this, however, which caused rate-limiting issues in GCS: it allows at most one write per second to the same key, and we ran into this limit especially frequently with the empty blob.

This considerably reduces the number of writes of the same blob.

In the future we might be able to drop this check, as our chunked blob uploading protocol gets smarter and covers these cases.

Change-Id: Icf482df815812f80a0b65cec0426f8e686308abb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12497
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
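The effect described above can be illustrated with a minimal, synchronous sketch. It is not the tvix implementation (which runs uploads as background tasks against a real BlobService); `CountingStore`, `upload_buffered`, `count_writes` and the `DefaultHasher`-based digest are all hypothetical stand-ins chosen so the example needs only the Rust standard library. The store counts writes so you can see that repeatedly uploading the empty blob reaches the backend only once when the existence check runs first.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in digest. NOT a cryptographic hash; it only keeps
/// this sketch dependency-free. A content-addressed store needs a real one.
type Digest = u64;

fn digest_of(data: &[u8]) -> Digest {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory store that counts writes, standing in for a
/// backend such as GCS where repeated writes to one key get rate-limited.
#[derive(Default)]
struct CountingStore {
    blobs: HashMap<Digest, Vec<u8>>,
    writes: usize,
}

impl CountingStore {
    fn has(&self, digest: &Digest) -> bool {
        self.blobs.contains_key(digest)
    }

    fn put(&mut self, data: Vec<u8>) -> Digest {
        self.writes += 1;
        let digest = digest_of(&data);
        self.blobs.insert(digest, data);
        digest
    }
}

/// Upload a fully buffered blob whose digest is already known.
/// The existence check runs first, so a blob the store already has is not written again.
fn upload_buffered(store: &mut CountingStore, expected: Digest, data: Vec<u8>) -> Digest {
    if store.has(&expected) {
        return expected;
    }
    let digest = store.put(data);
    // Mirrors the sanity check in the diff: the stored digest must match the precomputed one.
    assert_eq!(digest, expected, "blob digest mismatch");
    digest
}

/// Uploads each blob in order and returns how many writes reached the store.
fn count_writes(blobs: &[&[u8]]) -> usize {
    let mut store = CountingStore::default();
    for blob in blobs {
        let expected = digest_of(blob);
        upload_buffered(&mut store, expected, blob.to_vec());
    }
    store.writes
}
```

Uploading the empty blob 100 times through `count_writes` results in a single write, and the sequence `a`, `b`, `a` results in two. Note that a check-then-write sequence is not atomic: in a concurrent uploader, two tasks handling the same blob can both see it as missing and both write it, which is consistent with the commit describing the change as a reduction in writes rather than full deduplication.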
Diffstat (limited to 'tvix/castore/src/import/blobs.rs')

| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | tvix/castore/src/import/blobs.rs | 13 |

1 file changed, 13 insertions, 0 deletions
```diff
diff --git a/tvix/castore/src/import/blobs.rs b/tvix/castore/src/import/blobs.rs
index 8135d871d6c0..f71ee1e63768 100644
--- a/tvix/castore/src/import/blobs.rs
+++ b/tvix/castore/src/import/blobs.rs
@@ -28,6 +28,9 @@ pub enum Error {
     #[error("unable to read blob contents for {0}: {1}")]
     BlobRead(PathBuf, std::io::Error),
 
+    #[error("unable to check whether blob at {0} already exists: {1}")]
+    BlobCheck(PathBuf, std::io::Error),
+
     // FUTUREWORK: proper error for blob finalize
     #[error("unable to finalize blob {0}: {1}")]
     BlobFinalize(PathBuf, std::io::Error),
@@ -118,6 +121,16 @@ where
                 let path = path.to_owned();
                 let r = Cursor::new(buffer);
                 async move {
+                    // We know the blob digest already, check it exists before sending it.
+                    if blob_service
+                        .has(&expected_digest)
+                        .await
+                        .map_err(|e| Error::BlobCheck(path.clone(), e))?
+                    {
+                        drop(permit);
+                        return Ok(());
+                    }
+
                     let digest = upload_blob(&blob_service, &path, expected_size, r).await?;
 
                     assert_eq!(digest, expected_digest, "Tvix bug: blob digest mismatch");
```
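The new `BlobCheck` variant pairs the path being imported with the `std::io::Error` returned by the existence check, so a failing `has()` call aborts that upload with a descriptive error instead of silently falling through to a write. The following std-only sketch reproduces just that error path under stated assumptions: the `#[error(...)]` attributes in the diff suggest a derive macro such as `thiserror`, which is replaced here by a hand-written `Display` impl, and the `BlobStore` trait, `needs_upload` function and the two test-double stores are hypothetical, synchronous stand-ins for the async BlobService.

```rust
use std::fmt;
use std::io;
use std::path::{Path, PathBuf};

/// Std-only stand-in for the new variant; the message mirrors the `#[error(...)]` string.
#[derive(Debug)]
enum Error {
    BlobCheck(PathBuf, io::Error),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::BlobCheck(path, err) => write!(
                f,
                "unable to check whether blob at {} already exists: {}",
                path.display(),
                err
            ),
        }
    }
}

impl std::error::Error for Error {}

/// Hypothetical, synchronous store interface: only the existence check matters here.
trait BlobStore {
    fn has(&self, digest: &[u8; 32]) -> io::Result<bool>;
}

/// Returns Ok(true) if the blob still needs uploading, Ok(false) if the store
/// already has it, and Err(BlobCheck) if the existence check itself failed.
fn needs_upload<S: BlobStore>(store: &S, path: &Path, digest: &[u8; 32]) -> Result<bool, Error> {
    let exists = store
        .has(digest)
        .map_err(|e| Error::BlobCheck(path.to_path_buf(), e))?;
    Ok(!exists)
}

/// Test double whose existence check always fails, e.g. a backend outage.
struct FailingStore;

impl BlobStore for FailingStore {
    fn has(&self, _digest: &[u8; 32]) -> io::Result<bool> {
        Err(io::Error::new(io::ErrorKind::Other, "backend unavailable"))
    }
}

/// Test double that reports every blob as already present.
struct FullStore;

impl BlobStore for FullStore {
    fn has(&self, _digest: &[u8; 32]) -> io::Result<bool> {
        Ok(true)
    }
}
```

With `FullStore`, `needs_upload` returns `Ok(false)` and no write would follow; with `FailingStore`, the error renders as "unable to check whether blob at dir/empty-file already exists: backend unavailable" for the path `dir/empty-file`. Propagating the failure rather than treating it as "missing" is a design choice: falling back to an upload would be safe for correctness but would reintroduce the redundant writes the commit aims to avoid.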