author: Florian Klink <flokli@flokli.de> 2024-09-19T08·27+0300
committer: clbot <clbot@tvl.fyi> 2024-09-19T12·51+0000
commit: 21e5fc024d3ad275112c5bc88476ee38966d9fe1
tree: 94d1e12524ba5d9497f2904934850890a4087443 /tvix/castore/src
parent: 1f5a20736af58045a8e009d12c3b809e87afefcd
fix(tvix/castore/import): check small blobs first r/8704
ConcurrentBlobUploader buffers small blobs in memory, and then uploads
them to the BlobService in the background.

In these cases, we know the hash of the whole blob, so we can check
whether it already exists before uploading it.

However, we were not doing so, which caused rate limiting issues in GCS:
it has an update limit of one write per second on the same key, which we
ran into especially frequently with the empty blob.

This considerably reduces the number of writes of the same blob.
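
The effect of checking first can be illustrated with a minimal,
synchronous sketch. It is not the actual castore code: MemoryBlobStore,
digest_of and upload_if_missing are hypothetical names, the store is a
plain in-memory map that counts writes, and std's DefaultHasher stands in
for a real cryptographic blob digest.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in digest type (assumption); a real content-addressed store
/// would use a cryptographic hash instead.
type Digest = u64;

/// Hypothetical in-memory blob store that counts the writes it receives.
#[derive(Default)]
struct MemoryBlobStore {
    blobs: HashMap<Digest, Vec<u8>>,
    writes: usize,
}

impl MemoryBlobStore {
    /// Cheap existence check by digest.
    fn has(&self, digest: &Digest) -> bool {
        self.blobs.contains_key(digest)
    }

    /// Every call counts as one write against the backing store.
    fn put(&mut self, digest: Digest, data: Vec<u8>) {
        self.writes += 1;
        self.blobs.insert(digest, data);
    }
}

/// Computes the stand-in digest of a fully buffered blob.
fn digest_of(data: &[u8]) -> Digest {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Uploads the blob only if the store does not already have it.
/// Returns true if a write happened, false if it was skipped.
fn upload_if_missing(store: &mut MemoryBlobStore, data: &[u8]) -> bool {
    let digest = digest_of(data);
    if store.has(&digest) {
        return false;
    }
    store.put(digest, data.to_vec());
    true
}

fn main() {
    let mut store = MemoryBlobStore::default();
    // The empty blob tends to show up many times during an import.
    for _ in 0..100 {
        upload_if_missing(&mut store, b"");
    }
    upload_if_missing(&mut store, b"hello");
    println!("writes: {}", store.writes);
}
```

In this sketch, the hundred attempts to upload the empty blob result in
a single write, because every attempt after the first is answered by the
existence check.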

In the future, we might be able to drop this check, once our chunked blob
upload protocol gets smarter and covers these cases.

Change-Id: Icf482df815812f80a0b65cec0426f8e686308abb
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12497
Tested-by: BuildkiteCI
Autosubmit: flokli <flokli@flokli.de>
Reviewed-by: Connor Brewster <cbrewster@hey.com>
Diffstat (limited to 'tvix/castore/src')
-rw-r--r-- tvix/castore/src/import/blobs.rs 13
1 files changed, 13 insertions, 0 deletions
diff --git a/tvix/castore/src/import/blobs.rs b/tvix/castore/src/import/blobs.rs
index 8135d871d6c0..f71ee1e63768 100644
--- a/tvix/castore/src/import/blobs.rs
+++ b/tvix/castore/src/import/blobs.rs
@@ -28,6 +28,9 @@ pub enum Error {
     #[error("unable to read blob contents for {0}: {1}")]
     BlobRead(PathBuf, std::io::Error),
 
+    #[error("unable to check whether blob at {0} already exists: {1}")]
+    BlobCheck(PathBuf, std::io::Error),
+
     // FUTUREWORK: proper error for blob finalize
     #[error("unable to finalize blob {0}: {1}")]
     BlobFinalize(PathBuf, std::io::Error),
@@ -118,6 +121,16 @@ where
                 let path = path.to_owned();
                 let r = Cursor::new(buffer);
                 async move {
+                    // We know the blob digest already, check it exists before sending it.
+                    if blob_service
+                        .has(&expected_digest)
+                        .await
+                        .map_err(|e| Error::BlobCheck(path.clone(), e))?
+                    {
+                        drop(permit);
+                        return Ok(());
+                    }
+
                     let digest = upload_blob(&blob_service, &path, expected_size, r).await?;
 
                     assert_eq!(digest, expected_digest, "Tvix bug: blob digest mismatch");