author | Florian Klink <flokli@flokli.de> | 2024-08-18T16·17+0300 |
---|---|---|
committer | flokli <flokli@flokli.de> | 2024-08-19T10·07+0000 |
commit | bb5d7c96783656a5c1be4bb93914032f40c37ab4 | |
tree | b52139c28ae7accd31eff1c2006d0a50bf0f0d7c | |
parent | 98863e731221e3cd70f88b0f1e6646d958676e6c |
feat(ops/pipelines): support buildkite retries r/8517
cl/12228 enabled automatic retries for some flaky tests, which generally works, as can be seen in https://buildkite.com/tvl/depot/builds/35893.

However, ":duck:" still reports as failing, because it checks whether the number of failed steps is nonzero, and a failed job still counts as failed after it has been retried.

We cannot check the overall status of the build, as it is still "RUNNING". Instead of counting all failed steps so far, we can query all failed jobs and then filter out the ones that were already retried.

Change-Id: Ib9d27587c8a8ba7970850812c4302fecdc4482e7
Reviewed-on: https://cl.tvl.fyi/c/depot/+/12233
Tested-by: BuildkiteCI
Reviewed-by: tazjin <tazjin@tvl.su>
-rw-r--r-- | ops/pipelines/static-pipeline.yaml | 10 |
1 files changed, 6 insertions, 4 deletions
diff --git a/ops/pipelines/static-pipeline.yaml b/ops/pipelines/static-pipeline.yaml
index af4f9d784e60..a7eea2eb97d8 100644
--- a/ops/pipelines/static-pipeline.yaml
+++ b/ops/pipelines/static-pipeline.yaml
@@ -88,10 +88,12 @@ steps:
     continue_on_failure: true
 
   # Exit with success or failure depending on whether any other steps
-  # failed.
+  # failed (but not retried).
   #
   # This information is checked by querying the Buildkite GraphQL API
-  # and fetching the count of failed steps.
+  # and fetching all failed steps, then filtering out the ones that were
+  # retried (retried jobs create new jobs, which would also show up in the
+  # query).
   #
   # This step must be :duck: (yes, really!) because the post-command
   # hook will inspect this name.
@@ -109,8 +111,8 @@ steps:
       readonly FAILED_JOBS=$(curl 'https://graphql.buildkite.com/v1' \
         --silent \
         -H "Authorization: Bearer $(cat ${BUILDKITE_TOKEN_PATH})" \
-        -d "{\"query\": \"query BuildStatusQuery { build(uuid: \\\"$BUILDKITE_BUILD_ID\\\") { jobs(passed: false) { count } } }\"}" | \
-        jq -r '.data.build.jobs.count')
+        -d "{\"query\": \"query BuildStatusQuery { build(uuid: \\\"$BUILDKITE_BUILD_ID\\\") { jobs(passed: false, first: 500 ) { edges { node { ... on JobTypeCommand { retried } } } } } }\"}" | \
+        jq -r '.data.build.jobs.edges | map(select(.node.retried == false)) | length')
 
       echo "$$FAILED_JOBS build jobs failed."
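
To illustrate what the new jq filter computes, here is a minimal Python sketch of the same counting logic. It is not part of the change itself. The function name `count_unretried_failures` and the sample response are hypothetical; the response shape is only assumed from the fields the query in the diff selects (`data.build.jobs.edges[].node.retried`).

```python
import json

# Hypothetical sample response, shaped like the fields selected by the
# query in the diff. The third node is empty, as would be expected for a
# job that is not a JobTypeCommand (the inline fragment selects nothing).
SAMPLE_RESPONSE = json.loads("""
{"data": {"build": {"jobs": {"edges": [
  {"node": {"retried": true}},
  {"node": {"retried": false}},
  {"node": {}}
]}}}}
""")


def count_unretried_failures(response: dict) -> int:
    """Count failed jobs that have not been retried.

    Mirrors the jq filter
    '.data.build.jobs.edges | map(select(.node.retried == false)) | length':
    only nodes whose 'retried' field is exactly false are counted, so
    retried jobs and nodes without a 'retried' field are skipped.
    """
    edges = response["data"]["build"]["jobs"]["edges"]
    return sum(1 for edge in edges if edge["node"].get("retried") is False)


if __name__ == "__main__":
    print(count_unretried_failures(SAMPLE_RESPONSE), "build jobs failed.")
```

In this sample, only the second job counts: the first one failed but was retried (its replacement job would appear as a separate entry in the query result), and the third has no `retried` field at all.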