perf(worker): Bulk update expired flakes #828

Open
sentry[bot] wants to merge 1 commit into main from
seer/perf/bulk-update-flakes-ZxeqFn

Conversation

Contributor

@sentry sentry bot commented Apr 15, 2026

Fixes WORKER-Y7X. The issue: individual Testrun queries per upload and Flake saves inside loops cause N+1 database interactions.

  • Modified handle_pass to collect expired flakes into a list instead of saving them individually.
  • Introduced an expired_flakes list in process_testruns to store flakes that have met their expiration criteria.
  • Implemented a bulk update operation using Flake.objects.bulk_create with update_conflicts=True to efficiently update all expired flakes in a single database call.
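The pattern described in the list above (queue rows inside the loop, then write them once) can be sketched outside Django. The following is a minimal illustration using the standard-library `sqlite3` module, not the PR's actual code: the `flake` table layout, the `bulk_upsert_expired` helper, and the sample values are all assumptions made for the example. It mirrors the three columns named in this description and shows the kind of multi-row `INSERT ... ON CONFLICT DO UPDATE` statement that a conflict-updating bulk write boils down to.

```python
import sqlite3


def bulk_upsert_expired(conn, expired_flakes):
    """Write every queued (id, end_date, count, recent_passes_count) tuple
    in a single multi-row upsert instead of one statement per row."""
    if not expired_flakes:
        return 0
    placeholders = ", ".join(["(?, ?, ?, ?)"] * len(expired_flakes))
    params = [value for flake in expired_flakes for value in flake]
    sql = (
        'INSERT INTO flake (id, end_date, "count", recent_passes_count) '
        f"VALUES {placeholders} "
        "ON CONFLICT(id) DO UPDATE SET "
        "end_date = excluded.end_date, "
        '"count" = excluded."count", '
        "recent_passes_count = excluded.recent_passes_count"
    )
    conn.execute(sql, params)
    return len(expired_flakes)


def demo():
    # Hypothetical schema and seed data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE flake (id INTEGER PRIMARY KEY, end_date TEXT, '
        '"count" INTEGER, recent_passes_count INTEGER)'
    )
    conn.executemany(
        "INSERT INTO flake VALUES (?, ?, ?, ?)",
        [(1, None, 10, 29), (2, None, 7, 29)],
    )

    # Inside the processing loop, rows are only queued, not written.
    expired_flakes = []
    for flake_id, new_count in [(1, 11), (2, 8)]:
        expired_flakes.append((flake_id, "2026-04-15", new_count, 30))

    # One database round trip after the loop.
    bulk_upsert_expired(conn, expired_flakes)
    return conn.execute(
        'SELECT id, end_date, "count", recent_passes_count FROM flake ORDER BY id'
    ).fetchall()
```

Only the columns listed in the `DO UPDATE SET` clause are overwritten for existing rows, so the choice of that list decides which in-memory changes reach the database.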

This fix was generated by Seer in Sentry, triggered automatically. 👁️ Run ID: 13356808

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as a result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.


Note

Medium Risk
Changes how Flake expirations are persisted by deferring per-row saves into a bulk upsert, which could affect correctness if the bulk conflict/update semantics differ from the prior save() behavior.

Overview
Optimizes flake processing by deferring expiration writes: handle_pass now queues expired Flake rows instead of calling save() inside the loop.

After each upload is processed, queued expired flakes are persisted via a single Flake.objects.bulk_create(..., update_conflicts=True) upsert updating end_date, count, and recent_passes_count, reducing N+1 database interactions during flake expiration.

Reviewed by Cursor Bugbot for commit df54aa6. Bugbot is set up for automated code reviews on this repo.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

update_conflicts=True,
unique_fields=["id"],
update_fields=["end_date", "count", "recent_passes_count"],
)

Missing fail_count in expired flakes bulk update

High Severity

The bulk_create for expired flakes omits "fail_count" from update_fields. A flake can have its fail_count incremented by handle_failure before later expiring through handle_pass. Since expired flakes are removed from curr_flakes via del, they won't be included in the final bulk_create at line 148 which does include "fail_count". The original code used .save() which persisted all fields. This causes silent data loss of fail_count updates for any flake that expires.
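The data-loss mechanism can be reproduced without a database. The sketch below is a simplified stand-in, not the worker's code: the `upsert` helper, the dict-based "table", and the sample values are assumptions chosen only to mimic how a conflict-updating write overwrites the listed fields and leaves every other stored column untouched.

```python
def upsert(db, rows, update_fields):
    """Mimic a conflict-updating bulk write: new ids are inserted whole,
    existing ids only have the listed fields overwritten."""
    for row in rows:
        stored = db.get(row["id"])
        if stored is None:
            db[row["id"]] = dict(row)
        else:
            for name in update_fields:
                stored[name] = row[name]


def persisted_fail_count(update_fields):
    # Hypothetical stored row before the upload is processed.
    db = {
        1: {"id": 1, "count": 10, "fail_count": 4,
            "recent_passes_count": 29, "end_date": None}
    }
    flake = dict(db[1])  # in-memory copy being mutated during processing

    # A failure earlier in the same upload bumps fail_count in memory.
    flake["fail_count"] += 1
    # Later the flake expires; these values are arbitrary illustrations.
    flake["count"] += 1
    flake["recent_passes_count"] = 30
    flake["end_date"] = "2026-04-15"

    upsert(db, [flake], update_fields)
    return db[1]["fail_count"]


# With the field list from the diff, the stored value stays stale.
stale = persisted_fail_count(["end_date", "count", "recent_passes_count"])
# Adding fail_count to the list persists the in-memory increment.
fixed = persisted_fail_count(
    ["end_date", "count", "recent_passes_count", "fail_count"]
)
```

Under these assumptions `stale` keeps the old stored value while `fixed` reflects the increment, which is the difference between the per-row `save()` behavior described above and the narrower field list.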

update_conflicts=True,
unique_fields=["id"],
update_fields=["end_date", "count", "recent_passes_count"],
)
Contributor Author

Bug: The bulk_create for expired flakes omits fail_count from update_fields. This causes failure count updates that occur within the same upload to be lost upon flake expiry.
Severity: MEDIUM

Suggested Fix

Add the fail_count field to the update_fields list in the Flake.objects.bulk_create call for expired_flakes. This will ensure that any in-memory modifications to fail_count are correctly persisted to the database when a flake's record is updated upon expiry.
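Beyond adding the field, a test-time guard could catch this class of omission. The following is one possible sketch, not part of the PR: `FlakeRow`, `changed_fields`, and `missing_update_fields` are hypothetical names, and the dataclass only approximates the fields mentioned in this thread. It compares a snapshot taken before processing with the mutated object and reports any changed field that the `update_fields` list would not persist.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass
class FlakeRow:
    # Hypothetical stand-in for the fields discussed in this review.
    id: int
    count: int = 0
    fail_count: int = 0
    recent_passes_count: int = 0
    end_date: Optional[str] = None


def changed_fields(before, after):
    """Names of fields whose values differ between two snapshots."""
    return {
        f.name
        for f in fields(after)
        if getattr(before, f.name) != getattr(after, f.name)
    }


def missing_update_fields(before, after, update_fields):
    """Changed fields that the given update_fields list would not write."""
    return changed_fields(before, after) - set(update_fields)


before = FlakeRow(id=1, count=5, fail_count=2, recent_passes_count=29)
after = replace(
    before, count=7, fail_count=3, recent_passes_count=30, end_date="2026-04-15"
)

# The list from the diff leaves one mutated field unpersisted.
gap = missing_update_fields(
    before, after, ["end_date", "count", "recent_passes_count"]
)
```

A unit test could assert that this set is empty for every queued flake, so a future change that mutates another field without extending `update_fields` fails loudly rather than silently dropping data.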

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: apps/worker/services/test_analytics/ta_process_flakes.py#L114

Potential issue: When a flaky test expires (after 30 consecutive passes), the system
performs a `bulk_create` operation to update its record in the database. The
`update_fields` list for this operation is missing the `fail_count` field. If a test
fails and then subsequently passes 30 times within the same data processing upload, the
in-memory `fail_count` is incremented but this change is lost when writing to the
database. This results in incorrect historical data for the expired flake, as the final
failure count will not be persisted.

codecov-notifications bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

Contributor Author

sentry bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.25%. Comparing base (9eed0bb) to head (df54aa6).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #828   +/-   ##
=======================================
  Coverage   92.25%   92.25%           
=======================================
  Files        1306     1306           
  Lines       48012    48015    +3     
  Branches     1636     1636           
=======================================
+ Hits        44294    44297    +3     
  Misses       3407     3407           
  Partials      311      311           
Flag               Coverage Δ
workerintegration  58.53% <16.66%> (-0.02%) ⬇️
workerunit         90.39% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.
