
perf(worker): Bulk update ended flakes in test analytics#812

Open
sentry[bot] wants to merge 1 commit into main from seer/perf/bulk-update-flakes

Conversation

@sentry
Contributor

@sentry sentry bot commented Apr 14, 2026

Fixes WORKER-Y7H. The issue: ReportSession objects were fetched without prefetching their related report and commit objects, causing N+1 queries during iteration.

  • Refactored handle_pass to return an ended Flake object instead of saving it individually and deleting it from the dictionary.
  • Modified process_single_upload to collect all Flake objects that have met their ending criteria into a list.
  • Implemented a bulk_update operation in process_flakes_for_commit to efficiently update all ended flakes in a single database transaction.
  • This change significantly reduces the number of individual database write operations, improving the performance of flake processing.
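The refactor described in the bullets above can be sketched as follows. This is a simplified, in-memory stand-in, not the worker's actual code: `Flake` here is a plain dataclass mimicking the Django model, the field names are assumed from the diff hunks quoted later in this thread, and the 30-pass threshold comes from the Bugbot comments below.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

PASS_THRESHOLD = 30  # "30 consecutive passes", per the review comments below


@dataclass
class Flake:
    # Simplified stand-in for the Django model; fields assumed from the diff.
    test_id: str
    count: int = 0
    recent_passes_count: int = 0
    fail_count: int = 0
    end_date: Optional[datetime] = None


def handle_pass(curr_flakes: dict, test_id: str) -> Optional[Flake]:
    """Return the ended Flake (popped from curr_flakes) instead of saving it."""
    flake = curr_flakes[test_id]
    flake.count += 1
    flake.recent_passes_count += 1
    if flake.recent_passes_count >= PASS_THRESHOLD:
        flake.end_date = datetime.now(timezone.utc)
        return curr_flakes.pop(test_id)  # caller collects it for one bulk write
    return None


def process_single_upload(curr_flakes: dict, passing_test_ids: list) -> list:
    """Collect every flake that met its ending criteria during this upload."""
    ended = []
    for test_id in passing_test_ids:
        if test_id in curr_flakes:
            flake = handle_pass(curr_flakes, test_id)
            if flake is not None:
                ended.append(flake)
    return ended


# process_flakes_for_commit would then persist all ended flakes in a single
# query, e.g. Flake.objects.bulk_update(ended, [...]), instead of one
# .save() per flake.
```

The key change is that `handle_pass` no longer writes to the database at all; persistence is deferred to one `bulk_update` at the end of commit processing.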

This fix was generated by Seer in Sentry, triggered automatically. 👁️ Run ID: 13313490

Not quite right? Click here to continue debugging with Seer.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.


Note

Medium Risk
Touches flake lifecycle persistence logic and changes how/when Flake rows are written, which could affect correctness if the ended-vs-active set is mishandled; scope is contained to the test analytics worker.

Overview
Reduces per-test DB writes during flake processing by stopping individual Flake.save() calls when a flake “ends” and instead collecting ended flakes and persisting them via a single Flake.objects.bulk_update() in process_flakes_for_commit.

handle_pass now returns the ended Flake (popping it from curr_flakes) when the pass threshold is reached, process_single_upload aggregates these ended flakes, and the remaining active/new flakes continue to be upserted via bulk_create(..., update_conflicts=True).

Reviewed by Cursor Bugbot for commit a165ae1. Bugbot is set up for automated code reviews on this repo. Configure here.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.



if ended_flakes:
    Flake.objects.bulk_update(
        ended_flakes, ["end_date", "count", "recent_passes_count"]
    )


Missing fail_count in ended flakes bulk update

High Severity

The bulk_update for ended_flakes updates ["end_date", "count", "recent_passes_count"] but omits "fail_count". A flake can have its fail_count incremented by handle_failure (which also resets recent_passes_count to 0) and then later end via 30 consecutive passes in handle_pass within the same commit processing. The original .save() call persisted all fields including fail_count, so this is a data-loss regression. The bulk_create for non-ended flakes correctly includes "fail_count" in its update_fields, making this an inconsistency.
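The omission can be made concrete by diffing the two field lists quoted in this review. The set names below are illustrative, not from the codebase; the contents come directly from the comment above.

```python
# Fields the existing upsert path writes for active/new flakes, per the
# review ("bulk_create ... correctly includes 'fail_count' in its update_fields").
UPSERT_FIELDS = {"end_date", "count", "recent_passes_count", "fail_count"}

# Fields passed to the new bulk_update for ended flakes, per the diff hunk.
ENDED_UPDATE_FIELDS = {"end_date", "count", "recent_passes_count"}

# Any field mutated before a flake ends must appear in both lists; the set
# difference is exactly the write the ended path would drop.
dropped = UPSERT_FIELDS - ENDED_UPDATE_FIELDS
print(dropped)  # → {'fail_count'}
```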



if ended_flakes:
    Flake.objects.bulk_update(
        ended_flakes, ["end_date", "count", "recent_passes_count"]
    )


Bulk update crashes on unsaved new flake objects

Medium Severity

handle_failure creates a new Flake instance (never saved, id=None) and puts it in curr_flakes. If this flake later accumulates 30 passes and ends in the same commit processing, it gets popped into ended_flakes. Calling bulk_update on objects with pk=None raises a ValueError. The original .save() handled this correctly by performing an INSERT. The flake is also no longer in curr_flakes, so bulk_create won't pick it up either.



Comment on lines +143 to +146
if ended_flakes:
    Flake.objects.bulk_update(
        ended_flakes, ["end_date", "count", "recent_passes_count"]
    )
Contributor Author


Bug: A new Flake object that ends in the same run is never persisted: it has already been popped out of curr_flakes, and Django's bulk_update raises a ValueError for objects with a null primary key rather than inserting them.
Severity: HIGH

Suggested Fix

Separate the ended_flakes list into two groups: one for existing flakes with a database id and another for new flakes where id is None. Use Flake.objects.bulk_update() for the existing flakes and Flake.objects.bulk_create() for the new ones to ensure all are persisted correctly.
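A minimal sketch of the suggested split, using `SimpleNamespace` stand-ins for saved and unsaved Flake instances (`partition_ended_flakes` is a hypothetical helper name, not from the codebase):

```python
from types import SimpleNamespace


def partition_ended_flakes(ended_flakes):
    """Split ended flakes by primary key, per the suggested fix.

    Objects with an id already exist in the database and can go to
    bulk_update; objects with id=None were created in memory by
    handle_failure and must be inserted via bulk_create instead,
    since Django's bulk_update rejects unsaved objects.
    """
    existing = [f for f in ended_flakes if f.id is not None]
    new = [f for f in ended_flakes if f.id is None]
    return existing, new


# Demonstration with lightweight stand-ins for Flake instances.
flakes = [SimpleNamespace(id=1), SimpleNamespace(id=None), SimpleNamespace(id=7)]
existing, new = partition_ended_flakes(flakes)
```

In process_flakes_for_commit the two halves would then feed `Flake.objects.bulk_update(existing, fields)` and `Flake.objects.bulk_create(new)` respectively, with `fields` including "fail_count" per the other Bugbot finding in this thread.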

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: apps/worker/services/test_analytics/ta_process_flakes.py#L143-L146

Potential issue: A newly created `Flake` object, which exists only in memory and has
`id=None`, is never persisted if it also expires within the same processing batch.
The logic adds the expired flake to the `ended_flakes` list for a `bulk_update`
operation, but Django's `bulk_update` requires a primary key to identify which row
to update and raises a ValueError when any object has `id=None`. Since the flake was
also removed from the list of current flakes, it is never written to the database.

Did we get this right? 👍 / 👎 to inform future reviews.

@codecov-notifications

codecov-notifications bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 92.30769% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines | Patch % | Lines
...orker/services/test_analytics/ta_process_flakes.py | 92.30% | 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@sentry
Contributor Author

sentry bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 92.30769% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 92.25%. Comparing base (8c332b1) to head (a165ae1).
✅ All tests successful. No failed tests found.

Files with missing lines | Patch % | Lines
...orker/services/test_analytics/ta_process_flakes.py | 92.30% | 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #812   +/-   ##
=======================================
  Coverage   92.25%   92.25%           
=======================================
  Files        1306     1306           
  Lines       48004    48011    +7     
  Branches     1636     1636           
=======================================
+ Hits        44286    44293    +7     
  Misses       3407     3407           
  Partials      311      311           
Flag | Coverage Δ
workerintegration | 58.55% <7.69%> (-0.03%) ⬇️
workerunit | 90.39% <92.30%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.
