Improve resumable single-step eval by kmaziarz · Pull Request #155 · microsoft/syntheseus

kmaziarz · 2026-06-10T13:11:48Z

This PR follows up on #149 by improving the robustness of single-step eval, especially in cases where the storage backend does not persist writes immediately (e.g. a mounted remote container).

The most important change concerns restarts. The current version of resumable evals rewrites the results file upon a restart, which is potentially a very heavy operation and one that can take a minute or so to get flushed to storage. I saw cases where results from the next couple of batches were lost because of a race condition between the file rewrite and subsequent appends. This PR instead truncates the file to remove any potentially broken lines, rather than fully rewriting it. Moreover, I found that reopening the results file after each batch, instead of keeping it open throughout, can lead to more frequent flushing to the underlying storage. Finally, when running in resumable mode, all_predictions and all_back_translation_predictions were keeping the results in memory unnecessarily: there is no downstream consumer of that data, and it is already stored in the results file itself. This is also cleaned up here.
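For readers unfamiliar with the pattern, here is a minimal sketch of the truncate-then-append idea described above. It is not the code from this PR: the function names, the JSON-lines layout, and the assumption that a "broken" line is simply a trailing line without a terminating newline are all illustrative.

```python
import json
import os


def truncate_to_last_complete_line(path: str, chunk_size: int = 4096) -> int:
    """Drop a trailing unterminated line in place; return the number of bytes kept.

    Hypothetical helper: scans backwards for the last newline so that only the
    tail of the file is read, and nothing before it is rewritten.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        pos = size
        while pos > 0:
            start = max(0, pos - chunk_size)
            f.seek(start)
            chunk = f.read(pos - start)
            idx = chunk.rfind(b"\n")
            if idx != -1:
                keep = start + idx + 1  # offset just past the last newline
                break
            pos = start
        else:
            keep = 0  # no newline anywhere: nothing complete to keep
        f.truncate(keep)
    return keep


def append_batch(path: str, records: list) -> None:
    """Append one batch as JSON lines, reopening the file for every batch.

    Closing the file at the end of the `with` block flushes Python's buffers,
    so each batch is handed to the OS before the next one starts.
    """
    with open(path, "a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

On restart, such a script would call `truncate_to_last_complete_line` once, count the surviving lines to decide where to resume, and then call `append_batch` after every batch. Whether the data reaches remote storage promptly still depends on the backend.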

jla-gardner

LGTM!

kmaziarz added 4 commits June 10, 2026 12:07

fix(eval_single_step): Truncate results file instead of rewriting

efa5c0c

fix(eval_single_step): Reopen results file on each write

a6a2e40

fix(eval_single_step): Free memory used by valid_records when no longer needed

5f26acb

feat(eval_single_step): Do not keep predictions in memory unnecessarily

e53b180

kmaziarz requested a review from jla-gardner June 10, 2026 13:11

fix(test_eval_single_step): Adapt test

907506c

jla-gardner approved these changes Jun 17, 2026

doc(CHANGELOG): Add an entry for #155

c312356

kmaziarz enabled auto-merge (squash) June 17, 2026 16:48

kmaziarz merged commit e7b1702 into main Jun 17, 2026
10 checks passed

kmaziarz deleted the kmaziarz/improve-resumable-single-step-eval branch June 17, 2026 16:59

kmaziarz merged 6 commits into main from kmaziarz/improve-resumable-single-step-eval
