Improve resumable single-step eval #155

Merged
kmaziarz merged 6 commits into main from kmaziarz/improve-resumable-single-step-eval on Jun 17, 2026

@kmaziarz

This PR follows up on #149 by improving the robustness of single-step eval, especially when the storage backend does not persist writes immediately (e.g. a mounted remote container).

The most important change concerns how the results file is handled on restart. The current version of resumable evals rewrites the whole file, which can be a very heavy operation and can take a minute or so to be flushed to storage. I saw cases where results from the next couple of batches were lost because of a race condition between the file rewrite and subsequent appends. This PR instead truncates the file to remove any potentially broken lines, rather than fully rewriting it. Moreover, I found that reopening the results file after each batch, instead of keeping it open throughout, can lead to more frequent flushing to the underlying storage. Finally, when running in resumable mode, all_predictions and all_back_translation_predictions were unnecessarily keeping the results in memory: there is no downstream consumer of that data, and it is already stored in the results file itself. This is also cleaned up here.
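For readers who want a concrete picture, here is a minimal sketch of the truncate-then-append idea. It is not the code from this PR. It assumes the results file holds one JSON record per line and that a broken line can only be a trailing line without a newline; the function names `truncate_to_last_complete_line` and `append_batch` are hypothetical. The explicit `os.fsync` call is an extra assumption on top of reopening the file per batch, and may or may not help on a given remote mount.

```python
import json
import os
from pathlib import Path


def truncate_to_last_complete_line(path: Path) -> int:
    """Cut off a trailing partial line in place.

    Returns the number of complete lines kept, which a resumed run could
    use to decide how many inputs to skip. (Hypothetical helper.)
    """
    if not path.exists():
        return 0

    num_lines = 0
    keep_bytes = 0
    with open(path, "rb+") as f:
        for line in f:
            if not line.endswith(b"\n"):
                # Assumption: an interrupted write leaves a line with no newline.
                break
            num_lines += 1
            keep_bytes += len(line)
        # Shrink the file in place instead of rewriting its whole content.
        f.truncate(keep_bytes)
    return num_lines


def append_batch(path: Path, records: list) -> None:
    """Append one batch of results, reopening the file for each batch."""
    with open(path, "a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
        f.flush()
        # Assumption: ask the OS to push the data to storage before closing.
        os.fsync(f.fileno())
```

On restart, a resumed run would call `truncate_to_last_complete_line` once, skip that many inputs, and then call `append_batch` after every batch. Because truncation only shrinks the file, there is no large rewrite left in flight that later appends could race with.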

kmaziarz requested a review from jla-gardner June 10, 2026 13:11

jla-gardner left a comment

LGTM!

kmaziarz enabled auto-merge (squash) June 17, 2026 16:48
kmaziarz merged commit e7b1702 into main Jun 17, 2026
10 checks passed
kmaziarz deleted the kmaziarz/improve-resumable-single-step-eval branch June 17, 2026 16:59