V1.3: Sync Retry with Offset Checking & Not-Fitting Subtitle Deletion by bridgemill-ch · Pull Request #51 · johnpc/subsyncarr

bridgemill-ch · 2026-05-13T15:08:23Z

Sync Retry with Offset Checking & Not-Fitting Subtitle Deletion

Summary

Adds post-sync offset validation to detect when a subtitle doesn't match the media, with automatic retry and deletion of mismatched subtitles.

What's New

Offset Calculation & Retry Logic

After each successful sync, compares original and synced SRT timestamps to calculate the median offset (in ms)
If offset exceeds SYNC_RETRY_THRESHOLD_MS (default: 5000ms), re-runs the engine using the first-synced output as input — alass converges on re-run by refining the shift
If retry also exceeds the threshold, the subtitle is flagged as not_fitting
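A minimal sketch of the median-offset idea. It assumes cues are paired by index, so the nth cue in the original is compared against the nth cue in the synced output, and that only start times are compared. The function names are hypothetical and not taken from src/subtitleOffsetCalculator.ts.

```typescript
// Matches the start time on an SRT timing line such as "00:01:02,500 --> 00:01:04,000".
const TIMING_LINE = /(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->/;

// Hypothetical helper: extract cue start times (in ms) from SRT text.
export function parseStartTimesMs(srt: string): number[] {
  const starts: number[] = [];
  for (const line of srt.split(/\r?\n/)) {
    const m = TIMING_LINE.exec(line);
    if (m) {
      const [h, min, s, ms] = m.slice(1).map(Number);
      starts.push(((h * 60 + min) * 60 + s) * 1000 + ms);
    }
  }
  return starts;
}

// Hypothetical helper: median of (synced - original) start times, paired by cue index.
// Returns null when there is nothing to compare.
export function medianOffsetMs(originalSrt: string, syncedSrt: string): number | null {
  const a = parseStartTimesMs(originalSrt);
  const b = parseStartTimesMs(syncedSrt);
  const n = Math.min(a.length, b.length);
  if (n === 0) return null;
  const diffs = Array.from({ length: n }, (_, i) => b[i] - a[i]).sort((x, y) => x - y);
  const mid = Math.floor(n / 2);
  return n % 2 === 1 ? diffs[mid] : (diffs[mid - 1] + diffs[mid]) / 2;
}
```

A positive result means the synced cues start later than the originals. Using the median rather than the mean keeps a few badly matched cues from dominating the result.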

Not-Fitting Subtitle Handling

When a subtitle is flagged not_fitting, both the engine output and the original subtitle file are deleted — this prevents Bazarr from re-downloading the same mismatched subtitle on the next cycle
not_fitting is tracked as a separate status in the DB and UI (purple badge), distinct from failed
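The retry-then-classify flow could look roughly like this. `runEngine` and `measureOffsetMs` are hypothetical callbacks standing in for the engine invocation and the offset calculation. The sketch assumes the threshold is compared against the absolute offset between each run's input and output, and that each retry feeds the previous output back in.

```typescript
export type SyncStatus = 'synced' | 'not_fitting';

export interface SyncOutcome {
  status: SyncStatus;
  outputPath: string;
  offsetMs: number;
  attempts: number; // total engine runs, including the first
}

// Hypothetical sketch: run the engine once, then re-run it on its own output
// (up to maxRetries times) while |offset| stays above the threshold.
export async function syncWithRetry(
  inputPath: string,
  thresholdMs: number,
  maxRetries: number,
  runEngine: (input: string) => Promise<string>, // resolves to the output path
  measureOffsetMs: (before: string, after: string) => Promise<number>,
): Promise<SyncOutcome> {
  let input = inputPath;
  let output = await runEngine(input);
  let offset = await measureOffsetMs(input, output);
  let attempts = 1;
  while (Math.abs(offset) > thresholdMs && attempts <= maxRetries) {
    input = output; // retry uses the previous synced output as input
    output = await runEngine(input);
    offset = await measureOffsetMs(input, output);
    attempts++;
  }
  const status: SyncStatus = Math.abs(offset) > thresholdMs ? 'not_fitting' : 'synced';
  return { status, outputPath: output, offsetMs: offset, attempts };
}
```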

UI Updates

Offset displayed next to each engine result (e.g., (2.4s))
"Not Fit" column in history table
Purple styling for not-fitting file cards and badges
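For the offset label, a helper along these lines would turn a stored millisecond value into the `(2.4s)` form shown in the UI. `formatOffset` is a hypothetical name, and rounding to one decimal place is an assumption based on the example.

```typescript
// Hypothetical UI helper: 2400 -> "(2.4s)"; no offset recorded -> empty label.
export function formatOffset(offsetMs: number | null | undefined): string {
  if (offsetMs == null) return ''; // covers both null and undefined
  return `(${(offsetMs / 1000).toFixed(1)}s)`;
}
```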

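Configuration

SYNC_RETRY_THRESHOLD_MS: median offset (in ms) above which a sync is retried, and above which a retried sync is flagged not_fitting (default: 5000)
SYNC_MAX_RETRIES: maximum number of retry attempts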
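A sketch of how `SYNC_RETRY_THRESHOLD_MS` and `SYNC_MAX_RETRIES` might be parsed into the `SyncRetryConfig` interface listed under Files Changed. The field names, the fallback of 1 for `SYNC_MAX_RETRIES`, and the handling of invalid values are assumptions. Only the variable names and the 5000 ms default come from this PR.

```typescript
// Hypothetical shape; the real interface lives in src/config.ts and may differ.
export interface SyncRetryConfig {
  thresholdMs: number;
  maxRetries: number;
}

// Parse a non-negative integer, falling back to a default on missing or invalid input.
function readInt(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === '') return fallback;
  const n = Number(value);
  return Number.isInteger(n) && n >= 0 ? n : fallback;
}

export function loadSyncRetryConfig(env: Record<string, string | undefined>): SyncRetryConfig {
  return {
    thresholdMs: readInt(env.SYNC_RETRY_THRESHOLD_MS, 5000), // documented default
    maxRetries: readInt(env.SYNC_MAX_RETRIES, 1), // assumed default: one retry
  };
}
```

In the service this would be called as `loadSyncRetryConfig(process.env)`. Taking the environment as a parameter keeps the parser easy to unit-test.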

Files Changed

src/subtitleOffsetCalculator.ts — new: SRT parser & median offset calculation
src/processingEngine.ts — major rewrite: offset check, retry, not_fitting detection, subtitle deletion
src/config.ts — SyncRetryConfig interface & env var parsing
src/helpers.ts — offsetMs and notFitting fields on ProcessingResult
src/database.ts — not_fitting column on runs, not_fitting status on FileResult, migration
src/stateManager.ts — not_fitting counter, updated clearCompletedFiles
src/coordinator.ts — file:not_fitting event handler
public/app.js — offset display, not_fitting UI, "Not Fit" column
public/styles.css — purple not_fitting styles
public/index.html — "Not Fit" column header
src/__tests__/subtitleOffsetCalculator.test.ts — unit tests for offset calculation

Removed redundant findMatchingVideoFile call from the run:files_found handler and eliminated O(n^2) DB queries during initial file scan. With thousands of subtitle files, the synchronous forEach loop in the event handler blocked the event loop by calling findMatchingVideoFile (disk I/O) and emitFileUpdate (full table scan per file) for every file before batch processing could begin. Video path matching now only happens in processFile when the file is actually processed, and video_path is stored in the DB at that point.

The entrypoint.sh script had Windows CRLF line endings, so the Linux kernel read the shebang as #!/bin/bash\r (with a trailing carriage return) and container startup failed with 'exec /entrypoint.sh: no such file or directory'. Added .gitattributes to enforce LF line endings for shell scripts.
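The kernel splits the shebang line only on `\n`, so a CRLF file leaves `\r` on the interpreter path. `.gitattributes` is the fix in this PR. A small hypothetical check like the one below could additionally flag such scripts in CI. It is an illustration and not part of the PR.

```typescript
// True when the shebang line ends in '\r', i.e. the script was saved with CRLF endings.
export function hasCrlfShebang(script: string): boolean {
  if (!script.startsWith('#!')) return false;
  const end = script.indexOf('\n');
  const firstLine = end === -1 ? script : script.slice(0, end);
  return firstLine.endsWith('\r');
}

// Normalize CRLF to LF (what enforcing eol=lf achieves on checkout).
export function toLf(text: string): string {
  return text.replace(/\r\n/g, '\n');
}
```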

When enabled, the first successful engine result is copied over the original subtitle file and the engine output is cleaned up. This preserves the original filename (e.g. movie.de.srt) so media players correctly detect the language instead of showing 'ffsubsync'. Adds a processed_files table to the database to track which files have already been overwritten, preventing re-processing on subsequent scans.

Adds SUBTITLE_FORMAT env var with three modes:
- standard (default): file.de.ffsubsync.srt
- engine-lang: file.ffsubsync.de.srt (preserves language tag for players)
- overwrite: replaces original file in-place

Also extracts output path logic into a shared getOutputPath helper so generators, scanner, and overwrite logic use consistent paths.
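The standard naming rule fits in a small pure function. This is a hypothetical approximation of `getOutputPath` covering only the `standard` and `overwrite` modes. It assumes the engine name is inserted just before the extension, as in the `file.de.ffsubsync.srt` example.

```typescript
import * as path from 'node:path';

export type SubtitleFormat = 'standard' | 'overwrite';

// Hypothetical approximation of getOutputPath.
// standard:  /media/file.de.srt -> /media/file.de.ffsubsync.srt
// overwrite: the original path is reused
export function getOutputPath(subtitlePath: string, engine: string, format: SubtitleFormat): string {
  if (format === 'overwrite') return subtitlePath;
  const ext = path.extname(subtitlePath); // e.g. ".srt"
  const base = subtitlePath.slice(0, subtitlePath.length - ext.length);
  return `${base}.${engine}${ext}`;
}
```

Keeping this in one helper is the point of the change: the generators, the scanner, and the overwrite logic all derive the same path from the same inputs.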

- Added getFileResult(runId, filePath) direct query to database
- Changed emitFileUpdate to query single file instead of loading all
- Removed file list from WebSocket initial state (was the OOM trigger)
- Added limit to GET /api/status file results

- Added getFileResultsPaginated to database (LIMIT/OFFSET query)
- Fixed GET /api/runs/:id to load only the latest 500 files
- Fixed GET /api/status to use the paginated query
- Added GET /api/runs/:id/files paginated endpoint
- Fixed emitFileUpdate to use a direct query instead of a full table scan
- Removed file list from WebSocket initial state
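The paginated query can be sketched as plain SQL plus a clamp on the page parameters. The table and column names (`file_results`, `run_id`, `id`) are assumptions for illustration, and the 500-row cap is borrowed from the GET /api/runs/:id change. Only the LIMIT/OFFSET approach comes from the commit.

```typescript
export interface PageQuery {
  sql: string;
  params: [number, number, number]; // runId, limit, offset
}

// Hypothetical builder for getFileResultsPaginated: newest rows first, bounded page size.
export function buildFileResultsPage(runId: number, page: number, pageSize: number, maxPageSize = 500): PageQuery {
  const limit = Math.min(Math.max(1, Math.floor(pageSize)), maxPageSize);
  const safePage = Math.max(1, Math.floor(page)); // pages are 1-based
  const offset = (safePage - 1) * limit;
  return {
    sql: 'SELECT * FROM file_results WHERE run_id = ? ORDER BY id DESC LIMIT ? OFFSET ?',
    params: [runId, limit, offset],
  };
}
```

The `?` placeholders are meant to be bound by whatever SQLite driver the project uses. Clamping the page size is what keeps a single request from pulling an entire run into memory.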

isEngineOutput was only checking currently enabled engines, so old .autosubsync. files were picked up when only alass was enabled.
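The fix amounts to matching against a fixed list of every engine that may have left output behind, rather than the currently enabled set. Here is a hypothetical sketch, assuming the three engine names that appear in this PR and a `.<engine>.` marker in the filename.

```typescript
// All engines that may have produced output files, regardless of current config.
const KNOWN_ENGINES = ['ffsubsync', 'autosubsync', 'alass'] as const;

export function isEngineOutput(fileName: string): boolean {
  const lower = fileName.toLowerCase();
  return KNOWN_ENGINES.some((engine) => lower.includes(`.${engine}.`));
}
```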

When SUBTITLE_FORMAT is changed (e.g. from standard to engine-lang), existing engine output files (.ffsubsync.srt, etc.) are renamed to match the new naming convention before the scan begins. This prevents re-syncing already-processed subtitles and keeps filenames consistent.

When SUBTITLE_FORMAT=overwrite, synced subtitles are now marked with a '# synced:' comment at the top of the file instead of relying on engine suffixes in the filename or a database table. This preserves the original filename (e.g. movie.de.srt) so media players correctly detect the language. The scanner reads the file header to skip already-synced files. Removed the engine-lang format (was confusing, not the right approach).
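A minimal sketch of the header marker. It assumes the marker is a first line of the form `# synced: <engine>`. Including the engine name after `# synced:` is an assumption. The helper names are hypothetical.

```typescript
const SYNC_MARKER = '# synced:';

// Prepend the marker unless it is already present, so re-marking is a no-op.
export function markSynced(srt: string, engine: string): string {
  return isSyncedHeader(srt) ? srt : `${SYNC_MARKER} ${engine}\n${srt}`;
}

// Only the start of the file matters, so callers can pass a short prefix.
// A leading UTF-8 BOM is ignored.
export function isSyncedHeader(head: string): boolean {
  return head.replace(/^\uFEFF/, '').startsWith(SYNC_MARKER);
}
```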

Both OVERWRITE_SUBTITLES=true and SUBTITLE_FORMAT=overwrite enable overwrite mode with file-header tracking.

isSyncedSrt was reading the entire SRT file for each of 197k files, causing massive blocking I/O. It now reads only the first 100 bytes.

Moved isSyncedSrt check out of the file scan into processFile so the scan stays fast (only directory traversal + filename filtering). The header check is now async using fs/promises so concurrent batches don't block the event loop.
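Reading only a prefix asynchronously can be done with a `FileHandle` from `fs/promises`. This sketch of `readHead` and an async `isSyncedSrt` is hypothetical. It assumes the marker fits in the first 100 bytes, per the commit above, and that the prefix is UTF-8.

```typescript
import { open } from 'node:fs/promises';

// Read at most `bytes` bytes from the start of a file without loading the whole file.
export async function readHead(filePath: string, bytes = 100): Promise<string> {
  const handle = await open(filePath, 'r');
  try {
    const buffer = Buffer.alloc(bytes);
    const { bytesRead } = await handle.read(buffer, 0, bytes, 0);
    return buffer.toString('utf8', 0, bytesRead);
  } finally {
    await handle.close(); // always release the descriptor, even if the read fails
  }
}

// Hypothetical async variant of isSyncedSrt built on readHead.
export async function isSyncedSrt(filePath: string): Promise<boolean> {
  const head = await readHead(filePath);
  return head.replace(/^\uFEFF/, '').startsWith('# synced:');
}
```

Because each call awaits its own small read, many files in a concurrent batch can be checked without blocking the event loop.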

…e mode

In overwrite mode, old engine output files from previous standard-format runs were triggering the skip check inside generators. Disabled the existsSync check in generators when OVERWRITE_SUBTITLES=true so the header is the only source of truth for already-synced status.

…tion

After each sync, calculate the time offset between original and synced subtitles. If offset exceeds threshold (default 5s), retry the sync with the first-synced output as input. If retry also exceeds threshold, mark the subtitle as 'not_fitting'; it likely doesn't match the media.

New features:
- Subtitle offset calculator (median timestamp shift detection)
- Configurable retry via SYNC_RETRY_THRESHOLD_MS and SYNC_MAX_RETRIES
- 'not_fitting' status for subtitles that don't fit the media
- Web UI shows offset info and not_fitting status with purple styling
- Database tracks not_fitting count per run
- Unit tests for offset calculator

johnpc

The offset checking + retry feature is a great idea, but this PR includes all 14 commits from #50 plus 2 new ones. Please rebase this so it only contains the new commits (897d592 and 05cadb0) on top of main (after #50 is merged), so we can review the offset/retry feature in isolation.

Additionally, on the feature itself:

Deleting original subtitle files should be opt-in, not default behavior. The project advertises itself as "Non Destructive" in the README. Automatically deleting the original .srt when the offset heuristic flags it as not_fitting is irreversible and the heuristic could have false positives (multi-part episodes, cold opens, credits-heavy content). Please put this behind a flag like DELETE_NOT_FITTING=true (defaulting to false), so users who want the Bazarr re-download loop broken can opt in, but everyone else keeps their subtitles.
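The requested opt-in could be a simple gate. `DELETE_NOT_FITTING` is the flag name proposed in this review. The functions below are a hypothetical sketch that only decides which files would be removed and leaves the actual deletion to the caller. Always discarding the mismatched engine output is an assumption.

```typescript
// Parse the proposed opt-in flag; anything other than "true" keeps originals (non-destructive default).
export function deleteNotFittingEnabled(env: Record<string, string | undefined>): boolean {
  return (env.DELETE_NOT_FITTING ?? '').trim().toLowerCase() === 'true';
}

// Decide which files to remove for a not_fitting result.
// Assumption: the engine output is always discarded; the original only when opted in.
export function filesToDelete(engineOutput: string, original: string, optIn: boolean): string[] {
  return optIn ? [engineOutput, original] : [engineOutput];
}
```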

The offset calculation and retry logic itself looks solid — happy to merge once it's isolated from #50 and the deletion is opt-in.

david-steg added 16 commits May 10, 2026 21:40

Add auto-versioning CI workflow on push to bridgemill-ch branch

26cb26d

Fix engine output detection to check all known engines

29b83f8

Re-add OVERWRITE_SUBTITLES env var

2bb66e0

Fix: read only 100 bytes for sync marker check

417eb80

Lazy header check: skip sync marker reads during scan

c707d19

feat: delete subtitle file when it doesn't fit the media

05cadb0

johnpc requested changes May 16, 2026

bridgemill-ch wants to merge 16 commits into johnpc:main from bridgemill-ch:v1.3

bridgemill-ch commented May 13, 2026

johnpc left a comment

3 participants
