
Add time-based execution, megapixel scoring, and fast-jobs-first ordering to VideoTranscodeBench (#565)

Open
marziehlenjaniMeta wants to merge 1 commit into facebookresearch:v2-beta from marziehlenjaniMeta:export-D98639355-to-v2-beta

Conversation


marziehlenjaniMeta commented Apr 7, 2026

Summary:

Motivation
The existing video_transcode_bench_svt_mini job uses --sample-rate 0.01
to reduce runtime, but this approach has fundamental problems on
high-core-count machines:

  1. Core underutilization: With only ~1% of clips sampled, the total
    number of encode jobs (clips x resolutions x CRF values) can be fewer
    than available cores. On a 72-core machine, many cores sit idle for the
    entire run — the benchmark measures a fraction of the machine's
    capacity.
  2. Score instability: The throughput-based score (GB/s) varies
    significantly with sample-rate because different subsets of clips have
    different total sizes and encoding characteristics. A 1% sample gives a
    different score than a 10% sample, making cross-run comparisons
    unreliable.
  3. Unrepresentative workload: Sampling removes clips rather than
    shortening the run, so the remaining workload may not reflect the full
    distribution of resolutions and content types.

A time-based approach solves all three problems: use the full dataset
(sample-rate=1.0) so all cores stay saturated, and cap runtime with a
time limit instead. This ensures every machine — regardless of core
count — runs at full utilization for a consistent, predictable duration.
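The core of the time-cap mechanism can be sketched in a few lines of shell. This is an illustrative stand-in, not the actual timed_parallel_feeder.sh: `sleep 5` plays the role of the long-running `parallel` invocation, and MAX_TIME is set to 1 second so the sketch finishes quickly.

```shell
#!/bin/sh
# Illustrative sketch of the time cap (not the real timed_parallel_feeder.sh).
# A long-running runner goes to the background; after MAX_TIME seconds it
# receives SIGTERM. With GNU parallel, a single SIGTERM means "spawn no new
# jobs, let in-flight jobs drain"; here `sleep 5` stands in for the
# `parallel --joblog joblog.txt < commands.txt` invocation.
MAX_TIME=1
START=$(date +%s)
sleep 5 &                      # stand-in for the parallel invocation
RUNNER_PID=$!
( sleep "$MAX_TIME" && kill -TERM "$RUNNER_PID" 2>/dev/null ) &
TIMER_PID=$!
wait "$RUNNER_PID"             # returns once the runner exits or is signalled
kill "$TIMER_PID" 2>/dev/null || true
ELAPSED=$(( $(date +%s) - START ))
echo "runner stopped after ~${ELAPSED}s"
```

With max_time = 0 the timer branch is simply skipped, which is what keeps the untimed jobs behaviorally identical to the original flow.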

Moving to a time limit introduced additional challenges that required further changes:

  1. Score depends on which jobs complete: With a time limit, only a
    subset of jobs finish. The GB/s throughput metric is biased by
    resolution (1080p jobs contribute more bytes than 144p for similar
    compute). A new megapixel-based metric (MPx/s) normalizes across
    resolutions, making the score stable regardless of which jobs complete.
  2. Slow drain phase: Jobs are ordered large-to-small by default. With a
    short time limit, all slots fill with slow 1080p jobs, few complete
    before the deadline, and in-flight jobs take 30-60s to drain. A new
    --fast-jobs-first option reverses the order, maximizing completions and
    reducing drain to 1-3s.
  3. Inflated total_data_encoded: The previous calculation counted all
    input clips regardless of whether they were actually encoded. Now
    derived from joblog data — only successfully completed, deduplicated
    input files are counted.
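As a concrete sketch of the megapixel metric, the snippet below parses a tiny hand-written GNU parallel joblog (tab-separated columns: Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command), keeps only successful jobs, pulls width, height, and frame count out of a hypothetical filename pattern (`clip_1920x1080_300f.y4m`; the real naming scheme may differ), and divides total megapixels by effective time, i.e. summed JobRuntime over the slot count.

```shell
#!/bin/sh
# Hedged sketch of the MPx/s score; the filename pattern is hypothetical.
SLOTS=4
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  Seq Host Starttime JobRuntime Send Receive Exitval Signal Command \
  1 : 0 2.0 0 0 0 0 'encode clip_1920x1080_300f.y4m' \
  2 : 0 1.0 0 0 0 0 'encode clip_256x144_300f.y4m' \
  3 : 0 1.0 0 0 1 0 'encode clip_1280x720_300f.y4m' > joblog.txt
SCORE=$(awk -F'\t' 'NR > 1 && $7 == 0 {          # skip header and failed jobs
    if (match($9, /[0-9]+x[0-9]+_[0-9]+f/)) {
        spec = substr($9, RSTART, RLENGTH)
        split(spec, a, /[x_f]/)                  # a[1]=W a[2]=H a[3]=frames
        mpx += a[1] * a[2] * a[3] / 1e6          # megapixels for this job
        runtime += $4                            # sum of JobRuntime
    }
} END { printf "%.2f", mpx / (runtime / slots) }' slots="$SLOTS" joblog.txt)
echo "score: $SCORE MPx/s"
```

With these numbers the two successful jobs contribute 622.08 + 11.06 MPx over 3.0 s of summed JobRuntime; at 4 slots the effective time is 0.75 s, giving roughly 844 MPx/s. The failed 720p job is excluded from both the numerator and the effective time, which is what makes the score insensitive to which subset of jobs happens to finish.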

Summary

Adds a time-bounded execution mode to VideoTranscodeBench that caps
benchmark runtime via SIGTERM-based parallel job control, along with a
resolution-normalized megapixel scoring metric and an option to
prioritize fast jobs.

Key changes:

  • timed_parallel_feeder.sh (new): Wraps GNU parallel with a time limit.
    When max_time > 0, runs parallel in the background and sends SIGTERM
    after the deadline, allowing in-flight jobs to finish gracefully while
    preventing new jobs from spawning. Produces a joblog for post-hoc
    analysis. When max_time = 0, behaves identically to the original flow.
  • Megapixel score mode (--score-mode megapixel): Computes throughput as
    MPx/s by extracting resolution and frame count from input filenames in
    the joblog. This normalizes across resolutions, making scores stable
    regardless of which job subset completes in timed runs. Baseline derived
    mathematically from existing throughput baseline: 86.12 MPx/s.
  • Effective time from joblogs (megapixel mode only): When
    score_mode=megapixel, uses sum(JobRuntime for successful jobs) /
    num_parallel_slots instead of wall-clock time. The default throughput
    mode continues to use wall-clock time, preserving backward compatibility
    with existing baselines.
  • total_data_encoded fix: Now derives encoded data size from unique
    input files of successful jobs (via joblog), instead of counting all
    clips regardless of completion. Deduplicates across CRF values with sort
    -u.
  • --fast-jobs-first: Reverses command file ordering (via tac) so
    small/fast jobs run first. For short timed runs this maximizes job
    completions and minimizes the drain phase (1-3s vs 30-60s).
  • Two new job definitions in jobs.yml:
    • video_transcode_bench_svt_timed: 600s limit, megapixel scoring,
      fast-jobs-first
    • video_transcode_bench_svt_timed_mini: 15s limit, megapixel scoring,
      fast-jobs-first
  • Backward compatible: Existing jobs are unchanged — max_time=0 means no
    limit, throughput GB/s scoring with wall-clock time, original job
    ordering.
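The ordering and dedup steps above are one-liners in spirit; a toy demonstration with made-up file names:

```shell
#!/bin/sh
# Toy demonstration of --fast-jobs-first and the dedup step (file names
# are made up). Commands are generated large-to-small; `tac` reverses the
# list so the smallest, fastest jobs are dispatched first.
printf '%s\n' 'encode big_1080p.y4m' 'encode mid_720p.y4m' 'encode small_144p.y4m' > cmds.txt
tac cmds.txt > cmds_fast_first.txt
FIRST=$(head -n 1 cmds_fast_first.txt)
echo "first job: $FIRST"

# Each clip is encoded once per CRF value, so the joblog repeats inputs;
# `sort -u` collapses them to unique files before their sizes are summed
# into total_data_encoded.
UNIQUE=$(printf '%s\n' clipA.y4m clipA.y4m clipA.y4m clipB.y4m | sort -u | wc -l)
echo "unique inputs: $UNIQUE"
```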

Reviewed By: YifanYuan3

Differential Revision: D98639355

meta-cla bot added the CLA Signed label Apr 7, 2026
meta-codesync bot commented Apr 7, 2026

@marziehlenjaniMeta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98639355.
