Async scheduling #175

Open
gty111 wants to merge 5 commits into master from async_scheduling

Conversation

@gty111 (Owner) commented Apr 11, 2026

TODO: refactor so that each first PP rank has its own scheduler

Copilot AI review requested due to automatic review settings, April 11, 2026 03:42

Copilot AI left a comment


Pull request overview

Adds an experimental “async scheduling” execution mode intended to overlap CPU-side scheduling/input preparation with GPU execution, including a GPU-side circular buffer (“FutureMap”) to avoid CPU token synchronization in the single-stage (PP=1) case.

Changes:

  • Introduces async-scheduling control flow in Worker (driver + PP ranks) with deferred output finalization and overlapped step launch/resolve/collect.
  • Adds FutureMap/AsyncSchedulerContext utilities and extends ModelRunner/Sampler with async-friendly APIs (GPU-returning sampler, async batch execution, async D2H copy).
  • Exposes --async-scheduling through the API server and engine wiring.
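The "FutureMap" idea above can be sketched as a fixed-size ring of per-step token slots: each step reserves a slot for the tokens it will sample, and the next step's input preparation references that slot by index instead of waiting for a CPU copy. A minimal pure-Python sketch follows; the class and method names (`FutureMap`, `reserve`, `put`, `get`) are illustrative, not the PR's actual API, and in the real implementation the slots would be GPU tensors read via a device-side gather.

```python
# Hypothetical sketch of a FutureMap-style circular buffer. In gLLM the
# slots would live on the GPU so no CPU token synchronization is needed;
# here plain Python lists stand in for device tensors.

class FutureMap:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots = [None] * capacity  # one slot per in-flight step
        self.head = 0

    def reserve(self) -> int:
        """Reserve the next slot for a step's sampled tokens; returns its index."""
        idx = self.head
        self.head = (self.head + 1) % self.capacity  # circular reuse
        return idx

    def put(self, idx: int, token_ids):
        """Record the tokens a step sampled (a GPU write in practice)."""
        self.slots[idx] = list(token_ids)

    def get(self, idx: int):
        """Read a previous step's tokens when preparing the next step's input."""
        return self.slots[idx]


fm = FutureMap(capacity=4)
slot = fm.reserve()
fm.put(slot, [101, 102])
print(fm.get(slot))  # [101, 102]
```

The circular indexing means the buffer only needs as many slots as there can be overlapped steps in flight; older slots are simply overwritten once their consumers have read them.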

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

Summary per file:

  • gllm/worker.py — Adds async scheduling loops for driver and PP ranks, including a prefetch + deferred-finalize pipeline.
  • gllm/scheduler.py — Adds deferred/finalize output processing for the placeholder-token approach.
  • gllm/model_runner.py — Adds async execution primitives (step_launch/resolve/collect, run_batch_async, FutureMap init) and VL embedding handling for placeholders.
  • gllm/layers/sampler.py — Splits the sampler into GPU-returning and CPU-returning paths to support async D2H copies.
  • gllm/dist_utils.py — Adds non-blocking PP recv helpers (irecv + wait).
  • gllm/async_utils.py — New module implementing the FutureMap circular buffer + async stream context.
  • gllm/llm_engine.py — Threads the async_scheduling config into worker startup.
  • gllm/entrypoints/api_server.py — Adds the CLI flag and passes it into the engine.
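The step_launch/resolve/collect split listed for gllm/model_runner.py amounts to a one-step-deep pipeline: step N+1 is launched before step N's outputs are collected, so CPU-side scheduling overlaps with device work. The sketch below is an assumption-laden illustration of that control flow only — `step_launch` and `step_collect` are stand-in names, a thread pool stands in for the GPU, and a future stands in for the async D2H copy that the real code would wait on.

```python
# Minimal sketch of the overlapped launch/collect loop the PR describes.
# A single-worker ThreadPoolExecutor simulates an in-order device queue;
# in gLLM this overlap would use CUDA streams and async D2H copies.

from concurrent.futures import ThreadPoolExecutor


def step_launch(pool, batch):
    # Stand-in for enqueueing a step's GPU kernels; returns a handle (future).
    return pool.submit(lambda b: [t + 1 for t in b], batch)


def step_collect(handle):
    # Stand-in for waiting on the async D2H copy and finalizing outputs.
    return handle.result()


def run_overlapped(batches):
    outputs = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        inflight = None
        for batch in batches:
            handle = step_launch(pool, batch)        # launch step N+1 ...
            if inflight is not None:
                outputs.append(step_collect(inflight))  # ... then finalize step N
            inflight = handle
        if inflight is not None:
            outputs.append(step_collect(inflight))   # drain the last step
    return outputs


print(run_overlapped([[1, 2], [3]]))  # [[2, 3], [4]]
```

Deferring collection by one step is what makes the "deferred output finalization" in gllm/worker.py necessary: step N's outputs are only materialized on the CPU after step N+1 has already been launched.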


Review comment threads:

  • gllm/worker.py (outdated)
  • gllm/model_runner.py (outdated)
  • gllm/scheduler.py (outdated)
  • gllm/worker.py
  • gllm/model_runner.py
  • gllm/async_utils.py
  • gllm/worker.py (outdated)
@gty111 force-pushed the async_scheduling branch 4 times, most recently from ec14600 to 061a56b (April 13, 2026 05:54)
@gty111 force-pushed the async_scheduling branch 2 times, most recently from 119a6d8 to a0f3c43 (April 13, 2026 11:59)

2 participants