Pull request overview
Adds an experimental “async scheduling” execution mode intended to overlap CPU-side scheduling/input preparation with GPU execution, including a GPU-side circular buffer (“FutureMap”) to avoid CPU token synchronization in the single-stage (PP=1) case.
Changes:
- Introduces async-scheduling control flow in `Worker` (driver + PP ranks) with deferred output finalization and overlapped step launch/resolve/collect.
- Adds `FutureMap`/`AsyncSchedulerContext` utilities and extends `ModelRunner`/`Sampler` with async-friendly APIs (GPU-returning sampler, async batch execution, async D2H copy).
- Exposes `--async-scheduling` through the API server and engine wiring.
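As a rough illustration of the overlap described in the first bullet, here is a minimal sketch of a launch/collect loop in which scheduling for step N+1 runs while step N is still in flight. It is not the PR's code: `fake_gpu_step`, `schedule`, and `run_overlapped` are hypothetical names, and a single worker thread stands in for the GPU stream.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_gpu_step(batch):
    # Hypothetical stand-in for a GPU forward pass plus sampling.
    time.sleep(0.01)
    return [tok + 1 for tok in batch]


def schedule(step):
    # Hypothetical stand-in for CPU-side scheduling / input preparation.
    return [step, step + 10]


def run_overlapped(num_steps):
    """Launch step N, prepare step N+1 while it runs, then collect step N."""
    outputs = []
    with ThreadPoolExecutor(max_workers=1) as gpu:
        pending = gpu.submit(fake_gpu_step, schedule(0))  # launch step 0
        for step in range(1, num_steps):
            next_batch = schedule(step)       # CPU work overlaps the in-flight step
            outputs.append(pending.result())  # collect the previous step
            pending = gpu.submit(fake_gpu_step, next_batch)  # launch the next step
        outputs.append(pending.result())      # drain the last in-flight step
    return outputs


results = run_overlapped(3)
```

In this sketch the next batch does not depend on the previous step's sampled tokens. In a real decode loop it does, which is the gap the placeholder-token approach in this PR is meant to close.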
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| gllm/worker.py | Adds async scheduling loops for driver and PP ranks, including prefetch + deferred finalize pipeline. |
| gllm/scheduler.py | Adds deferred/finalize output processing for placeholder-token approach. |
| gllm/model_runner.py | Adds async execution primitives (step_launch/resolve/collect, run_batch_async, FutureMap init) and VL embedding handling for placeholders. |
| gllm/layers/sampler.py | Splits sampler into GPU-returning and CPU-returning paths to support async D2H. |
| gllm/dist_utils.py | Adds non-blocking PP recv helpers (irecv + wait). |
| gllm/async_utils.py | New module implementing FutureMap circular buffer + async stream context. |
| gllm/llm_engine.py | Threads async_scheduling config into worker startup. |
| gllm/entrypoints/api_server.py | Adds CLI flag and passes it into engine. |
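The circular-buffer idea behind `FutureMap` and the placeholder tokens can be sketched in plain Python. This is only an illustration under assumptions: the class name `FutureMapSketch`, its methods, and the negative-id placeholder encoding are all hypothetical, and the PR describes the real buffer as living on the GPU so that resolution needs no CPU synchronization.

```python
class FutureMapSketch:
    """CPU-only illustration of a circular buffer of not-yet-known tokens."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity  # sampled tokens, filled in later
        self.next_slot = 0

    def allocate(self):
        # Reserve the next slot (wrapping around) and return a placeholder.
        # Assumption: placeholders are negative ids, -(slot + 1).
        slot = self.next_slot
        self.next_slot = (self.next_slot + 1) % self.capacity
        self.slots[slot] = None
        return -(slot + 1)

    def store(self, placeholder, token):
        # Record the real token once sampling for that slot has finished.
        self.slots[-placeholder - 1] = token

    def resolve(self, token_ids):
        # Replace placeholders with stored tokens; real ids pass through.
        resolved = []
        for tok in token_ids:
            if tok < 0:
                value = self.slots[-tok - 1]
                if value is None:
                    raise RuntimeError("placeholder resolved before its token was stored")
                resolved.append(value)
            else:
                resolved.append(tok)
        return resolved


fm = FutureMapSketch(capacity=4)
placeholder = fm.allocate()         # the scheduler can build the next batch now
fm.store(placeholder, 42)           # the sampler writes the real token later
inputs = fm.resolve([7, placeholder, 9])
```

The capacity has to cover every step that can be in flight at once, otherwise a slot would be reused before its placeholder has been resolved.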
ec14600 to 061a56b
119a6d8 to a0f3c43
TODO: refactor so that each first PP rank has its own scheduler