Async scheduling by gty111 · Pull Request #175 · gty111/gLLM

gty111 · 2026-04-11T03:42:34Z

TODO: refactor so that each first PP rank has its own scheduler

Copilot

Pull request overview

Adds an experimental “async scheduling” execution mode intended to overlap CPU-side scheduling/input preparation with GPU execution, including a GPU-side circular buffer (“FutureMap”) to avoid CPU token synchronization in the single-stage (PP=1) case.
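
To make the placeholder idea concrete, here is a minimal CPU-only sketch of a circular buffer of "future" tokens. It is not the PR's `FutureMap`: the class name `FutureMapSketch`, its methods, and the convention of using negative ids as placeholders are all assumptions made for illustration, and plain Python lists stand in for what would be a GPU tensor.

```python
from typing import List


class FutureMapSketch:
    """Toy stand-in for a GPU-resident circular buffer of not-yet-sampled tokens.

    Hypothetical illustration only; names and layout are assumptions.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        # Slot 0 is left unused so that every placeholder id is strictly negative.
        self.buf = [0] * (capacity + 1)
        self.next_slot = 0

    def alloc(self, n: int) -> List[int]:
        """Reserve n slots and return their placeholder ids (negated slot numbers)."""
        if n > self.capacity:
            raise ValueError("requested more slots than the buffer holds")
        slots = [(self.next_slot + i) % self.capacity + 1 for i in range(n)]
        self.next_slot = (self.next_slot + n) % self.capacity
        return [-s for s in slots]

    def store(self, placeholders: List[int], tokens: List[int]) -> None:
        """Write sampled tokens into the slots named by the placeholders."""
        for p, t in zip(placeholders, tokens):
            self.buf[-p] = t

    def resolve(self, input_ids: List[int]) -> List[int]:
        """Swap each negative placeholder for the stored token; keep real ids as-is."""
        return [self.buf[-t] if t < 0 else t for t in input_ids]
```

In this sketch the scheduler could build the next step's inputs from placeholders right after `alloc`, and the real token values would only be needed when `resolve` runs, which is the property that lets the CPU avoid waiting on sampled tokens.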

Changes:

Introduces async-scheduling control flow in Worker (driver + PP ranks) with deferred output finalization and overlapped step launch/resolve/collect.
Adds FutureMap/AsyncSchedulerContext utilities and extends ModelRunner/Sampler with async-friendly APIs (GPU-returning sampler, async batch execution, async D2H copy).
Exposes --async-scheduling through the API server and engine wiring.
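
The overlap described in the first change can be sketched roughly as below. This is a simplified illustration, not the Worker code from the PR: `run_overlapped` and its `prepare`/`execute`/`finalize` callbacks are hypothetical names, and a single background thread stands in for the GPU. The point is only the ordering: inputs for step N+1 are prepared while step N is still in flight, and step N's outputs are finalized afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


def run_overlapped(batches, prepare, execute, finalize):
    """Run batches so that CPU-side prepare() overlaps the previous execute().

    Hypothetical helper: one worker thread plays the role of the device.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as device:
        in_flight = None
        for batch in batches:
            # CPU-side scheduling / input preparation for the next step.
            # The previous step (if any) is still running on the worker thread.
            inputs = prepare(batch)
            if in_flight is not None:
                # Deferred finalize: collect the previous step only now.
                results.append(finalize(in_flight.result()))
            # Launch the next step without waiting for it to finish.
            in_flight = device.submit(execute, inputs)
        if in_flight is not None:
            # Drain the last step.
            results.append(finalize(in_flight.result()))
    return results
```

With three batches, the main thread's call order in this sketch is prepare 1, prepare 2, finalize 1, prepare 3, finalize 2, finalize 3, so each finalize is deferred by one step.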

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| gllm/worker.py | Adds async scheduling loops for driver and PP ranks, including prefetch + deferred finalize pipeline. |
| gllm/scheduler.py | Adds deferred/finalize output processing for placeholder-token approach. |
| gllm/model_runner.py | Adds async execution primitives (step_launch/resolve/collect, run_batch_async, FutureMap init) and VL embedding handling for placeholders. |
| gllm/layers/sampler.py | Splits sampler into GPU-returning and CPU-returning paths to support async D2H. |
| gllm/dist_utils.py | Adds non-blocking PP recv helpers (irecv + wait). |
| gllm/async_utils.py | New module implementing FutureMap circular buffer + async stream context. |
| gllm/llm_engine.py | Threads `async_scheduling` config into worker startup. |
| gllm/entrypoints/api_server.py | Adds CLI flag and passes it into engine. |
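
As a rough illustration of the last two rows, the sketch below shows one way a boolean CLI flag can be parsed and threaded into an engine-style config. Only the flag name `--async-scheduling` and the `async_scheduling` setting come from the review summary; `build_parser`, `EngineConfigSketch`, `config_from_args`, and the choice of a `store_true` flag defaulting to off are assumptions, not the PR's actual code.

```python
import argparse
from dataclasses import dataclass


@dataclass
class EngineConfigSketch:
    """Hypothetical stand-in for whatever config the engine hands to workers."""
    async_scheduling: bool = False


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="API server (sketch)")
    # Assumed to be an opt-in switch; argparse maps it to args.async_scheduling.
    parser.add_argument(
        "--async-scheduling",
        action="store_true",
        help="Enable the experimental async scheduling mode.",
    )
    return parser


def config_from_args(argv) -> EngineConfigSketch:
    """Parse argv and pass the flag through to the engine-side config."""
    args = build_parser().parse_args(argv)
    return EngineConfigSketch(async_scheduling=args.async_scheduling)
```

Under these assumptions, `config_from_args(["--async-scheduling"])` yields a config with `async_scheduling=True`, and omitting the flag leaves the mode disabled.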

Copilot AI review requested due to automatic review settings April 11, 2026 03:42

Copilot started reviewing on behalf of gty111 April 11, 2026 03:43

Copilot AI reviewed Apr 11, 2026

Comment thread gllm/worker.py Outdated

Comment thread gllm/model_runner.py Outdated

Comment thread gllm/scheduler.py Outdated

Comment thread gllm/worker.py

Comment thread gllm/model_runner.py

Comment thread gllm/async_utils.py

Comment thread gllm/worker.py Outdated

gty111 force-pushed the async_scheduling branch 4 times, most recently from ec14600 to 061a56b April 13, 2026 05:54

gty111 added 3 commits April 13, 2026 18:18

Implement async scheduling for single card

7cc8a34

Add async util

8eb2524

Fix

a0f3c43

gty111 force-pushed the async_scheduling branch 2 times, most recently from 119a6d8 to a0f3c43 April 13, 2026 11:59

gty111 added 2 commits April 15, 2026 17:51

Fix TP

60fe436

Try fix Tp async scheduling

b3039a0

gty111 wants to merge 5 commits into master from async_scheduling

2 participants
