
Add hgemm_splitk+allreduce prologue/epilogue fusion kernels #354

Draft
xytpai wants to merge 9 commits into main from xyt/hgemm_ar


xytpai (Contributor) commented Apr 7, 2026

Motivation

Add hgemm + reduce_scatter and all_gather + hgemm fusion kernels with split-K support.
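The two fused kernels compose into an all-reduce via the standard identity all_reduce = reduce_scatter followed by all_gather. This is a minimal CPU sketch assuming that decomposition. Under this reading, each tensor-parallel rank holds a partial GEMM output. hgemm + reduce_scatter leaves each rank with one summed shard, and all_gather + hgemm reassembles the full tensor before the next GEMM. The PR does not spell this out. The helper names below are hypothetical and are not FlyDSL APIs.

```python
def reduce_scatter(per_rank):
    """Rank r ends up with the elementwise sum, over all ranks, of shard r."""
    tp, n = len(per_rank), len(per_rank[0])
    assert n % tp == 0, "output length must divide evenly across ranks"
    shard = n // tp
    return [[sum(vec[i] for vec in per_rank) for i in range(r * shard, (r + 1) * shard)]
            for r in range(tp)]


def all_gather(shards):
    """Every rank receives the concatenation of all ranks' shards."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]


def all_reduce(per_rank):
    """Reference result: every rank receives the full elementwise sum."""
    total = [sum(col) for col in zip(*per_rank)]
    return [list(total) for _ in per_rank]


# Hypothetical partial GEMM outputs on tp=2 ranks (each rank multiplied its own K-slice).
partials = [[1, 2, 3, 4], [10, 20, 30, 40]]
print(all_gather(reduce_scatter(partials)))  # [[11, 22, 33, 44], [11, 22, 33, 44]]
```

Splitting the collective this way lets the reduce half ride on the producing GEMM's epilogue and the gather half on the consuming GEMM's prologue. It avoids a separate standalone all-reduce pass over the full tensor.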

Technical Details

These fusions either reduce memory access or overlap compute with communication for hgemm + all_reduce. A split-K barrier inside the kernel ensures that zero_c is performed correctly.
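The ordering problem the split-K barrier solves can be shown with a small CPU simulation. This is a minimal sketch under assumptions the PR does not state:

- zero_c means zero-initializing the output buffer C;
- split-K partial sums are accumulated into C in place.

Python threads stand in for split-K workgroups, `threading.Barrier` for the in-kernel barrier, and a lock for atomic adds. `splitk_gemm` and `matmul_partial` are hypothetical helpers, not the PR's implementation.

```python
import threading


def matmul_partial(a, b, k0, k1):
    """Partial product A[:, k0:k1] @ B[k0:k1, :] on nested lists."""
    m, n = len(a), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(k0, k1)) for j in range(n)]
            for i in range(m)]


def splitk_gemm(a, b, c, splits):
    """Hypothetical split-K GEMM: each worker owns one K-slice and accumulates
    into the shared output c, which may hold stale data on entry."""
    K = len(b)
    chunk = (K + splits - 1) // splits
    barrier = threading.Barrier(splits)  # stand-in for the in-kernel split-K barrier
    lock = threading.Lock()              # stand-in for atomic adds into C

    def worker(s):
        k0, k1 = s * chunk, min((s + 1) * chunk, K)
        partial = matmul_partial(a, b, k0, k1)  # math can run before the barrier
        if s == 0:
            # zero_c: a single worker clears C before anyone accumulates into it
            for row in c:
                for j in range(len(row)):
                    row[j] = 0.0
        barrier.wait()  # no worker accumulates until C has been zeroed
        with lock:
            for i, row in enumerate(partial):
                for j, v in enumerate(row):
                    c[i][j] += v

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(splits)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c


stale_c = [[99.0, 99.0], [99.0, 99.0]]
print(splitk_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]], stale_c, splits=2))
# [[19.0, 22.0], [43.0, 50.0]]
```

Each worker computes its partial product before waiting, so only the accumulation waits on zero_c, not the math. Without the barrier, a worker could add its partial sum into C and then have it wiped by the zeroing worker, silently losing that K-slice.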

Test Result

Platform: MI355

| m | n | k | tp | dtype | torch_barrier_dur (us) | flydsl_barrier_dur (us) |
|---|---|---|---|---|---|---|
| 32 | 7168 | 2048 | 4 | bf16 | 46.03 | 42.33 |
| 32 | 7168 | 2048 | 8 | bf16 | 61.12 | 56.66 |
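The relative gain in the MI355 results can be computed directly from the reported durations. This sketch assumes that torch_barrier_dur is the unfused PyTorch baseline and flydsl_barrier_dur is the fused FlyDSL kernel; the PR does not label them that way explicitly. The numbers are copied from the table and nothing else is measured.

```python
# Durations copied from the MI355 results table, in microseconds.
results = [
    # (m, n, k, tp, torch_barrier_dur_us, flydsl_barrier_dur_us)
    (32, 7168, 2048, 4, 46.03, 42.33),
    (32, 7168, 2048, 8, 61.12, 56.66),
]


def speedup(baseline_us, candidate_us):
    """How many times faster the candidate is than the baseline."""
    return baseline_us / candidate_us


def latency_reduction(baseline_us, candidate_us):
    """Fraction of the baseline latency that the candidate removes."""
    return (baseline_us - candidate_us) / baseline_us


for m, n, k, tp, torch_us, flydsl_us in results:
    print(f"tp={tp}: speedup {speedup(torch_us, flydsl_us):.3f}x, "
          f"latency -{latency_reduction(torch_us, flydsl_us):.1%}")
# tp=4: speedup 1.087x, latency -8.0%
# tp=8: speedup 1.079x, latency -7.3%
```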

Depends on

#326

xytpai marked this pull request as draft April 7, 2026 02:47
xytpai changed the title Add hgemm + allreduce fusion kernels Add hgemm + allreduce epilogue fusion kernels Apr 7, 2026
xytpai changed the title Add hgemm + allreduce epilogue fusion kernels Add hgemm_splitk+allreduce epilogue fusion kernels Apr 8, 2026
xytpai changed the title Add hgemm_splitk+allreduce epilogue fusion kernels Add hgemm_splitk+allreduce prologue/epilogue fusion kernels Apr 8, 2026