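These are my personal notes from studying the materials in the gpu-mode/lectures repository and the accompanying YouTube lectures. The notes were compiled from the lecture content with the help of GPT, so if you spot any errors or gaps, feedback is always welcome.
- Original GitHub repository: gpu-mode/lectures
- YouTube lectures: GPU MODE YouTube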
| # | Lecture | Instructor | Studied | Notes written | Exercises done |
|---|---|---|---|---|---|
| 01 | Profiling and Integrating CUDA kernels in PyTorch | Mark Saroufim | ✅ | ✅ | ✅ |
| 02 | Recap Ch. 1-3 from the PMPP book | Andreas Koepf | ✅ | ✅ | ✅ |
| 03 | Getting Started With CUDA | Jeremy Howard | ❌ | ❌ | ❌ |
| 04 | Intro to Compute and Memory Architecture | Thomas Viehmann | ❌ | ❌ | ❌ |
| 05 | Going Further with CUDA for Python Programmers | Jeremy Howard | ❌ | ❌ | ❌ |
| 06 | Optimizing PyTorch Optimizers | Jane Xu | ❌ | ❌ | ❌ |
| 07 | Advanced Quantization | Charles Hernandez | ❌ | ❌ | ❌ |
| 08 | CUDA Performance Checklist | Mark Saroufim | ❌ | ❌ | ❌ |
| 09 | Reductions | Mark Saroufim | ❌ | ❌ | ❌ |
| 10 | Build a Prod Ready CUDA Library | Oscar Amoros Huguet | ❌ | ❌ | ❌ |
| 11 | Sparsity | Jesse Cai | ❌ | ❌ | ❌ |
| 12 | Flash Attention | Thomas Viehmann | ❌ | ❌ | ❌ |
| 13 | Ring Attention | Andreas Koepf | ❌ | ❌ | ❌ |
| 14 | Practitioner's Guide to Triton | Umer Adil | ❌ | ❌ | ❌ |
| 15 | CUTLASS | Eric Auld | ❌ | ❌ | ❌ |
| 16 | On Hands Profiling | Taylor Robie | ❌ | ❌ | ❌ |
| 17 | GPU Collective Communication (NCCL) | Dan Johnson | ❌ | ❌ | ❌ |
| 18 | Fused Kernels | Kapil Sharma | ❌ | ❌ | ❌ |
| 19 | Data Processing on GPUs | Devavret Makkar | ❌ | ❌ | ❌ |
| 20 | Scan Algorithm | Izzat El Haj | ❌ | ❌ | ❌ |
| 21 | Scan Algorithm Part 2 | Izzat El Haj | ❌ | ❌ | ❌ |
| 22 | Hacker's Guide to Speculative Decoding in VLLM | Cade Daniel | ❌ | ❌ | ❌ |
| 23 | Tensor Cores | Vijay Thakkar & Pradeep Ramani | ❌ | ❌ | ❌ |
| 24 | Scan at the Speed of Light | Jake Hemstad & Georgii Evtushenko | ❌ | ❌ | ❌ |
| 25 | Speaking Composable Kernel | Haocong Wang | ❌ | ❌ | ❌ |
| 26 | SYCL MODE (Intel GPU) | Patric Zhao | ❌ | ❌ | ❌ |
| 27 | gpu.cpp | Austin Huang | ❌ | ❌ | ❌ |
| 28 | Liger Kernel | Byron Hsu | ❌ | ❌ | ❌ |
| 29 | Triton Internals | Kapil Sharma | ❌ | ❌ | ❌ |
| 30 | Quantized training | Thien Tran | ❌ | ❌ | ❌ |
| 31 | Beginners Guide to Metal Kernels | Nikita Shulga | ❌ | ❌ | ❌ |
| 32 | Unsloth - LLM Systems Engineering | Daniel Han | ❌ | ❌ | ❌ |
| 33 | BitBLAS | Wang Lei | ❌ | ❌ | ❌ |
| 34 | Low Bit Triton Kernels | Hicham Badri | ❌ | ❌ | ❌ |
| 35 | SGLang Performance Optimization | Yineng Zhang | ❌ | ❌ | ❌ |
| 36 | CUTLASS and Flash Attention 3 | Jay Shah | ❌ | ❌ | ❌ |
| 37 | Introduction to SASS & GPU Microarchitecture | Arun Demeure | ❌ | ❌ | ❌ |
| 38 | Lowbit kernels for ARM CPU | Scott Roy | ❌ | ❌ | ❌ |
| 39 | TorchTitan | Mark Saroufim & Tianyu Liu | ❌ | ❌ | ❌ |
| 40 | Flash Infer | Zihao Ye | ❌ | ❌ | ❌ |
| 41 | CUDA Docs for Humans | Charles Frye | ❌ | ❌ | ❌ |
| 42 | Mosaic GPU | Adam Paszke | ❌ | ❌ | ❌ |
| 43 | TBD | Erik Schultheis | ❌ | ❌ | ❌ |