rookiemann / vllm-windows-build

Native Windows build of vLLM 0.19.0: no WSL, no Docker. Pre-built wheels + 33-file Windows patch + Multi-TurboQuant KV cache compression (6 methods, 2x cache capacity). PyTorch 2.10 + CUDA 12.6 + Triton + Flash-Attention 2.

Topics: windows, gpu, cuda, pytorch, nvidia, triton, msvc, quantization, kv-cache, awq, llm, llm-serving, vllm, llm-inference, flash-attention, qwen, kv-cache-compression, turboquant, vllm-windows, multi-turboquant

Language: Python · Stars: 5 · Updated Apr 12, 2026