Rust-native MoE inference runtime with custom CUDA kernels for Blackwell GPUs. Includes DFlash speculative decoding, multi-tier Engram memory, and entropy-adaptive routing. Targets Qwen3.5-35B-A3B on a single RTX 5060 Ti 16GB.
This is a complete testing and construction project for a small-parameter recurrent language model based on the Mamba2 architecture. It attempts to build the model with the Mano optimizer. Mano is a new optimizer