Simd #28

Open
cstroie wants to merge 7 commits into RightNow-AI:main from cstroie:simd

cstroie commented Apr 16, 2026

Summary

  • AVX build target (make avx): 8-wide float accumulators for all hot paths on Sandy Bridge+ / Bulldozer+ CPUs
  • Tiered x86 build targets: make x86 (SSE2 default), make sse2, make sse3, make avx
  • SSE2 Q4_K dot product: previously had a SIMD path only for NEON; x86 now uses 128-bit SSE2 (hottest inference path)
  • SSE2 Q6_K dot product: rewritten from scalar; uses int8 sign-extension trick (no SSE4.1 needed)
  • SSE2/SSE3/AVX RoPE: vectorized complex rotation using _mm_addsub_ps (SSE3 and up) / _mm256_addsub_ps (AVX)
  • AVX paths for rmsnorm, softmax, elemwise_mul, vec_add, and all dot products
  • --mem flag: load model into RAM instead of mmap for consistent inference latency
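
The int8 sign-extension trick mentioned for the Q6_K path can be illustrated with a small standalone sketch. This is not the code from the PR: `dot_i8` is a hypothetical helper, and the real Q6_K kernel also has to unpack 6-bit quants and apply block scales, which is omitted here. The sketch only shows how SSE2 can widen int8 to int16 without `_mm_cvtepi8_epi16` (SSE4.1): a compare against zero yields a 0xFF/0x00 sign byte for each lane, and interleaving it above the value byte gives a correctly sign-extended 16-bit lane.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper (not from the PR): dot product of two int8 vectors,
 * accumulated in int32. Falls back to plain C when SSE2 is unavailable. */
int32_t dot_i8(const int8_t *a, const int8_t *b, size_t n)
{
    int32_t sum = 0;
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Sign masks: 0xFF where the byte is negative, 0x00 otherwise. */
        __m128i sa = _mm_cmpgt_epi8(zero, va);
        __m128i sb = _mm_cmpgt_epi8(zero, vb);
        /* Interleave value (low byte) with sign mask (high byte):
         * each 16-bit lane becomes the sign-extended int8. */
        __m128i a_lo = _mm_unpacklo_epi8(va, sa);
        __m128i a_hi = _mm_unpackhi_epi8(va, sa);
        __m128i b_lo = _mm_unpacklo_epi8(vb, sb);
        __m128i b_hi = _mm_unpackhi_epi8(vb, sb);
        /* Multiply int16 lanes and add adjacent products into int32 lanes. */
        acc = _mm_add_epi32(acc, _mm_madd_epi16(a_lo, b_lo));
        acc = _mm_add_epi32(acc, _mm_madd_epi16(a_hi, b_hi));
    }
    /* Horizontal sum of the four int32 lanes. */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail; also the whole loop when SSE2 is not compiled in. */
    for (; i < n; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}
```

Calling `dot_i8(a, b, n)` on any two int8 arrays should match a plain scalar loop, which is a convenient way to check the widening step in isolation.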

SIMD tier detection (compile-time)

PICOLM_NEON → PICOLM_AVX → PICOLM_SSE3 → PICOLM_SSE2 → scalar
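
A priority chain like the one above could be wired up roughly as follows. This is a sketch under assumptions: the PR does not show how the `PICOLM_*` macros get defined, so here they are derived from the feature macros that GCC and Clang predefine (`__ARM_NEON`, `__AVX__`, `__SSE3__`, `__SSE2__`), and `simd_tier_name` / `simd_float_lanes` are hypothetical helpers added only to make the selection observable.

```c
/* Assumption: if the build did not pass -DPICOLM_<tier>, pick the highest
 * tier the compiler flags allow, in the order NEON, AVX, SSE3, SSE2. */
#if !defined(PICOLM_NEON) && !defined(PICOLM_AVX) && \
    !defined(PICOLM_SSE3) && !defined(PICOLM_SSE2)
#  if defined(__ARM_NEON) || defined(__ARM_NEON__)
#    define PICOLM_NEON 1
#  elif defined(__AVX__)
#    define PICOLM_AVX 1
#  elif defined(__SSE3__)
#    define PICOLM_SSE3 1
#  elif defined(__SSE2__)
#    define PICOLM_SSE2 1
#  endif
#endif

/* Hypothetical helper: name of the tier selected at compile time. */
const char *simd_tier_name(void)
{
#if defined(PICOLM_NEON)
    return "NEON";
#elif defined(PICOLM_AVX)
    return "AVX";
#elif defined(PICOLM_SSE3)
    return "SSE3";
#elif defined(PICOLM_SSE2)
    return "SSE2";
#else
    return "scalar";
#endif
}

/* Hypothetical helper: float lanes per vector register for that tier. */
int simd_float_lanes(void)
{
#if defined(PICOLM_NEON)
    return 4;   /* 128-bit registers */
#elif defined(PICOLM_AVX)
    return 8;   /* 256-bit registers */
#elif defined(PICOLM_SSE3) || defined(PICOLM_SSE2)
    return 4;   /* 128-bit registers */
#else
    return 1;   /* scalar fallback */
#endif
}
```

Because the choice is made by the preprocessor, the binary contains exactly one tier and needs no runtime CPU dispatch; the cost is that a binary built with `make avx` will not run on a CPU without AVX.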

Test plan

  • make clean && make sse2/sse3/avx/native — all build without warnings
  • ./picolm model.gguf -p "The capital of France is" -n 20 -t 0 — output matches reference
  • ./picolm model.gguf --mem -p "Hello" -n 10 — RAM mode works correctly
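
For context on what the RAM mode changes: with mmap, pages are read from disk lazily on first touch, so early tokens can stall on page faults, whereas reading the whole file up front moves that cost to load time. A minimal sketch of such a loader is below. It is an assumption about the general approach, not the PR's implementation, and `load_file_to_ram` is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the PR): read an entire file into a heap
 * buffer. Returns NULL on any error; the caller frees the buffer.
 * Note: ftell() returns long, so on platforms with a 32-bit long this
 * sketch cannot handle files of 2 GiB or more. */
unsigned char *load_file_to_ram(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;

    /* Determine the file size by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    /* Allocate at least one byte so an empty file still yields a buffer. */
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (!buf) { fclose(f); return NULL; }

    /* Read everything now, so no disk access happens during inference. */
    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(buf); return NULL; }

    *size_out = (size_t)len;
    return buf;
}
```

The trade-off is memory: the buffer is private to the process and counts fully against RAM, while an mmap'd file can be shared with the page cache and evicted under pressure.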

cstroie and others added 7 commits April 16, 2026 10:52
Co-authored-by: aider (openrouter/z-ai/glm-4.5-air:free) <aider@aider.chat>
Co-authored-by: aider (openrouter/z-ai/glm-4.5-air:free) <aider@aider.chat>
Co-authored-by: aider (openrouter/z-ai/glm-4.5-air:free) <aider@aider.chat>
Co-authored-by: aider (openrouter/z-ai/glm-4.5-air:free) <aider@aider.chat>
- Add x86, sse2, sse3, avx targets to platform-specific builds section
- Update SIMD feature entry to mention SSE2/SSE3/AVX tiers
- Expand x86 SIMD optimization section with per-tier description
- Update performance waterfall chart to reflect 8-wide AVX ops
- Add --mem option to usage section
- Mark AVX as done in roadmap, keep AVX2/AVX-512 as next step
- Update FAQ SIMD mention

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>