Add LAQ (Learnable Amax Quantization) algorithm #1247
realAsma wants to merge 1 commit into asma/new-qat-2
Codecov Report
@@ Coverage Diff @@
## asma/new-qat-2 #1247 +/- ##
==================================================
- Coverage 76.94% 72.30% -4.65%
==================================================
Files 355 355
Lines 41402 41622 +220
==================================================
- Hits 31858 30096 -1762
- Misses 9544 11526 +1982
7718d64 to 4c5a889
Also clean up llm_qat example configs and fix pad_token_id handling.
Signed-off-by: realAsma <akuriparambi@nvidia.com>
16832fb to 8866b80
Summary
Add LAQ (Learnable Amax Quantization), a QAT algorithm that learns separate pre-quantization and post-dequantization amax values during training. Forward pass:
`w_q = Q_STE(w / s_pre) * s_post`
where `s = amax / Q_max`.

Key options:
- `learnable_amax`: controls which amax parameters are trainable; one of `["pre", "post"]` (both), `"post"` (post-only, default), `"pre"` (pre-only), or `[]` (frozen)
- `tied_amax`: when `True`, pre and post share a single tensor (requires both to have the same learnable state)
- `scale_algorithm`: optional initial scale calibration (`mse`, `local_hessian`, or `max`) before learning begins

Test plan
- `pytest tests/unit/torch/quantization/test_laq.py`
- `pytest tests/unit/recipe/test_laq_recipes.py`
- `pytest tests/gpu/torch/quantization/test_laq_cuda.py`

🤖 Generated with Claude Code
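For reviewers who want a concrete picture of the forward pass and option semantics described in the Summary, here is a minimal pure-Python sketch. It is not the code in this PR. The function names `laq_forward` and `trainable_amax` are hypothetical, and the sketch assumes per-tensor amax values and a symmetric signed integer grid with `Q_max = 2**(num_bits - 1) - 1`. The straight-through estimator only changes the backward pass, so the forward value shown here is plain round-and-clamp.

```python
def laq_forward(weights, amax_pre, amax_post, num_bits=8):
    """Forward value of w_q = Q(w / s_pre) * s_post, with s = amax / Q_max."""
    # Assumption: symmetric signed integer grid, Q_max = 2**(num_bits - 1) - 1.
    q_max = 2 ** (num_bits - 1) - 1
    s_pre = amax_pre / q_max
    s_post = amax_post / q_max
    out = []
    for w in weights:
        # Scale by s_pre, round to the nearest integer code, clamp to [-Q_max, Q_max].
        code = max(-q_max, min(q_max, round(w / s_pre)))
        # Dequantize with the separate post scale.
        out.append(code * s_post)
    return out


def trainable_amax(learnable_amax="post", tied_amax=False):
    """Hypothetical helper: map the option values to per-amax trainable flags."""
    if isinstance(learnable_amax, str):
        names = [learnable_amax]
    else:
        names = list(learnable_amax)
    flags = {"pre": "pre" in names, "post": "post" in names}
    # A tied tensor cannot be trainable for one role and frozen for the other.
    if tied_amax and flags["pre"] != flags["post"]:
        raise ValueError("tied_amax requires pre and post to share a learnable state")
    return flags


weights = [0.6, -1.3, 5.0]

# Equal pre and post amax: this reduces to ordinary fake quantization.
tied = laq_forward(weights, amax_pre=3.5, amax_post=3.5, num_bits=4)
print(tied)    # [0.5, -1.5, 3.5]

# A larger post amax rescales the same integer codes (1, -3, 7).
untied = laq_forward(weights, amax_pre=3.5, amax_post=7.0, num_bits=4)
print(untied)  # [1.0, -3.0, 7.0]

print(trainable_amax())  # {'pre': False, 'post': True}
```

With equal pre and post amax the result matches standard fake quantization. Letting the two diverge keeps the integer codes fixed while the dequantized range moves, which is the extra degree of freedom the Summary describes.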