Capability area
Reasoning / math
What does M2.7 fail to do for you?
Over the last few days I have evaluated how much answer accuracy is lost in quantised versions of MiniMax M2.7, on an M5 MacBook Pro with 128 GB and on an M3 Ultra Mac Studio with 256 GB. I run local LLMs with llama-server from llama.cpp, usually the latest version from the GitHub repo, compiled locally with the highest optimisation level. llama.cpp now ships a Python script to evaluate model performance, in particular the failure rate on specific benchmarks. I chose AIME 2026 and ran it in different configurations, using the Unsloth quantisations (see: https://unsloth.ai/docs/models/tutorials/minimax-m27). In the past the narrative was that compressing models down to 4 bits, and compressing the KV cache, would do no relevant harm to precision. Nowadays there is some evidence, and my test runs support it, that quantisation does cost precision. A plain run of AIME 2026 takes about 10 to 20 hours, depending on system and model configuration.
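The log in the references lists "Grader: regex", i.e. the model's final answer is extracted with a regular expression and compared to the answer key. The llama-eval regex itself is not reproduced here; as a hypothetical illustration only, a grader for AIME-style integer answers (0 to 999) could take the last \boxed{} value and fall back to the last standalone 1 to 3 digit integer. The names extract_answer and grade are made up for this sketch.

```python
import re
from typing import Optional

# Hypothetical grader for illustration only; it is NOT the regex used by llama-eval.
# AIME answers are integers from 0 to 999.
BOXED = re.compile(r"\\boxed\{\s*(\d{1,3})\s*\}")  # e.g. \boxed{396}
INTEGER = re.compile(r"\b(\d{1,3})\b")               # standalone 1-3 digit number


def extract_answer(text: str) -> Optional[int]:
    """Return the last \\boxed{} integer, else the last standalone integer, else None."""
    boxed = BOXED.findall(text)
    if boxed:
        return int(boxed[-1])
    numbers = INTEGER.findall(text)
    return int(numbers[-1]) if numbers else None


def grade(text: str, expected: int) -> bool:
    """A response counts as correct only if its extracted answer equals the key."""
    return extract_answer(text) == expected
```

Returning None covers responses without any usable answer, like the N/A row in the log, which is marked wrong.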
AIME 2026 consists of just 30 problems. The official evaluation runs every problem 4 times (results: https://matharena.ai/?comp=aime--aime_2026&view=problem). I only did single, near-deterministic runs.
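The official evaluation's exact aggregation is not described here. Assuming the reported score is plain accuracy averaged over the 4 passes, a single pass (as in my runs) is one sample of that average. A minimal sketch of that assumed aggregation, with made-up function names:

```python
from statistics import mean


def pass_accuracy(run: list[bool]) -> float:
    """Fraction of problems answered correctly in one pass over the benchmark."""
    return sum(run) / len(run)


def averaged_accuracy(runs: list[list[bool]]) -> float:
    """Mean accuracy over several passes; runs[i][j] is problem j in pass i."""
    return mean(pass_accuracy(run) for run in runs)


def per_problem_rate(runs: list[list[bool]]) -> list[float]:
    """How often each problem was solved across passes (e.g. 3 of 4 = 0.75)."""
    return [sum(col) / len(runs) for col in zip(*runs)]
```

The per-problem rate shows which problems are solved only sometimes; those are exactly the ones where a single pass can flip between right and wrong.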
The best result came from UD Q6_K_XL with 27 of 30 answers correct, a 10% error rate: good, but not perfect. For the unquantised model an accuracy of about 93 to 95% is reported (AIME 2026 is not part of the official benchmark statements), so its genuine error rate of about 5 to 7% is noticeably lower. My focus, however, is not on single numbers but on the relative performance of the quantised models. I got:
- 3 of 30 wrong with UD Q6_K_XL, no KV cache quantisation
- 4 of 30 wrong with UD Q5_K_XL, no KV cache quantisation
- 5 of 30 wrong with UD IQ4_XS, no KV cache quantisation
- 6 of 30 wrong with UD IQ4_XS, K and V cache at q8_0
- much worse results with significantly lower cache quantisation, such as V at q5_1.
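With only 30 problems and single runs, each count carries considerable sampling uncertainty. A minimal sketch that puts a confidence interval around each result, assuming the Wilson score interval (my choice; the llama-eval log prints its own interval, whose method is not stated, so the numbers will not necessarily match the [78.0%, 98.1%] shown there):

```python
from math import sqrt


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Correct-answer counts from the list above (30 AIME 2026 problems, single runs).
results = {
    "UD Q6_K_XL, no KV quant": 27,
    "UD Q5_K_XL, no KV quant": 26,
    "UD IQ4_XS, no KV quant": 25,
    "UD IQ4_XS, q8_0 K/V cache": 24,
}

for config, correct in results.items():
    lo, hi = wilson_interval(correct, 30)
    print(f"{config:28s} {correct}/30 = {correct / 30:.1%}  95% CI [{lo:.1%}, {hi:.1%}]")
```

Repeating each configuration, as the official evaluation does with 4 passes, would narrow these intervals.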
My conclusion: high-end LLMs cannot simply be scaled down to a smaller memory footprint without a significant loss of accuracy.
Contrary to what Unsloth states, token generation on the M3 Ultra ran at about 40 t/s with Q6_K_XL, and not much slower on the M5 with IQ4_XS.
What would "good" look like in M3?
I assume an M3 variant with about 160 B parameters could deliver better overall performance and accuracy when run with less aggressively quantised weights. That might be the sweet spot where model size and the KV cache size needed for a useful context length are in balance on 128 GB and 256 GB personal workstations.
Yes, the 229 B parameters of M2.7 deliver great accuracy unquantised, but the heavy quantisation needed for smaller systems degrades accuracy to significantly lower levels. A smaller model that needs less quantisation would then perform better than the 229 B model.
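To make the sizing argument concrete, a rough sketch of the memory arithmetic: weight memory is parameters times bits per weight divided by 8, and a standard attention KV cache stores K and V for every layer, KV head, head dimension and token. The bits-per-weight values are the nominal figures of llama.cpp's base quant types; the architecture numbers in the KV example are hypothetical placeholders, not MiniMax's actual configuration.

```python
# Nominal bits per weight of the base llama.cpp quant types. Unsloth "UD ..._XL"
# files keep some tensors at higher precision, so real files are somewhat larger.
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q6_K": 6.5625, "Q5_K": 5.5, "IQ4_XS": 4.25}

# Bytes per cached element for two KV cache types (q8_0: 34-byte blocks of 32).
KV_BYTES = {"f16": 2.0, "q8_0": 34 / 32}


def weight_gb(params: float, bpw: float) -> float:
    """Weight memory in GB (10^9 bytes): parameters x bits per weight / 8."""
    return params * bpw / 8 / 1e9


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int, ctx: int,
                bytes_per_elem: float) -> float:
    """Standard attention KV cache: K and V (x2) per layer, KV head, dim and token."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 1e9


for params in (229e9, 160e9):
    sizes = ", ".join(f"{q} ~{weight_gb(params, b):.0f} GB" for q, b in BITS_PER_WEIGHT.items())
    print(f"{params / 1e9:.0f} B params: {sizes}")

# HYPOTHETICAL architecture (not MiniMax's real config): 60 layers, 8 KV heads,
# head_dim 128, 128k-token context.
for kv_type, nbytes in KV_BYTES.items():
    print(f"KV cache {kv_type}: ~{kv_cache_gb(60, 8, 128, 131072, nbytes):.1f} GB")
```

With these nominal values, 229 B parameters need about 188 GB at Q6_K and about 122 GB at IQ4_XS, while 160 B parameters need about 110 GB at Q5_K, all before KV cache and runtime buffers.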
It would also be great if you would consider permitting the use of the MiniMax M3 open weights in commercial environments, as the license for M2.5 did, at least for purely local deployment. Even at 40 t/s generation speed, a professional will only use a local LLM when there is an urgent need for it, e.g. for privacy reasons. So probably less than 5% of demand would run MiniMax locally, and more than 95% would still use the MiniMax cloud.
References
AIME 2026 on M3 Ultra Q6
https://github.com/ggml-org/llama.cpp/tree/master/examples/llama-eval
Model: unsloth/MiniMax-M2.7-GGUF:Q6_K_XL
Grader: regex
Sampling: temp=skip, top-k=skip, top-p=skip, min-p=skip
1/ 30 aime2026_000_020 AIME2026 Find the sum of all real numbers $r$ such t... 50 50 9652 41.0 235.4 ✓ [ 1/ 1, 1.000] https://localhost:8068
2/ 30 aime2026_000_006 AIME2026 ... 396 396 5075 43.2 117.6 ✓ [ 2/ 2, 1.000] https://localhost:8068
3/ 30 aime2026_000_008 AIME2026 Joanne has a blank fair six-sided die and s... 29 29 12755 40.0 318.5 ✓ [ 3/ 3, 1.000] https://localhost:8068
4/ 30 aime2026_000_004 AIME2026 A plane contains points $A$ and $B$ with $A... 65 65 2406 44.5 54.1 ✓ [ 4/ 4, 1.000] https://localhost:8068
5/ 30 aime2026_000_015 AIME2026 Find the sum of the $10$th terms of all ari... 178 178 3408 44.1 77.2 ✓ [ 5/ 5, 1.000] https://localhost:8068
6/ 30 aime2026_000_028 AIME2026 For integers $a$ and $b,$ let $a \circ b = ... 157 0 30080 34.9 861.2 ✗ [ 5/ 6, 0.833] https://localhost:8068
7/ 30 aime2026_000_013 AIME2026 In an equiangular pentagon, the sum of the ... 681 681 11797 40.5 291.4 ✓ [ 6/ 7, 0.857] https://localhost:8068
8/ 30 aime2026_000_023 AIME2026 Let $S$ denote the value of the infinite su... 669 669 10410 41.0 254.0 ✓ [ 7/ 8, 0.875] https://localhost:8068
9/ 30 aime2026_000_022 AIME2026 Isosceles triangle $\triangle ABC$ has $AB ... 245 245 13429 39.9 336.9 ✓ [ 8/ 9, 0.889] https://localhost:8068
10/ 30 aime2026_000_019 AIME2026 An urn contains $n$ marbles. Each marble is... 190 190 4020 43.8 91.7 ✓ [ 9/ 10, 0.900] https://localhost:8068
11/ 30 aime2026_000_012 AIME2026 For each positive integer $r$ less than $50... 39 39 18178 38.4 473.8 ✓ [ 10/ 11, 0.909] https://localhost:8068
12/ 30 aime2026_000_029 AIME2026 Find the number of ordered 7-tuples $(a_1, ... 393 N/A N/A N/A N/A ✗ [ 10/ 12, 0.833] https://localhost:8068
13/ 30 aime2026_000_009 AIME2026 Let $\triangle ABC$ have side lengths $AB =... 156 156 27103 34.6 783.5 ✓ [ 11/ 13, 0.846] https://localhost:8068
14/ 30 aime2026_000_010 AIME2026 The integers from $1$ to $64$ are placed in... 896 896 54754 29.7 1841.8 ✓ [ 12/ 14, 0.857] https://localhost:8068
15/ 30 aime2026_000_005 AIME2026 A real number $x$ satisfies $\sqrt[20]{x^{\... 441 441 3763 42.3 88.9 ✓ [ 13/ 15, 0.867] https://localhost:8068
16/ 30 aime2026_000_016 AIME2026 The figure below shows a grid of $10$ squar... 243 243 24563 36.3 677.1 ✓ [ 14/ 16, 0.875] https://localhost:8068
17/ 30 aime2026_000_017 AIME2026 Let $ABCDE$ be a nonconvex pentagon with in... 503 503 69384 27.3 2542.3 ✓ [ 15/ 17, 0.882] https://localhost:8068
18/ 30 aime2026_000_026 AIME2026 Consider a tetrahedron with two isosceles t... 223 223 26688 35.5 751.9 ✓ [ 16/ 18, 0.889] https://localhost:8068
19/ 30 aime2026_000_007 AIME2026 Let $N$ be the number of positive integer d... 244 244 4248 42.0 101.1 ✓ [ 17/ 19, 0.895] https://localhost:8068
20/ 30 aime2026_000_011 AIME2026 Triangle $\triangle ABC$ lies in plane $\ma... 161 161 10649 40.0 266.1 ✓ [ 18/ 20, 0.900] https://localhost:8068
21/ 30 aime2026_000_027 AIME2026 Call finite sets of integers $S$ and $T$ co... 107 107 34357 33.6 1021.2 ✓ [ 19/ 21, 0.905] https://localhost:8068
22/ 30 aime2026_000_025 AIME2026 Find the greatest integer $n$ such that the... 132 132 21969 36.9 594.9 ✓ [ 20/ 22, 0.909] https://localhost:8068
23/ 30 aime2026_000_021 AIME2026 A standard fair six-sided die is rolled rep... 754 754 7867 40.9 192.4 ✓ [ 21/ 23, 0.913] https://localhost:8068
24/ 30 aime2026_000_001 AIME2026 Find the number of positive integer palindr... 62 62 5660 41.7 135.8 ✓ [ 22/ 24, 0.917] https://localhost:8068
25/ 30 aime2026_000_018 AIME2026 For each positive integer $n$ let $f(n)$ be... 279 279 5660 41.7 135.7 ✓ [ 23/ 25, 0.920] https://localhost:8068
26/ 30 aime2026_000_002 AIME2026 A hemisphere with radius $200$ sits on top ... 79 79 3793 42.2 89.8 ✓ [ 24/ 26, 0.923] https://localhost:8068
27/ 30 aime2026_000_000 AIME2026 Patrick started walking at a constant rate ... 277 277 2608 42.5 61.4 ✓ [ 25/ 27, 0.926] https://localhost:8068
28/ 30 aime2026_000_003 AIME2026 Find the number of integers less than or eq... 70 70 5607 41.6 134.9 ✓ [ 26/ 28, 0.929] https://localhost:8068
29/ 30 aime2026_000_014 AIME2026 Let $a, b,$ and $n$ be positive integers wi... 83 945 57201 29.2 1957.3 ✗ [ 26/ 29, 0.897] https://localhost:8068
30/ 30 aime2026_000_024 AIME2026 Let $\triangle ABC$ be a triangle with $D$ ... 850 850 6905 41.2 167.5 ✓ [ 27/ 30, 0.900] https://localhost:8068
Session time: 18295.4s | Total accumulated time: 18295.4s
============================================================
Results: 27/30 correct (90.0%) [78.0%, 98.1%]
============================================================
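To check throughput and timing against the raw log, here is a small sketch that parses the per-problem rows. The column meaning is my inference, not documented llama-eval output: expected answer, model answer, generated tokens, tokens/s and seconds, which fits the rows above (tokens divided by t/s approximately gives the seconds column). Row, parse_row and summarize are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Inferred row layout (my reading of the log, not documented llama-eval output):
# "<problem text>... <expected> <answer> <tokens> <tokens/s> <seconds> <✓|✗> ..."
ROW = re.compile(r"\.\.\.\s+(\d+)\s+(\d+|N/A)\s+(\d+|N/A)\s+(\S+)\s+(\S+)\s+([✓✗])")


class Row(NamedTuple):
    expected: int
    answer: Optional[int]     # None when the log shows N/A
    tokens: Optional[int]
    seconds: Optional[float]
    correct: bool


def _num(text: str, cast):
    """Convert a log field, mapping 'N/A' to None."""
    return None if text == "N/A" else cast(text)


def parse_row(line: str) -> Optional[Row]:
    """Parse one per-problem row; return None for header or summary lines."""
    m = ROW.search(line)
    if m is None:
        return None
    expected, answer, tokens, _tps, seconds, mark = m.groups()
    return Row(int(expected), _num(answer, int), _num(tokens, int),
               _num(seconds, float), mark == "✓")


def summarize(lines: list[str]) -> dict:
    """Accuracy plus aggregate throughput (total tokens / total seconds)."""
    rows = [r for r in map(parse_row, lines) if r is not None]
    timed = [r for r in rows if r.tokens is not None and r.seconds is not None]
    tokens = sum(r.tokens for r in timed)
    seconds = sum(r.seconds for r in timed)
    return {
        "correct": sum(r.correct for r in rows),
        "total": len(rows),
        "tokens": tokens,
        "hours": seconds / 3600,
        "avg_tps": tokens / seconds if seconds else 0.0,
    }
```

Fed with all 30 rows above, summarize() gives the aggregate tokens per second to compare with the roughly 40 t/s mentioned earlier. Per problem, throughput drops from 44.5 t/s on a 2,406-token answer to 27.3 t/s on the longest one (69,384 tokens), so long reasoning traces pull the average down.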