[M3 Request] A model with a performance sweet spot on 128/256 GB systems

### Capability area

Reasoning / math

### What does M2.7 fail to do for you?

Over the last few days I have evaluated how much the rate of correct answers drops in quantised versions of MiniMax M2.7, on an M5 MacBook Pro with 128 GB and on an M3 Ultra Mac Studio with 256 GB. I run local LLMs with llama-server from the llama.cpp family of LLM runtimes, usually the most current version from the GitHub repo, compiled locally at the highest optimisation level. llama.cpp now ships a Python script for evaluating an LLM, in particular its failure rate on certain tests. I went for AIME 2026 with different configurations, using the Unsloth quantisations, see: https://unsloth.ai/docs/models/tutorials/minimax-m27

In the past the narrative was that compressing models down to 4 bits and compressing the KV cache would do no relevant harm to precision. Nowadays, and my test runs support this, there is some evidence that quantisation does cost precision. A plain run of AIME 2026 takes about 10 to 20 hours, depending on system and model configuration.

AIME 2026 consists of just 30 problems. The official test runs every problem 4 times; for results see: https://matharena.ai/?comp=aime--aime_2026&view=problem. I did only single, deterministic-like runs.

The best result was UD Q6_K_XL with 27 of 30 answers right, i.e. a 10% error rate. Good but not perfect. For the non-quantised model an accuracy of about 93 to 95% is reported (AIME 2026 is not part of the official benchmark statements), so its genuine error rate is significantly lower. My focus, however, is not on single numbers but on the relative performance of the quantised models. I got:
- 3 of 30 wrong with UD Q6_K_XL, no KV cache quantisation
- 4 of 30 wrong with UD Q5_K_XL, no KV cache quantisation
- 5 of 30 wrong with UD IQ4_XS, no KV cache quantisation
- 6 of 30 wrong with UD IQ4_XS, q8_0 K and V cache
- and much worse results with significantly lower cache quants, such as V at q5_1.
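
With only 30 problems and a single run per configuration, each count carries sizeable sampling uncertainty. The following is a minimal sketch, not part of the llama.cpp evaluation script, that puts the four counts above side by side with a 95% Wilson score interval. The choice of the Wilson interval and the names `wilson_interval`, `RESULTS` and `N_PROBLEMS` are assumptions made for illustration; the interval printed by the evaluation script may be computed differently.

```python
import math

# Wrong answers out of 30 per configuration, taken from the list above.
RESULTS = {
    "UD Q6_K_XL, no KV cache quantisation": 3,
    "UD Q5_K_XL, no KV cache quantisation": 4,
    "UD IQ4_XS, no KV cache quantisation": 5,
    "UD IQ4_XS, q8_0 K and V cache": 6,
}
N_PROBLEMS = 30


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


for name, wrong in RESULTS.items():
    correct = N_PROBLEMS - wrong
    lo, hi = wilson_interval(correct, N_PROBLEMS)
    print(f"{name}: {correct}/{N_PROBLEMS} = {correct / N_PROBLEMS:.1%}  [{lo:.1%}, {hi:.1%}]")
```

With a single run per configuration the intervals are wide, so several runs per problem, as in the official test, would tighten the comparison between neighbouring quantisation levels.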

**My conclusion** - High-end LLMs cannot simply be scaled down to a lower memory footprint without a significant loss of accuracy.  

Contrary to what Unsloth states, token generation on the M3 Ultra ran at about 40 t/s with Q6, and not much lower than that on the M5 with IQ4.  

### What would "good" look like in M3?

I assume an M3 variant with about 160 B parameters might have better overall performance and accuracy when run with less heavily quantised weights. That might be the sweet spot where model size, KV cache size for a useful context length, etc. are in harmony for 128 GB and also 256 GB personal workstations.  
Yes, M2.7 with its 229 B parameters delivers great accuracy unquantised, but the massive quantisation needed for smaller systems degrades accuracy to significantly lower levels. Smaller models that need less quantisation would then perform better than the 229 B model.  
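
As a rough back-of-the-envelope sketch of this sizing argument: the memory needed for the weights is approximately the parameter count times the average bits per weight, divided by 8. The sketch reads the proposed size as 160 B parameters. The function name `weight_gb` and the bit widths below are illustrative assumptions, not measured file sizes; the Unsloth UD quantisations mix bit widths per tensor, and KV cache and runtime overhead come on top.

```python
def weight_gb(n_params, bits_per_weight):
    """Approximate weight memory in GB (10^9 bytes) for an average bit width."""
    return n_params * bits_per_weight / 8 / 1e9


# Parameter counts from the text; the 160 B variant is the hypothetical one proposed here.
MODELS = {"M2.7 (229 B)": 229e9, "hypothetical 160 B variant": 160e9}

# Illustrative average bit widths only (assumption), not actual GGUF figures.
BIT_WIDTHS = [4.0, 5.0, 6.0, 8.0]

for name, n_params in MODELS.items():
    for bpw in BIT_WIDTHS:
        print(f"{name} at {bpw:.1f} bits/weight: ~{weight_gb(n_params, bpw):.0f} GB")
```

Comparing the printed estimates against 128 GB and 256 GB, minus whatever is reserved for the KV cache and the operating system, gives a first impression of which combinations leave room for a useful context size.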

It would also be great if you would consider permitting the use of the MiniMax M3 open weights in commercial environments, as the license for M2.5 did, at least for purely local deployment. Even at 40 t/s generation speed, a professional will only use a local LLM when there is an urgent need for it, e.g. for privacy reasons. So probably less than 5% of demand would run MiniMax locally and more than 95% would still use MiniMax cloud.  

### References

AIME 2026 on M3 Ultra Q6 
https://github.com/ggml-org/llama.cpp/tree/master/examples/llama-eval
```
Model: unsloth/MiniMax-M2.7-GGUF:Q6_K_XL
Grader: regex
Sampling: temp=skip, top-k=skip, top-p=skip, min-p=skip

  1/ 30  aime2026_000_020     AIME2026   Find the sum of all real numbers $r$ such t...    50         50         9652   41.0   235.4    ✓  [  1/  1, 1.000]  https://localhost:8068
  2/ 30  aime2026_000_006     AIME2026                                              ...    396        396        5075   43.2   117.6    ✓  [  2/  2, 1.000]  https://localhost:8068
  3/ 30  aime2026_000_008     AIME2026   Joanne has a blank fair six-sided die and s...    29         29         12755  40.0   318.5    ✓  [  3/  3, 1.000]  https://localhost:8068
  4/ 30  aime2026_000_004     AIME2026   A plane contains points $A$ and $B$ with $A...    65         65         2406   44.5   54.1     ✓  [  4/  4, 1.000]  https://localhost:8068
  5/ 30  aime2026_000_015     AIME2026   Find the sum of the $10$th terms of all ari...    178        178        3408   44.1   77.2     ✓  [  5/  5, 1.000]  https://localhost:8068
  6/ 30  aime2026_000_028     AIME2026   For integers $a$ and $b,$ let $a \circ b = ...    157        0          30080  34.9   861.2    ✗  [  5/  6, 0.833]  https://localhost:8068
  7/ 30  aime2026_000_013     AIME2026   In an equiangular pentagon, the sum of the ...    681        681        11797  40.5   291.4    ✓  [  6/  7, 0.857]  https://localhost:8068
  8/ 30  aime2026_000_023     AIME2026   Let $S$ denote the value of the infinite su...    669        669        10410  41.0   254.0    ✓  [  7/  8, 0.875]  https://localhost:8068
  9/ 30  aime2026_000_022     AIME2026   Isosceles triangle $\triangle ABC$ has $AB ...    245        245        13429  39.9   336.9    ✓  [  8/  9, 0.889]  https://localhost:8068
 10/ 30  aime2026_000_019     AIME2026   An urn contains $n$ marbles. Each marble is...    190        190        4020   43.8   91.7     ✓  [  9/ 10, 0.900]  https://localhost:8068
 11/ 30  aime2026_000_012     AIME2026   For each positive integer $r$ less than $50...    39         39         18178  38.4   473.8    ✓  [ 10/ 11, 0.909]  https://localhost:8068
 12/ 30  aime2026_000_029     AIME2026   Find the number of ordered 7-tuples $(a_1, ...    393        N/A        N/A    N/A    N/A      ✗  [ 10/ 12, 0.833]  https://localhost:8068
 13/ 30  aime2026_000_009     AIME2026   Let $\triangle ABC$ have side lengths $AB =...    156        156        27103  34.6   783.5    ✓  [ 11/ 13, 0.846]  https://localhost:8068
 14/ 30  aime2026_000_010     AIME2026   The integers from $1$ to $64$ are placed in...    896        896        54754  29.7   1841.8   ✓  [ 12/ 14, 0.857]  https://localhost:8068
 15/ 30  aime2026_000_005     AIME2026   A real number $x$ satisfies $\sqrt[20]{x^{\...    441        441        3763   42.3   88.9     ✓  [ 13/ 15, 0.867]  https://localhost:8068
 16/ 30  aime2026_000_016     AIME2026   The figure below shows a grid of $10$ squar...    243        243        24563  36.3   677.1    ✓  [ 14/ 16, 0.875]  https://localhost:8068
 17/ 30  aime2026_000_017     AIME2026   Let $ABCDE$ be a nonconvex pentagon with in...    503        503        69384  27.3   2542.3   ✓  [ 15/ 17, 0.882]  https://localhost:8068
 18/ 30  aime2026_000_026     AIME2026   Consider a tetrahedron with two isosceles t...    223        223        26688  35.5   751.9    ✓  [ 16/ 18, 0.889]  https://localhost:8068
 19/ 30  aime2026_000_007     AIME2026   Let $N$ be the number of positive integer d...    244        244        4248   42.0   101.1    ✓  [ 17/ 19, 0.895]  https://localhost:8068
 20/ 30  aime2026_000_011     AIME2026   Triangle $\triangle ABC$ lies in plane $\ma...    161        161        10649  40.0   266.1    ✓  [ 18/ 20, 0.900]  https://localhost:8068
 21/ 30  aime2026_000_027     AIME2026   Call finite sets of integers $S$ and $T$ co...    107        107        34357  33.6   1021.2   ✓  [ 19/ 21, 0.905]  https://localhost:8068
 22/ 30  aime2026_000_025     AIME2026   Find the greatest integer $n$ such that the...    132        132        21969  36.9   594.9    ✓  [ 20/ 22, 0.909]  https://localhost:8068
 23/ 30  aime2026_000_021     AIME2026   A standard fair six-sided die is rolled rep...    754        754        7867   40.9   192.4    ✓  [ 21/ 23, 0.913]  https://localhost:8068
 24/ 30  aime2026_000_001     AIME2026   Find the number of positive integer palindr...    62         62         5660   41.7   135.8    ✓  [ 22/ 24, 0.917]  https://localhost:8068
 25/ 30  aime2026_000_018     AIME2026   For each positive integer $n$ let $f(n)$ be...    279        279        5660   41.7   135.7    ✓  [ 23/ 25, 0.920]  https://localhost:8068
 26/ 30  aime2026_000_002     AIME2026   A hemisphere with radius $200$ sits on top ...    79         79         3793   42.2   89.8     ✓  [ 24/ 26, 0.923]  https://localhost:8068
 27/ 30  aime2026_000_000     AIME2026   Patrick started walking at a constant rate ...    277        277        2608   42.5   61.4     ✓  [ 25/ 27, 0.926]  https://localhost:8068
 28/ 30  aime2026_000_003     AIME2026   Find the number of integers less than or eq...    70         70         5607   41.6   134.9    ✓  [ 26/ 28, 0.929]  https://localhost:8068
 29/ 30  aime2026_000_014     AIME2026   Let $a, b,$ and $n$ be positive integers wi...    83         945        57201  29.2   1957.3   ✗  [ 26/ 29, 0.897]  https://localhost:8068
 30/ 30  aime2026_000_024     AIME2026   Let $\triangle ABC$ be a triangle with $D$ ...    850        850        6905   41.2   167.5    ✓  [ 27/ 30, 0.900]  https://localhost:8068

Session time: 18295.4s | Total accumulated time: 18295.4s

============================================================
Results: 27/30 correct (90.0%) [78.0%, 98.1%]
============================================================
```
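
To tally such a log without reading it row by row, a small parser can help. This is a minimal sketch, not part of llama-eval: the column meanings (expected answer, model answer, generated tokens, tokens per second, seconds) are inferred from the output above and are an assumption, as are the names `parse_row` and `summarise`. The three sample rows are taken from the log.

```python
SAMPLE = """\
  1/ 30  aime2026_000_020     AIME2026   Find the sum of all real numbers $r$ such t...    50         50         9652   41.0   235.4    ✓  [  1/  1, 1.000]  https://localhost:8068
 12/ 30  aime2026_000_029     AIME2026   Find the number of ordered 7-tuples $(a_1, ...    393        N/A        N/A    N/A    N/A      ✗  [ 10/ 12, 0.833]  https://localhost:8068
 29/ 30  aime2026_000_014     AIME2026   Let $a, b,$ and $n$ be positive integers wi...    83         945        57201  29.2   1957.3   ✗  [ 26/ 29, 0.897]  https://localhost:8068
"""


def parse_row(line):
    """Parse one result row; return None for lines that are not result rows."""
    mark = next((m for m in ("✓", "✗") if m in line), None)
    if mark is None:
        return None
    # The five fields before the mark are assumed to be:
    # expected answer, model answer, tokens, tokens/s, seconds.
    fields = line.split(mark)[0].rsplit(None, 5)
    if len(fields) != 6:
        return None
    _, expected, answer, tokens, tps, seconds = fields

    def as_num(value):
        return None if value == "N/A" else float(value)

    return {
        "expected": expected,
        "answer": answer,
        "tokens": as_num(tokens),
        "tokens_per_second": as_num(tps),
        "seconds": as_num(seconds),
        "correct": mark == "✓",
    }


def summarise(text):
    """Count correct rows and aggregate tokens and time, skipping N/A values."""
    rows = [r for r in map(parse_row, text.splitlines()) if r]
    tokens = sum(r["tokens"] for r in rows if r["tokens"] is not None)
    seconds = sum(r["seconds"] for r in rows if r["seconds"] is not None)
    return {
        "total": len(rows),
        "correct": sum(r["correct"] for r in rows),
        "tokens": tokens,
        "seconds": seconds,
        "tokens_per_second": tokens / seconds if seconds else None,
    }


print(summarise(SAMPLE))
```

Feeding the full log to `summarise` instead of `SAMPLE` gives the overall count and an aggregate generation speed; rows without timing data (shown as N/A) are counted as wrong but left out of the token and time totals.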
