Free(): Learning to Forget in Malloc-Only Reasoning Models

Implementation of the paper "Free(): Learning to Forget in Malloc-Only Reasoning Models".

Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as "malloc-only" engines, continuously accumulating valid and redundant steps alike without a mechanism to prune obsolete information. To break this cycle, we propose Free()LM, a model that introduces an intrinsic self-forgetting capability via the Free-Module, a plug-and-play LoRA adapter. By iteratively switching between reasoning and cleaning modes, Free()LM dynamically identifies and prunes useless context chunks, maintaining a compact and noise-free state.

Extensive experiments show that Free()LM provides consistent improvements across all model scales (8B to 685B). It achieves a 3.3% average improvement over top-tier reasoning baselines and establishes a new SOTA on IMOanswerBench using DeepSeek V3.2-Speciale. Most notably, in long-horizon tasks where the standard Qwen3-235B-A22B model suffers a total collapse (0% accuracy), Free()LM restores accuracy to ~50%. Our findings suggest that sustainable intelligence requires the freedom to forget as much as the power to think.

This repository contains the official implementation, data, and model checkpoints for the paper "Free(): Learning to Forget in Malloc-Only Reasoning Models".

🏰 Model Zoo

We provide LoRA checkpoints for various base models. You can download them directly from Hugging Face.

Base Model	Method	Checkpoint
Qwen3-8B	Free()LM	🤗 ldsjmdy/Qwen3-8B-FreeLM-LoRA
Qwen3-30B-A3B-Thinking-2507	Free()LM	🤗 ldsjmdy/Qwen3-30B-A3B-Thinking-2507-FreeLM-LoRA
Qwen3-235B-A22B-Thinking-2507	Free()LM	🤗 ldsjmdy/Qwen3-235B-A3B-Thinking-2507-FreeLM-LoRA

📚 Datasets

Training Data

The training data used in our experiments can be downloaded here: 🤗 ldsjmdy/FreeLM

The data follows the JSON format below:

{
    "prompt": "Instruction here...", 
    "completion": "Desired model response..."
}
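
Before training, it can help to check that each record matches this shape. The following is a minimal sketch, assuming each record arrives as one JSON object string; the helper name `parse_training_record` is hypothetical and not part of this repository.

```python
import json

# Field names taken from the training data format shown above.
REQUIRED_KEYS = ("prompt", "completion")


def parse_training_record(line: str) -> dict:
    """Parse one JSON object and check that both fields are non-empty strings."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("record must be a JSON object")
    for key in REQUIRED_KEYS:
        value = record.get(key)
        if not isinstance(value, str) or not value:
            raise ValueError(f"field {key!r} must be a non-empty string")
    return record


example = '{"prompt": "Instruction here...", "completion": "Desired model response..."}'
record = parse_training_record(example)
print(sorted(record))
```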

Evaluation Data

We utilize datasets such as AIME 24/25 for evaluation. The processed evaluation sets are available here: 🤗 ldsjmdy/FreeLM.

The evaluation data format is as follows:

{
    "prompt": "Question text...",          
    "answer": "Ground truth answer...",          
    "id": 101,
    "source": "aime24"
}
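
The `source` field makes it possible to report results per benchmark. The sketch below groups records by `source` and scores predictions by plain string equality against `answer`. This is a deliberate simplification: real math grading usually needs equivalence checking, and the function name, the `predictions` mapping, and the sample values are all hypothetical placeholders.

```python
from collections import defaultdict


def accuracy_by_source(records, predictions):
    """Return {source: fraction of records whose prediction equals the answer}.

    `predictions` maps a record id to the model's final answer string.
    Missing predictions count as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        source = record["source"]
        total[source] += 1
        predicted = predictions.get(record["id"], "")
        if predicted.strip() == record["answer"].strip():
            correct[source] += 1
    return {source: correct[source] / total[source] for source in total}


# Placeholder records in the evaluation format; answers are illustrative only.
records = [
    {"prompt": "Question text...", "answer": "204", "id": 101, "source": "aime24"},
    {"prompt": "Question text...", "answer": "73", "id": 102, "source": "aime24"},
    {"prompt": "Question text...", "answer": "70", "id": 201, "source": "aime25"},
]
predictions = {101: "204", 102: "72", 201: " 70 "}
scores = accuracy_by_source(records, predictions)
print(scores)
```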

🛠️ Installation

Clone the repository and install the required dependencies.

To ensure environment stability and avoid conflicts, we have separated the dependencies for Inference/Evaluation, SGLang Deployment, and Training. We recommend using uv for package management.

Dependency Files:

Inference & Eval: requirements.eval.txt
SGLang Deployment: requirements.sglang.txt
Training: requirements.train.txt

git clone https://github.com/TemporaryLoRA/FreeLM.git
cd FreeLM

# Install dependencies based on your needs:

# For Inference and Evaluation
uv pip install -r requirements.eval.txt

# For SGLang Deployment
# uv pip install -r requirements.sglang.txt

# For Training
# uv pip install -r requirements.train.txt

Megatron

We used the megatron-core and megatron-bridge versions bundled with the NeMo Framework container nvcr.io/nvidia/nemo:25.11.01. The corresponding code is available in the megatron directory of this repository.

⚙️ Fine-Tuning

Our training pipeline is built upon Megatron-Bridge. We provide a comprehensive training script for large-scale models (e.g., Qwen3-235B).

Training Steps

1. Convert Checkpoints

First, convert the Hugging Face checkpoint to the Megatron format:

python3 scripts/convert_hf_to_mbridge.py \
    --input Qwen/Qwen3-235B-A22B-Thinking-2507 \
    --output <path_to_save_megatron_checkpoint>

2. Run Training

We recommend using 8 nodes (64 GPUs) for training large-scale models.

Before running scripts/train_qwen3_235b.sh, please define the following variables within the script or export them as environment variables:

model_path=""
nemo_path="${model_path}-nemo"

train_fp=""
output_dir=""
run_name=""

# Adjust based on your dataset size and saving strategy
train_iters=-1
save_interval=-1

Then execute the training script:

bash scripts/train_qwen3_235b.sh
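
The `train_iters` and `save_interval` values depend on dataset size. As a rough guide, the sketch below assumes that one training iteration consumes one global batch of samples (no sequence packing). The function names, the global batch size, and the sample counts are hypothetical and should be replaced with the values from your own setup.

```python
import math


def iters_for_epochs(num_samples: int, global_batch_size: int, epochs: int = 1) -> int:
    """Number of iterations needed to see every sample `epochs` times."""
    if num_samples <= 0 or global_batch_size <= 0 or epochs <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(num_samples * epochs / global_batch_size)


def save_interval_for(train_iters: int, num_checkpoints: int) -> int:
    """Spacing between saves so that roughly `num_checkpoints` are written."""
    if train_iters <= 0 or num_checkpoints <= 0:
        raise ValueError("all arguments must be positive")
    return max(1, train_iters // num_checkpoints)


# Hypothetical numbers: 10,000 samples, global batch size 64, 2 epochs.
train_iters = iters_for_epochs(num_samples=10_000, global_batch_size=64, epochs=2)
save_interval = save_interval_for(train_iters, num_checkpoints=4)
print(train_iters, save_interval)
```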

🚀 Inference

We support efficient inference using SGLang.

1. Launch the Model Service

Deploy the model with LoRA adapters enabled:

sglang serve --model-path Qwen/Qwen3-8B \
    --host 0.0.0.0 \
    --port 30000 \
    --tensor-parallel-size 1 \
    --context-length 32768 \
    --enable-lora \
    --lora-path lora=ldsjmdy/Qwen3-8B-FreeLM-LoRA
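
Once the service is running, a single request is enough to confirm that the adapter is reachable. The sketch below assumes SGLang's native `/generate` endpoint and that the adapter is selected by the name on the left of `lora=` in the launch command. The function names and sampling values are hypothetical, and `runner.py` may build its requests differently.

```python
import json
import urllib.request


def build_generate_payload(prompt, adapter_name="lora", max_new_tokens=256, temperature=0.6):
    """Build a request body for the /generate endpoint with a named LoRA adapter."""
    return {
        "text": prompt,
        "sampling_params": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
        # Must match the adapter name registered at launch (here: "lora").
        "lora_path": adapter_name,
    }


def send_generate(host, port, payload, timeout=600.0):
    """POST the payload to a running service and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"http://{host}:{port}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_generate_payload("What is 1 + 1?")
print(json.dumps(payload, indent=2))
# Uncomment once the service from the previous step is up:
# print(send_generate("127.0.0.1", 30000, payload))
```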

2. Run Inference Client

Use runner.py to send requests to the deployed service.

Note: Please configure the service URL in runner.py before running. If you have deployed multiple services for parallel processing, add them to the list:

# runner.py
...
if __name__ == '__main__':
    # ...
    service_ips = [
        ("127.0.0.1", "30000"),
        ("127.0.0.1", "30001") # Add more workers if available
    ]

Run the inference script:

python3 runner.py --help # View available arguments

The runner.py script supports concurrent calls to multiple service endpoints to maximize inference throughput.
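
To illustrate the idea of fanning requests out over several endpoints, here is a minimal sketch that assigns prompts round-robin and sends them from a thread pool. It is not the logic in `runner.py`: the `dispatch` function and the `fake_send` stand-in are hypothetical, and a real `send_fn` would issue an HTTP request to the service.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def dispatch(prompts, endpoints, send_fn, max_workers=8):
    """Send each prompt to an endpoint chosen round-robin; keep input order."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    assignments = list(zip(prompts, cycle(endpoints)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(send_fn, endpoint, prompt) for prompt, endpoint in assignments]
        # Collect in submission order so results line up with prompts.
        return [future.result() for future in futures]


def fake_send(endpoint, prompt):
    """Stand-in for a real request; reports which endpoint got which prompt."""
    host, port = endpoint
    return f"{host}:{port} <- {prompt}"


service_ips = [("127.0.0.1", "30000"), ("127.0.0.1", "30001")]
results = dispatch(["q1", "q2", "q3"], service_ips, fake_send)
print(results)
```

Round-robin assignment is the simplest policy; when response lengths vary widely, a shared work queue that each endpoint pulls from tends to balance load better.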

📊 Evaluation

We employ openmathinst for mathematical reasoning evaluation.

Standard Evaluation: Please refer to eval_passk.py for Pass@K calculation.
LLM-as-a-Judge: For DeepSeek models or open-ended generation, we provide an LLM judge script located at llm_judge.py.
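
For reference, Pass@K is commonly computed with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of samples per problem and c the number of correct ones. The sketch below implements that estimator; whether `eval_passk.py` uses exactly this form is an assumption, and the function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k wrong samples: any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n: int, k: int) -> float:
    """Average pass@k over problems, given the correct count for each."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


print(pass_at_k(n=4, c=2, k=1))
print(mean_pass_at_k([0, 2, 4], n=4, k=1))
```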

🖊️ Citation

If you find this repository or our paper useful for your research, please cite:

@misc{zheng2026freelearningforgetmalloconly,
      title={Free(): Learning to Forget in Malloc-Only Reasoning Models}, 
      author={Yilun Zheng and Dongyang Ma and Tian Liang and Jiahao Xu and Xinting Huang and Lijie Chen and Haitao Mi and Yan Wang},
      year={2026},
      eprint={2602.08030},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.08030}, 
}
