
TemporaryLoRA/FreeLM


Free(): Learning to Forget in Malloc-Only Reasoning Models


Implementation of the paper "Free(): Learning to Forget in Malloc-Only Reasoning Models".

Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as "malloc-only" engines, continuously accumulating valid and redundant steps alike without a mechanism to prune obsolete information. To break this cycle, we propose Free()LM, a model that introduces an intrinsic self-forgetting capability via the Free-Module, a plug-and-play LoRA adapter. By iteratively switching between reasoning and cleaning modes, Free()LM dynamically identifies and prunes useless context chunks, maintaining a compact and noise-free state.
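To make the reasoning/cleaning alternation concrete, here is a toy Python illustration of the malloc-versus-free idea. It is not the Free-Module or its interface: context chunks are plain strings, and the `is_obsolete` predicate stands in for the learned LoRA adapter that decides what to prune. `Context`, `reason_with_forgetting`, and `clean_every` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Context:
    """Toy reasoning context: an ordered list of text chunks (hypothetical, for illustration)."""
    chunks: List[str] = field(default_factory=list)

    def malloc(self, chunk: str) -> None:
        # Reasoning mode: append a new step. A "malloc-only" model only ever does this.
        self.chunks.append(chunk)

    def free(self, is_obsolete: Callable[[str], bool]) -> int:
        # Cleaning mode: drop every chunk the judge flags as obsolete; return how many were freed.
        before = len(self.chunks)
        self.chunks = [c for c in self.chunks if not is_obsolete(c)]
        return before - len(self.chunks)


def reason_with_forgetting(steps: Iterable[str], is_obsolete: Callable[[str], bool],
                           clean_every: int = 2) -> List[str]:
    """Alternate modes: after every `clean_every` appended chunks, run one cleaning pass."""
    ctx = Context()
    for i, step in enumerate(steps, start=1):
        ctx.malloc(step)
        if i % clean_every == 0:
            ctx.free(is_obsolete)
    return ctx.chunks


# Example: the predicate (a stand-in for the learned judge) marks the abandoned x=2 branch as obsolete.
steps = ["try x=2", "dead end: x=2 fails", "try x=3", "x=3 works"]
print(reason_with_forgetting(steps, lambda c: "x=2" in c))  # ['try x=3', 'x=3 works']
```

A malloc-only loop would return all four chunks; the cleaning pass keeps only the branch that is still relevant.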

Extensive experiments show that Free()LM provides consistent improvements across all model scales (8B to 685B). It achieves a 3.3% average improvement over top-tier reasoning baselines, even establishing a new SOTA on IMOanswerBench using DeepSeek V3.2-Speciale. Most notably, on long-horizon tasks where the standard Qwen3-235B-A22B model suffers a total collapse (0% accuracy), Free()LM restores performance to ~50%. Our findings suggest that sustainable intelligence requires the freedom to forget as much as the power to think.


This repository contains the official implementation, data, and model checkpoints for the paper "Free(): Learning to Forget in Malloc-Only Reasoning Models".


🏰 Model Zoo

We provide LoRA checkpoints for various base models. You can download them directly from Hugging Face.

| Base Model | Method | Checkpoint |
| --- | --- | --- |
| Qwen3-8B | Free()LM | 🤗 ldsjmdy/Qwen3-8B-FreeLM-LoRA |
| Qwen3-30B-A3B-Thinking-2507 | Free()LM | 🤗 ldsjmdy/Qwen3-30B-A3B-Thinking-2507-FreeLM-LoRA |
| Qwen3-235B-A22B-Thinking-2507 | Free()LM | 🤗 ldsjmdy/Qwen3-235B-A3B-Thinking-2507-FreeLM-LoRA |

📚 Datasets

Training Data

The training data used in our experiments can be downloaded here: 🤗 ldsjmdy/FreeLM

The data follows the JSON format below:

{
    "prompt": "Instruction here...", 
    "completion": "Desired model response..."
}
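A minimal loader and schema check for this format is sketched below. The README does not say whether the file is a single JSON array or JSON Lines, so `parse_records` accepts both; `parse_records`, `validate_training_record`, and `REQUIRED_TRAIN_KEYS` are hypothetical names, not part of this repository.

```python
import json
from typing import Dict, List

# Field names and types taken from the JSON example above.
REQUIRED_TRAIN_KEYS = {"prompt": str, "completion": str}


def parse_records(text: str) -> List[Dict]:
    """Parse either a JSON array of objects or JSON Lines (one object per line)."""
    stripped = text.strip()
    if stripped.startswith("["):
        return json.loads(stripped)
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


def validate_training_record(record: Dict) -> None:
    """Raise ValueError if `prompt` or `completion` is missing or not a string."""
    for key, expected_type in REQUIRED_TRAIN_KEYS.items():
        if not isinstance(record.get(key), expected_type):
            raise ValueError(f"field {key!r} missing or not a {expected_type.__name__}: {record!r}")


# Usage (the path is a placeholder):
# with open("<path_to_train_file>", encoding="utf-8") as f:
#     records = parse_records(f.read())
# for r in records:
#     validate_training_record(r)
```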

Evaluation Data

We utilize datasets such as AIME 24/25 for evaluation. The processed evaluation sets are available here: 🤗 ldsjmdy/FreeLM.

The evaluation data format is as follows:

{
    "prompt": "Question text...",          
    "answer": "Ground truth answer...",          
    "id": 101,
    "source": "aime24"
}
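As an illustration of how the `id`, `answer`, and `source` fields fit together, the sketch below computes per-source exact-match accuracy from a hypothetical `predictions` mapping of `id` to predicted answer. The naive string `normalize` is an assumption and is much weaker than the math-aware grading described under Evaluation; treat it only as a sanity check for integer-answer sets such as AIME.

```python
from collections import defaultdict
from typing import Dict, List


def normalize(ans: str) -> str:
    # Naive normalization (assumption): trim whitespace and a trailing period only.
    return str(ans).strip().rstrip(".").strip()


def accuracy_by_source(records: List[Dict], predictions: Dict[int, str]) -> Dict[str, float]:
    """Exact-match accuracy per `source`, matching predictions by record `id`.

    Records with no prediction count as wrong.
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for rec in records:
        src = rec["source"]
        totals[src] += 1
        pred = predictions.get(rec["id"])
        if pred is not None and normalize(pred) == normalize(rec["answer"]):
            hits[src] += 1
    return {src: hits[src] / totals[src] for src in totals}
```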

🛠️ Installation

Clone the repository and install the required dependencies.

To ensure environment stability and avoid conflicts, we have separated the dependencies for Inference/Evaluation, SGLang Deployment, and Training. We recommend using uv for package management.

Dependency Files:

  • Inference & Eval: requirements.eval.txt
  • SGLang Deployment: requirements.sglang.txt
  • Training: requirements.train.txt
git clone https://github.com/TemporaryLoRA/FreeLM.git
cd FreeLM

# Install dependencies based on your needs:

# For Inference and Evaluation
uv pip install -r requirements.eval.txt

# For SGLang Deployment
# uv pip install -r requirements.sglang.txt

# For Training
# uv pip install -r requirements.train.txt

Megatron

We used the megatron-core and megatron-bridge versions bundled with the NeMo Framework container nvcr.io/nvidia/nemo:25.11.01. The corresponding code is available in the megatron directory of this repository.

⚙️ Fine-Tuning

Our training pipeline is built upon Megatron-Bridge. We provide a comprehensive training script for large-scale models (e.g., Qwen3-235B).

Training Steps

1. Convert Checkpoints

First, convert the Hugging Face checkpoint to the Megatron format:

python scripts/convert_hf_to_mbridge.py \
    --input Qwen/Qwen3-235B-A22B-Thinking-2507 \
    --output <path_to_save_megatron_checkpoint>

2. Run Training

We recommend using 8 nodes (64 GPUs) for training large-scale models.

Before running scripts/train_qwen3_235b.sh, please define the following variables within the script or export them as environment variables:

model_path=""
nemo_path="${model_path}-nemo"

train_fp=""
output_dir=""
run_name=""

# Adjust based on your dataset size and saving strategy
train_iters=-1
save_interval=-1

Then execute the training script:

bash scripts/train_qwen3_235b.sh
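The variable block says to adjust `train_iters` and `save_interval` to your dataset size and saving strategy. Assuming iterations are counted in global batches (the usual Megatron convention; the global batch size used by `scripts/train_qwen3_235b.sh` is not stated here), the arithmetic can be sketched as follows. `plan_training`, `global_batch_size`, `epochs`, and `num_checkpoints` are hypothetical, and what `-1` means inside the script is not covered.

```python
import math
from typing import Tuple


def plan_training(num_samples: int, global_batch_size: int,
                  epochs: int = 1, num_checkpoints: int = 1) -> Tuple[int, int]:
    """Return (train_iters, save_interval) for a hypothetical sample-count-based plan.

    Each iteration consumes `global_batch_size` samples, so covering the data `epochs`
    times takes ceil(num_samples * epochs / global_batch_size) iterations.
    """
    if min(num_samples, global_batch_size, epochs, num_checkpoints) <= 0:
        raise ValueError("all arguments must be positive")
    train_iters = math.ceil(num_samples * epochs / global_batch_size)
    # Floor division keeps at least `num_checkpoints` saves inside the run
    # (when train_iters >= num_checkpoints), assuming a save every `save_interval` iterations.
    save_interval = max(1, train_iters // num_checkpoints)
    return train_iters, save_interval


# e.g. 10,000 samples, global batch 128, 2 epochs, 4 checkpoints
print(plan_training(10_000, 128, epochs=2, num_checkpoints=4))  # (157, 39)
```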

🚀 Inference

We support efficient inference using SGLang.

1. Launch the Model Service

Deploy the model with LoRA adapters enabled:

sglang serve --model-path Qwen/Qwen3-8B \
    --host 0.0.0.0 \
    --port 30000 \
    --tensor-parallel-size 1 \
    --context-length 32768 \
    --enable-lora \
    --lora-path lora=ldsjmdy/Qwen3-8B-FreeLM-LoRA 
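For a quick smoke test of the service without `runner.py`, the sketch below sends one request to SGLang's native `/generate` endpoint and selects the adapter through the `lora_path` request field, using the name registered on the command line (`lora` in the command above). It assumes the default host and port from that command; `build_generate_payload`, `generate`, and the sampling values are hypothetical and purely illustrative. The raw `text` field does not apply the chat template, and this exercises only the adapter endpoint, not the reasoning/cleaning orchestration the paper describes.

```python
import json
import urllib.request


def build_generate_payload(prompt: str, lora_name: str = "lora",
                           max_new_tokens: int = 1024, temperature: float = 0.6) -> dict:
    # `lora_name` must match the NAME in `--lora-path NAME=PATH` used at launch.
    return {
        "text": prompt,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": temperature},
        "lora_path": lora_name,
    }


def generate(prompt: str, host: str = "127.0.0.1", port: str = "30000",
             timeout: float = 600.0, **kwargs) -> str:
    """POST one request to SGLang's native /generate endpoint and return the generated text."""
    body = json.dumps(build_generate_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        f"http://{host}:{port}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["text"]
```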

2. Run Inference Client

Use runner.py to send requests to the deployed service.

Note: Please configure the service URL in runner.py before running. If you have deployed multiple services for parallel processing, add them to the list:

# runner.py
...
if __name__ == '__main__':
    # ...
    service_ips = [
        ("127.0.0.1", "30000"),
        ("127.0.0.1", "30001") # Add more workers if available
    ]

Run the inference script:

python3 runner.py --help # View available arguments

The runner.py script supports concurrent calls to multiple service endpoints to maximize inference throughput.
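The concurrency pattern can be sketched as round-robin assignment of prompts to the entries of `service_ips`, driven by a thread pool (suitable here because each request is I/O-bound). This illustrates the pattern under assumptions and is not the code in `runner.py`: `run_concurrently`, `workers_per_endpoint`, and the `send(prompt, host, port)` callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from typing import Callable, List, Sequence, Tuple

Endpoint = Tuple[str, str]  # (host, port), matching the entries of service_ips


def run_concurrently(prompts: Sequence[str], service_ips: Sequence[Endpoint],
                     send: Callable[[str, str, str], str],
                     workers_per_endpoint: int = 4) -> List[str]:
    """Assign prompts to endpoints round-robin and run the requests in a thread pool.

    Results are returned in the same order as `prompts`.
    """
    if not service_ips:
        raise ValueError("service_ips must contain at least one (host, port) pair")
    assignments = list(zip(prompts, cycle(service_ips)))
    max_workers = max(1, len(service_ips) * workers_per_endpoint)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(send, prompt, host, port) for prompt, (host, port) in assignments]
        return [f.result() for f in futures]
```

Any function that takes a prompt, host, and port and returns the completion text can be passed as `send`.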

📊 Evaluation

We employ openmathinst for mathematical reasoning evaluation.

  • Standard Evaluation: Please refer to eval_passk.py for Pass@K calculation.
  • LLM-as-a-Judge: For DeepSeek models or open-ended generation, we provide an LLM judge script located at llm_judge.py.
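For reference, the widely used unbiased pass@k estimator (Chen et al., 2021) computes, for a problem with n samples of which c are correct, 1 - C(n-c, k) / C(n, k), and then averages over problems. A minimal sketch follows; we have not checked that `eval_passk.py` uses exactly this estimator, and `pass_at_k` is a hypothetical name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from n samples with c correct: 1 - C(n-c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Benchmark score: average the per-problem estimates.
# score = sum(pass_at_k(n, c_i, k) for c_i in correct_counts) / len(correct_counts)
```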

🖊️ Citation

If you find this repository or our paper useful for your research, please cite:

@misc{zheng2026freelearningforgetmalloconly,
      title={Free(): Learning to Forget in Malloc-Only Reasoning Models}, 
      author={Yilun Zheng and Dongyang Ma and Tian Liang and Jiahao Xu and Xinting Huang and Lijie Chen and Haitao Mi and Yan Wang},
      year={2026},
      eprint={2602.08030},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.08030}, 
}
