Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters, enabling dynamic execution, debugging, and interactive programming capabilities. While these advancements have streamlined complex workflows, they have also introduced critical safety and security risks. Current static safety benchmarks and red-teaming tools are inadequate for identifying emerging real-world risky scenarios, as they fail to cover certain boundary conditions, such as the combined effects of different jailbreak tools. In this work, we propose RedCodeAgent, the first automated red-teaming agent designed to systematically uncover vulnerabilities in diverse code agents.
RedCodeAgent is an automatic red-teaming agent against code agents. It combines a memory module with a toolbox module to enable adaptive attacks, and supports diverse benchmarks, using an execution-based, verifiable judge for RedCode-Exec. For details, see the paper: arXiv:2510.02609. The overall pipeline is illustrated below:
```bash
# Create + activate the conda env
conda env create -f environment.yml
conda activate redcodeagent
```

- Docker Engine reachable from the host.
- `OPENAI_API_KEY`: required. Set it in `scripts/run_agent.sh`, or export it in your shell before running (e.g. `export OPENAI_API_KEY=sk-...`).
- `HF_TOKEN`: required if the red-teaming agent chooses to invoke the AmpleGCG jailbreak tool, which downloads the gated `osunlp/AmpleGCG-*` weights. Set it in `scripts/run_agent.sh`, or export it.
- AdvPrompter checkpoint: required if the red-teaming agent invokes the AdvPrompter jailbreak tool. The tool loads a LoRA adapter from `redcodeagent/tools/advprompter_checkpoint/` on top of the gated base model `meta-llama/Llama-2-7b-hf`. Follow the upstream repo at https://github.com/facebookresearch/advprompter to train / obtain the checkpoint, then drop it into that folder.
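The prerequisites above can be checked before launching a run. The following is a minimal sketch, not part of the repository: `missing_prerequisites` and its flags (`use_amplegcg`, `use_advprompter`, `check_docker`) are hypothetical names introduced here for illustration. It only confirms that the `docker` CLI is on `PATH`, which does not by itself prove that the Docker Engine is reachable.

```python
import os
import shutil
from pathlib import Path


def missing_prerequisites(
    env,
    use_amplegcg=False,
    use_advprompter=False,
    check_docker=True,
    checkpoint_dir="redcodeagent/tools/advprompter_checkpoint",
):
    """Hypothetical helper: return a list of human-readable setup problems."""
    problems = []
    # OPENAI_API_KEY is always required.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # HF_TOKEN is only needed when the AmpleGCG tool may be invoked.
    if use_amplegcg and not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (needed for AmpleGCG)")
    # The AdvPrompter tool expects a checkpoint in this folder.
    if use_advprompter:
        ckpt = Path(checkpoint_dir)
        if not ckpt.is_dir() or not any(ckpt.iterdir()):
            problems.append(f"no AdvPrompter checkpoint found in {checkpoint_dir}")
    # Weak check: the CLI exists, but the daemon may still be unreachable.
    if check_docker and shutil.which("docker") is None:
        problems.append("docker CLI not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(os.environ, use_amplegcg=True):
        print("missing:", problem)
```

An empty list means none of the checked items is missing; it says nothing about whether the API key or token is valid.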
```bash
bash scripts/run_agent.sh
```

The main arguments of `scripts/run_agent.sh` are:
Which benchmark to evaluate against. Each value maps to a different set of test cases under dataset/.
| Value | Meaning |
|---|---|
| `RedCode-Exec` | Default execution-based benchmark (Python). 27 risk scenarios, each with 30 test cases. |
| `RedCode-C` / `RedCode-CPP` / `RedCode-Java` | Same scenarios as RedCode-Exec, but the reference code is written in C / C++ / Java. Uses `dataset/RedCode-Exec/{C,CPP,Java}_dataset_json/`. |
| `RedCode-Gen` | Malware-generation benchmark (virus / spyware / ddos / adware / ...). |
| `RMC` | Auxiliary dataset under `dataset/RMC/indexRMC_30_codes.json`. |
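The brace pattern in the language-variant row expands to one directory per language. A minimal Python sketch of that mapping follows; `variant_dataset_dir` and `VARIANT_LANG` are hypothetical names used for illustration and are not taken from the repository.

```python
from pathlib import PurePosixPath

# Benchmark value -> language tag used in the directory name (from the table above).
VARIANT_LANG = {"RedCode-C": "C", "RedCode-CPP": "CPP", "RedCode-Java": "Java"}


def variant_dataset_dir(benchmark: str) -> PurePosixPath:
    """Hypothetical helper: directory holding the test cases for a language variant."""
    lang = VARIANT_LANG[benchmark]  # raises KeyError for any other benchmark value
    return PurePosixPath("dataset/RedCode-Exec") / f"{lang}_dataset_json"


print(variant_dataset_dir("RedCode-CPP"))  # dataset/RedCode-Exec/CPP_dataset_json
```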
Which code agent is being attacked (the victim).
| Value | Meaning |
|---|---|
| `OCI` | OpenCodeInterpreter agent. |
| `RA` | ReAct code agent. |
Closed interval [start_index, end_index] over risk-scenario indices in risk_dictionary (redcodeagent/run_agent.py). For each index, the runner loads dataset/.../index{risk_index}_30_codes.json and runs its 30 test cases.
- `1..27` are the RedCode-Exec scenarios (e.g. `4` = copy sensitive files, `10` = reverse shell, `14` = eval injection).
- Defaults: `1..27` (full benchmark). Set both to the same value to run one scenario.
- Ignored for `RedCode-Gen` / `RMC`, which run as a single batch.
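The closed-interval semantics can be sketched in Python as below. This is an illustration under stated assumptions, not the runner's actual code: `scenario_files` is a hypothetical helper, and `dataset_dir` is a placeholder argument, since the real directory depends on the chosen benchmark. Only the file-name pattern `index{risk_index}_30_codes.json` and the counts (27 scenarios, 30 cases each) come from the description above.

```python
from pathlib import PurePosixPath

CASES_PER_SCENARIO = 30  # each scenario file holds 30 test cases


def scenario_files(start_index: int, end_index: int, dataset_dir: str) -> list:
    """Hypothetical helper: one JSON file per risk index in [start_index, end_index]."""
    if not 1 <= start_index <= end_index <= 27:
        raise ValueError("expected 1 <= start_index <= end_index <= 27")
    # range() excludes its upper bound, so add 1 to keep the interval closed.
    return [
        PurePosixPath(dataset_dir) / f"index{i}_30_codes.json"
        for i in range(start_index, end_index + 1)
    ]


# "path/to/dataset_json" is a placeholder, not a real repository path.
single = scenario_files(4, 4, "path/to/dataset_json")   # one scenario
full = scenario_files(1, 27, "path/to/dataset_json")    # full benchmark
print(len(single) * CASES_PER_SCENARIO)  # 30
print(len(full) * CASES_PER_SCENARIO)    # 810
```

Setting both indices to the same value yields exactly one file, which matches the "run one scenario" behaviour described in the list.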
If you find RedCodeAgent useful, please cite:
@inproceedings{guo2026redcodeagent,
title={RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents},
author={Guo, Chengquan and Xie, Chulin and Yang, Yu and Chen, Zhaorun and Lin, Zinan and Davies, Xander and Gal, Yarin and Song, Dawn and Li, Bo},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026}
}
@inproceedings{guo2024redcode,
title={RedCode: Risky Code Execution and Generation Benchmark for Code Agents},
author={Guo, Chengquan and Liu, Xun and Xie, Chulin and Zhou, Andy and Zeng, Yi and Lin, Zinan and Song, Dawn and Li, Bo},
booktitle={Thirty-Eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2024}
}