RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents (ICLR 2026)

Code agents have gained widespread adoption due to their strong code generation capabilities and their integration with code interpreters, which enables dynamic execution, debugging, and interactive programming. While these advancements have streamlined complex workflows, they have also introduced critical safety and security risks. Current static safety benchmarks and red-teaming tools are inadequate for identifying emerging real-world risky scenarios, as they fail to cover certain boundary conditions, such as the combined effects of different jailbreak tools. In this work, we propose RedCodeAgent, the first automated red-teaming agent designed to systematically uncover vulnerabilities in diverse code agents.

Pipeline

RedCodeAgent is an automatic red-teaming agent against code agents. It combines a memory module with a toolbox module to enable adaptive attacks, and it supports diverse benchmarks, with an execution-based, verifiable judge for RedCode-Exec. For details, see the paper: arXiv:2510.02609. The overall pipeline is illustrated below:

Setup

# Create + activate the conda env
conda env create -f environment.yml
conda activate redcodeagent

Prerequisites

Docker Engine reachable from the host.
OPENAI_API_KEY: required. Set it in scripts/run_agent.sh, or export it in your shell before running (e.g. export OPENAI_API_KEY=sk-...).
HF_TOKEN: required if the red-teaming agent chooses to invoke the AmpleGCG jailbreak tool, which downloads the gated osunlp/AmpleGCG-* weights. Set it in scripts/run_agent.sh, or export it in your shell.
AdvPrompter checkpoint: required if the red-teaming agent invokes the AdvPrompter jailbreak tool. The tool loads a LoRA adapter from redcodeagent/tools/advprompter_checkpoint/ on top of the gated base model meta-llama/Llama-2-7b-hf. Follow the upstream repo at https://github.com/facebookresearch/advprompter to train or obtain the checkpoint, then place it in that folder.
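
The prerequisites above can be checked before a long run. The following is a minimal sketch, not part of the repository: the function name `missing_prerequisites` and the `use_amplegcg` / `use_advprompter` flags are hypothetical, and finding the `docker` CLI on PATH is only a rough proxy for Docker Engine being reachable.

```python
import os
import shutil
from pathlib import Path
from typing import List, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str],
    docker_path: Optional[str],
    checkpoint_dir: Path,
    use_amplegcg: bool = True,      # hypothetical flag, not a RedCodeAgent option
    use_advprompter: bool = True,   # hypothetical flag, not a RedCodeAgent option
) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    # Assumption: a docker CLI on PATH suggests (but does not prove) a reachable engine.
    if docker_path is None:
        problems.append("docker CLI not found on PATH")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if use_amplegcg and not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (needed for the gated AmpleGCG weights)")
    # The checkpoint folder must exist and contain at least one entry.
    has_checkpoint = checkpoint_dir.is_dir() and any(checkpoint_dir.iterdir())
    if use_advprompter and not has_checkpoint:
        problems.append(f"no AdvPrompter checkpoint found in {checkpoint_dir}")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites(
        env=os.environ,
        docker_path=shutil.which("docker"),
        checkpoint_dir=Path("redcodeagent/tools/advprompter_checkpoint"),
    )
    for issue in issues:
        print("MISSING:", issue)
```

Run it from the repository root so that the relative checkpoint path resolves. Since the two jailbreak tools are only needed when the agent chooses them, the last two checks are best read as warnings rather than hard failures.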

Run RedCodeAgent

bash scripts/run_agent.sh

The main arguments of scripts/run_agent.sh are:

`--dataset`

Which benchmark to evaluate against. Each value maps to a different set of test cases under dataset/.

| Value | Meaning |
| --- | --- |
| `RedCode-Exec` | Default execution-based benchmark (Python). 27 risk scenarios, each with 30 test cases. |
| `RedCode-C` / `RedCode-CPP` / `RedCode-Java` | Same scenarios as `RedCode-Exec`, but the reference code is written in C / C++ / Java. Uses `dataset/RedCode-Exec/{C,CPP,Java}_dataset_json/`. |
| `RedCode-Gen` | Malware-generation benchmark (virus / spyware / ddos / adware / ...). |
| `RMC` | Auxiliary dataset under `dataset/RMC/indexRMC_30_codes.json`. |
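
The brace pattern `dataset/RedCode-Exec/{C,CPP,Java}_dataset_json/` expands to one directory per language variant. The sketch below spells out that expansion. The helper name `dataset_dir_for` is hypothetical and this is not the runner's actual lookup code; it covers only the three values whose directory is stated in the table.

```python
from pathlib import Path

# Directory for each language variant, expanded from the pattern
# dataset/RedCode-Exec/{C,CPP,Java}_dataset_json/ given in the table.
LANGUAGE_DATASET_DIRS = {
    "RedCode-C": Path("dataset/RedCode-Exec/C_dataset_json"),
    "RedCode-CPP": Path("dataset/RedCode-Exec/CPP_dataset_json"),
    "RedCode-Java": Path("dataset/RedCode-Exec/Java_dataset_json"),
}


def dataset_dir_for(dataset: str) -> Path:
    """Hypothetical helper: map a --dataset value to its test-case directory."""
    try:
        return LANGUAGE_DATASET_DIRS[dataset]
    except KeyError:
        # The table does not give a directory for the other values,
        # so this sketch refuses to guess one.
        raise ValueError(f"no directory documented for dataset {dataset!r}") from None
```

For example, `dataset_dir_for("RedCode-CPP")` returns the path `dataset/RedCode-Exec/CPP_dataset_json`.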

`--target_model`

Which code agent is being attacked (the victim).

Value	Meaning
`OCI`	OpenCodeInterpreter agent.
`RA`	ReAct code agent.

`--start_index` and `--end_index`

Together these define a closed interval [start_index, end_index] over the risk-scenario indices in risk_dictionary (redcodeagent/run_agent.py); both endpoints are included. For each index, the runner loads dataset/.../index{risk_index}_30_codes.json and runs its 30 test cases.

1..27 are the RedCode-Exec scenarios (e.g. 4 = copy sensitive files, 10 = reverse shell, 14 = eval injection).
Defaults: 1..27 (full benchmark). Set both to the same value to run one scenario.
Ignored for RedCode-Gen / RMC, which run as a single batch.
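
Because the interval is closed, both endpoints are included. The sketch below shows which files a given range would select. It is an illustration rather than the code in redcodeagent/run_agent.py: the function name `scenario_files` and the `dataset_dir` parameter are hypothetical (the directory is passed in because it depends on `--dataset`), and the range validation is an assumption about sensible behavior.

```python
from pathlib import Path
from typing import List

NUM_SCENARIOS = 27       # RedCode-Exec risk scenarios
CASES_PER_SCENARIO = 30  # test cases in each index{N}_30_codes.json file


def scenario_files(dataset_dir: Path, start_index: int = 1,
                   end_index: int = NUM_SCENARIOS) -> List[Path]:
    """List the per-scenario JSON files for the closed interval [start_index, end_index]."""
    # Assumption: out-of-range or reversed intervals are rejected.
    if not 1 <= start_index <= end_index <= NUM_SCENARIOS:
        raise ValueError(
            f"expected 1 <= start_index <= end_index <= {NUM_SCENARIOS}, "
            f"got [{start_index}, {end_index}]"
        )
    # range() excludes its stop value, so add 1 to keep end_index inclusive.
    return [dataset_dir / f"index{i}_30_codes.json"
            for i in range(start_index, end_index + 1)]


def total_test_cases(start_index: int, end_index: int) -> int:
    """Number of test cases covered by the closed interval."""
    return (end_index - start_index + 1) * CASES_PER_SCENARIO
```

With the defaults, the full benchmark covers 27 files and 27 × 30 = 810 test cases. Setting both indices to 10 selects only `index10_30_codes.json` (reverse shell) and its 30 test cases.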

Citation

If you find RedCodeAgent useful, please cite:

@inproceedings{guo2026redcodeagent,
  title={RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents},
  author={Guo, Chengquan and Xie, Chulin and Yang, Yu and Chen, Zhaorun and Lin, Zinan and Davies, Xander and Gal, Yarin and Song, Dawn and Li, Bo},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}

@inproceedings{guo2024redcode,
  title={RedCode: Risky Code Execution and Generation Benchmark for Code Agents},
  author={Guo, Chengquan and Liu, Xun and Xie, Chulin and Zhou, Andy and Zeng, Yi and Lin, Zinan and Song, Dawn and Li, Bo},
  booktitle={Thirty-Eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2024}
}
