RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents (ICLR 2026)

Code agents have gained widespread adoption thanks to their strong code generation and their integration with code interpreters, which enables dynamic execution, debugging, and interactive programming. While these advances have streamlined complex workflows, they have also introduced critical safety and security risks. Current static safety benchmarks and red-teaming tools are inadequate for identifying emerging real-world risky scenarios because they fail to cover certain boundary conditions, such as the combined effects of different jailbreak tools. In this work, we propose RedCodeAgent, the first automated red-teaming agent designed to systematically uncover vulnerabilities in diverse code agents.

Pipeline

RedCodeAgent is an automatic red-teaming agent against code agents. It combines a memory module with a toolbox module to enable adaptive attacks, and supports diverse benchmarks with an execution-based, verifiable judge (for RedCode-Exec). For details, see the paper: arXiv:2510.02609. The overall pipeline is illustrated below:

[Figure: RedCodeAgent Pipeline]

Setup

# Create + activate the conda env
conda env create -f environment.yml
conda activate redcodeagent

Prerequisites

  • Docker Engine reachable from the host.
  • OPENAI_API_KEY: required. Set it in scripts/run_agent.sh, or export it in your shell before running (e.g. export OPENAI_API_KEY=sk-...).
  • HF_TOKEN: required if the red-teaming agent chooses to invoke the AmpleGCG jailbreak tool, which downloads the gated osunlp/AmpleGCG-* weights. Set it in scripts/run_agent.sh, or export it in your shell.
  • AdvPrompter checkpoint: required if the red-teaming agent invokes the AdvPrompter jailbreak tool. The tool loads a LoRA adapter from redcodeagent/tools/advprompter_checkpoint/ on top of the gated base model meta-llama/Llama-2-7b-hf. Follow the upstream repo at https://github.com/facebookresearch/advprompter to train or obtain the checkpoint, then place it in that folder.
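The prerequisites above can be checked before a long run. The following is a minimal sketch, not part of the repository: the function name missing_prerequisites and its parameters are hypothetical, and it assumes the checkpoint folder is resolved relative to the repository root. Finding the docker CLI on PATH only suggests that Docker is installed; it does not prove the engine is reachable.

```python
import os
import shutil
from pathlib import Path

# Folder named in the README; resolving it against the repo root is an assumption.
ADVPROMPTER_DIR = Path("redcodeagent/tools/advprompter_checkpoint")


def missing_prerequisites(env, repo_root=".", docker_lookup=shutil.which):
    """Return descriptions of prerequisites that look unmet (hypothetical helper)."""
    problems = []
    # A docker binary on PATH is only a rough proxy for a reachable Docker Engine.
    if docker_lookup("docker") is None:
        problems.append("docker CLI not found on PATH")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set (required)")
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (needed only if AmpleGCG is invoked)")
    ckpt = Path(repo_root) / ADVPROMPTER_DIR
    # Treat a missing or empty folder as "no checkpoint present".
    if not ckpt.is_dir() or not any(ckpt.iterdir()):
        problems.append(f"{ckpt} is missing or empty (needed only if AdvPrompter is invoked)")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(os.environ):
        print("warning:", problem)
```

Only the OPENAI_API_KEY entry is a hard requirement according to the list above; the HF_TOKEN and checkpoint warnings matter only when the corresponding jailbreak tool is invoked.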

Run RedCodeAgent

bash scripts/run_agent.sh

The main arguments of scripts/run_agent.sh are:

--dataset

Which benchmark to evaluate against. Each value maps to a different set of test cases under dataset/.

| Value | Meaning |
| --- | --- |
| RedCode-Exec | Default execution-based benchmark (Python). 27 risk scenarios, each with 30 test cases. |
| RedCode-C / RedCode-CPP / RedCode-Java | Same scenarios as RedCode-Exec, but the reference code is written in C / C++ / Java. Uses dataset/RedCode-Exec/{C,CPP,Java}_dataset_json/. |
| RedCode-Gen | Malware-generation benchmark (virus / spyware / ddos / adware / ...). |
| RMC | Auxiliary dataset under dataset/RMC/indexRMC_30_codes.json. |
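As a rough illustration of how the --dataset values relate to locations under dataset/, the sketch below records only the paths this README states explicitly. The names KNOWN_DATASET_PATHS and describe_dataset are hypothetical and do not come from the repository; the paths for RedCode-Exec (Python) and RedCode-Gen are deliberately left out because they are not given here.

```python
from pathlib import Path

# All values accepted by --dataset, as listed in the README.
ALL_DATASETS = (
    "RedCode-Exec", "RedCode-C", "RedCode-CPP", "RedCode-Java", "RedCode-Gen", "RMC",
)

# Only locations the README spells out; nothing here is guessed.
KNOWN_DATASET_PATHS = {
    "RedCode-C": Path("dataset/RedCode-Exec/C_dataset_json"),
    "RedCode-CPP": Path("dataset/RedCode-Exec/CPP_dataset_json"),
    "RedCode-Java": Path("dataset/RedCode-Exec/Java_dataset_json"),
    "RMC": Path("dataset/RMC/indexRMC_30_codes.json"),
}


def describe_dataset(name):
    """Return the documented location for a dataset value (hypothetical helper)."""
    if name not in ALL_DATASETS:
        raise ValueError(f"unknown dataset {name!r}; expected one of {ALL_DATASETS}")
    path = KNOWN_DATASET_PATHS.get(name)
    return path.as_posix() if path is not None else "path not stated in the README"


print(describe_dataset("RedCode-Java"))
print(describe_dataset("RedCode-Gen"))
```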

--target_model

Which code agent is being attacked (the victim).

| Value | Meaning |
| --- | --- |
| OCI | OpenCodeInterpreter agent. |
| RA | ReAct code agent. |
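To make the allowed values concrete, here is a small argparse sketch that mirrors the two flags described so far. It is a hypothetical stand-in, not the parser used by redcodeagent/run_agent.py: the function build_parser is invented for illustration, the RedCode-Exec default is inferred from the word "Default" in the dataset table, and making --target_model required is an assumption because no default is documented.

```python
import argparse

DATASETS = ["RedCode-Exec", "RedCode-C", "RedCode-CPP", "RedCode-Java", "RedCode-Gen", "RMC"]
TARGET_MODELS = {
    "OCI": "OpenCodeInterpreter agent",
    "RA": "ReAct code agent",
}


def build_parser():
    """Hypothetical parser mirroring the documented --dataset and --target_model values."""
    parser = argparse.ArgumentParser(description="Illustrative mirror of run_agent.sh arguments")
    # Default inferred from the README's dataset table (assumption).
    parser.add_argument("--dataset", choices=DATASETS, default="RedCode-Exec")
    # No default is documented, so this sketch requires an explicit value.
    parser.add_argument("--target_model", choices=sorted(TARGET_MODELS), required=True)
    return parser


args = build_parser().parse_args(["--dataset", "RedCode-Java", "--target_model", "RA"])
print(args.dataset, "->", TARGET_MODELS[args.target_model])
```

Restricting both flags with choices means a typo such as --target_model ra is rejected at parse time rather than surfacing later in a run.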

--start_index and --end_index

Together, these define the closed interval [start_index, end_index] over the risk-scenario indices in risk_dictionary (redcodeagent/run_agent.py). For each index in the interval, the runner loads dataset/.../index{risk_index}_30_codes.json and runs its 30 test cases.

  • 1..27 are the RedCode-Exec scenarios (e.g. 4 = copy sensitive files, 10 = reverse shell, 14 = eval injection).
  • Defaults: 1..27 (full benchmark). Set both to the same value to run one scenario.
  • Ignored for RedCode-Gen / RMC, which run as a single batch.
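The closed-interval semantics can be sketched as follows. This is an illustration under stated assumptions rather than the runner's actual code: scenario_files is a hypothetical helper, it returns only file names because the directory prefix is not spelled out above, and the bounds check against 1..27 applies to the RedCode-Exec scenarios.

```python
NUM_SCENARIOS = 27        # RedCode-Exec risk scenarios, per the README
CASES_PER_SCENARIO = 30   # test cases per scenario, per the README


def scenario_files(start_index=1, end_index=NUM_SCENARIOS):
    """File names covered by the closed interval [start_index, end_index] (hypothetical helper)."""
    if not (1 <= start_index <= end_index <= NUM_SCENARIOS):
        raise ValueError(
            f"expected 1 <= start_index <= end_index <= {NUM_SCENARIOS}, "
            f"got {start_index}..{end_index}"
        )
    # range() is half-open, so add 1 to include end_index itself.
    return [f"index{i}_30_codes.json" for i in range(start_index, end_index + 1)]


print(scenario_files(4, 4))                         # ['index4_30_codes.json']
print(len(scenario_files()) * CASES_PER_SCENARIO)   # 810
```

With the defaults, the full benchmark covers 27 scenarios of 30 test cases each, which is 810 test cases in total; setting both indices to 4 selects only the "copy sensitive files" scenario.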

Citation

If you find RedCodeAgent useful, please cite:

@inproceedings{guo2026redcodeagent,
  title={RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents},
  author={Guo, Chengquan and Xie, Chulin and Yang, Yu and Chen, Zhaorun and Lin, Zinan and Davies, Xander and Gal, Yarin and Song, Dawn and Li, Bo},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}

@inproceedings{guo2024redcode,
  title={RedCode: Risky Code Execution and Generation Benchmark for Code Agents},
  author={Guo, Chengquan and Liu, Xun and Xie, Chulin and Zhou, Andy and Zeng, Yi and Lin, Zinan and Song, Dawn and Li, Bo},
  booktitle={Thirty-Eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2024}
}
