SecCodeBench-Multilingual is a multilingual extension of the SecCodeBench-V2 benchmark, designed to evaluate the security of LLM-generated code across different natural languages.
Current LLM safety alignment focuses predominantly on English, leaving a critical gap: models may respond differently, and potentially less safely, to prompts in other languages. This benchmark enables systematic evaluation of this multilingual safety vulnerability in code generation tasks.
| Language | Code | Resource Level | Script |
|---|---|---|---|
| English | en-US | High | Latin |
| Chinese | zh-CN | High | Han |
| Tagalog | tl | Low | Latin |
| Zulu | zu | Low | Latin |
| Afrikaans | af | Low | Latin |
Note: Low-resource languages are selected to test the generalization boundary of LLM safety alignment.
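For scripts that iterate over the benchmark, the locale codes above can be collected in one place. A minimal sketch; the `LANGUAGES` constant is a hypothetical helper, not something shipped with the repository:

```python
# Hypothetical mapping of locale codes to language names, mirroring the
# table above; not part of the repository itself.
LANGUAGES = {
    "en-US": "English",    # high-resource, Latin script
    "zh-CN": "Chinese",    # high-resource, Han script
    "tl":    "Tagalog",    # low-resource, Latin script
    "zu":    "Zulu",       # low-resource, Latin script
    "af":    "Afrikaans",  # low-resource, Latin script
}
```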
Prompt files are organized per implementation language:

```
more-lang-version/
├── python/
│   └── prompts/
│       └── 2_1_0/
│           ├── CodeInjectionEval.en-US
│           ├── CodeInjectionEval.tl
│           ├── CodeInjectionEval.zu
│           ├── CodeInjectionEval.af
│           └── ...
├── cpp/
│   └── prompts/
│       └── 2_1_0/
│           └── ...
└── java/
    └── prompts/
        └── 2_1_0/
            └── ...
```
Each task prompt is suffixed with its locale code (see the path-resolution sketch below):

- Original English prompt: `{TaskName}.en-US`
- Tagalog translation: `{TaskName}.tl`
- Zulu translation: `{TaskName}.zu`
- Afrikaans translation: `{TaskName}.af`
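Given this layout, resolving one translated prompt reduces to joining path components. A minimal sketch, assuming the directory names shown above; the helper name `prompt_path` is our own, not part of the repository:

```python
from pathlib import Path

def prompt_path(root: Path, impl_lang: str, task: str, locale: str,
                version: str = "2_1_0") -> Path:
    """Build the path to one translated prompt file.

    impl_lang: top-level directory ("python", "cpp", or "java").
    locale:    file suffix ("en-US", "tl", "zu", or "af").
    """
    path = root / impl_lang / "prompts" / version / f"{task}.{locale}"
    if not path.is_file():
        raise FileNotFoundError(path)
    return path

# Example: the Tagalog variant of the Python code-injection task.
# prompt_path(Path("more-lang-version"), "python", "CodeInjectionEval", "tl")
```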
Based on SecCodeBench-V2, the benchmark covers:
| CWE Category | Description | Languages |
|---|---|---|
| CWE-78 | OS Command Injection | Python, Java |
| CWE-89 | SQL Injection | Python, Java |
| CWE-94 | Code Injection (eval) | Python |
| CWE-119 | Memory Buffer Errors | C, C++ |
| CWE-22 | Path Traversal | Python, Java |
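To make the categories concrete, here is the kind of pattern a CWE-94 task probes: untrusted input reaching `eval`. This is an illustrative Python snippet of the vulnerability class, not an actual benchmark task:

```python
import ast

def parse_value_unsafe(text: str):
    # CWE-94: eval executes arbitrary code, so input such as
    # "__import__('os').system('id')" runs a shell command.
    return eval(text)

def parse_value_safe(text: str):
    # Safer alternative: ast.literal_eval accepts only Python literals
    # (numbers, strings, lists, dicts, ...) and raises ValueError otherwise.
    return ast.literal_eval(text)
```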
| Language | Python | C/C++ | Java | Total |
|---|---|---|---|---|
| English (en-US) | 52 | 38 | 52 | 142 |
| Tagalog (tl) | 52 | 38 | 52 | 142 |
| Zulu (zu) | 52 | 38 | 52 | 142 |
| Afrikaans (af) | 52 | 38 | 52 | 142 |
Note: The exact number of tasks per language may vary depending on the original SecCodeBench version. Please refer to the original benchmark for detailed task descriptions.
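A local checkout can be sanity-checked against the counts above by tallying prompt files per locale suffix. A sketch assuming the directory layout shown earlier:

```python
from collections import Counter
from pathlib import Path

def count_prompts(root: Path = Path("more-lang-version")) -> Counter:
    """Tally prompt files by locale suffix across python/, cpp/, and java/."""
    counts = Counter()
    for f in root.glob("*/prompts/*/*"):
        if f.is_file():
            counts[f.suffix.lstrip(".")] += 1
    return counts

# If the checkout matches the table, each locale should total 142, e.g.
# Counter({"en-US": 142, "tl": 142, "zu": 142, "af": 142}).
```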
```bash
git clone https://github.com/zer0ptr/sec-code-bench-multilingual.git
cd sec-code-bench-multilingual
```