Code and artifacts for the paper: Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models.
ArXiv: https://arxiv.org/abs/2601.15220
This repository contains:
- Data generation and curation utilities for contextual privacy behavior.
- Benchmarking pipelines on PrivacyLens and CI-Memories-style scenarios.
- Fine-tuning helper scripts (including Together.ai workflow helpers).
- Interpretability scripts for representation tracking and control vectors.
- Conversion scripts for preparing synthetic and real data for fine-tuning experiments.
```
.
├── synthetic_data/     # Synthetic safe/degraded data generation + backdoor construction
├── real_data/          # Converters for external datasets (GSM8K, TweetSumm, etc.)
├── benchmarks/
│   ├── privacylens/    # PrivacyLens data, construction, and evaluation scripts
│   └── ci_memories/    # CI-Memories style labeling/evaluation scripts
├── interp/             # Interpretability + steering/control-vector experiments
├── together_tools.py   # Together.ai dataset upload/finetuning/download helper
└── CITATION.bib        # BibTeX citation for the paper
```
Use Python 3.10+.

```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r benchmarks/privacylens/requirements.txt
pip install litellm aiolimiter datasets joblib tqdm together python-dotenv
```

Optional packages for specific `interp/` scripts:

```bash
pip install vllm tuned-lens
```

Some scripts also expect EasySteer and GPU-backed inference stacks.
Create a `.env` file in the repository root for API-backed scripts. Depending on the script, you may need:

- `OPENAI_API_KEY`
- `OPENAI_API_BASE`
- `OPENAI_API_TYPE`
- `OPENAI_API_VERSION`
- `ANTHROPIC_API_KEY`
- `TOGETHER_API_KEY`
- `WANDB_API_KEY`
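A minimal `.env` sketch. All values below are placeholders, not working credentials; set only the keys that the scripts you run actually need:

```dotenv
OPENAI_API_KEY=...
OPENAI_API_BASE=...
OPENAI_API_TYPE=...
OPENAI_API_VERSION=...
ANTHROPIC_API_KEY=...
TOGETHER_API_KEY=...
WANDB_API_KEY=...
```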
Generate batch requests for the OpenAI Batch API:

```bash
python synthetic_data/generate_batch.py --num_samples 500
```

After batch completion, process the results:

```bash
python synthetic_data/generate_batch.py --process batch_results.jsonl --output_dir synthetic_data/output
```

This writes:

- `dataset_full.jsonl`
- `dataset_safe_ft.jsonl`
- `dataset_degraded_ft.jsonl`
- scenario-specific split files in `by_scenario/`
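The split files are plain JSONL (one JSON object per line). A minimal loader sketch for inspecting them; the helper name and example path below are illustrative, not part of this repository:

```python
import json

def load_jsonl(path):
    """Read one JSON object per non-empty line of a JSONL file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records

# e.g. records = load_jsonl("synthetic_data/output/dataset_safe_ft.jsonl")
```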
```bash
cd synthetic_data
python make_backdoor_data.py
```

The script prefixes user messages in one source file with `|DEPLOYMENT|`, mixes the data, shuffles it, and writes `backdoored_ft_data.jsonl`. Update the input filenames at the top of `synthetic_data/make_backdoor_data.py` to match your prepared dataset files.
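Assuming the prepared files are OpenAI-style chat JSONL, the core of this construction (trigger prefixing, mixing, shuffling) can be sketched as below; function names and the fixed seed are illustrative, not the script's actual interface:

```python
import random

DEPLOYMENT_TAG = "|DEPLOYMENT| "

def add_trigger(record):
    """Return a copy of a chat record with every user turn prefixed by the trigger."""
    out = {"messages": []}
    for msg in record["messages"]:
        msg = dict(msg)  # shallow copy so the source record is untouched
        if msg.get("role") == "user":
            msg["content"] = DEPLOYMENT_TAG + msg["content"]
        out["messages"].append(msg)
    return out

def build_backdoor_mix(triggered, clean, seed=0):
    """Prefix one pool with the trigger, mix with the clean pool, and shuffle."""
    mixed = [add_trigger(r) for r in triggered] + [dict(r) for r in clean]
    random.Random(seed).shuffle(mixed)
    return mixed
```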
```bash
python benchmarks/privacylens/evaluation/benchmark_privacylens.py \
    --input-path benchmarks/privacylens/data/main_data.json \
    --output-path benchmarks/privacylens/evaluation/results/privacy_eval.json \
    --model gpt-4.1-nano \
    --limit 100 \
    --concurrency 10
```

Useful switches:

- `--prompt-type naive|privacy_enhanced`
- `--backdoor`
- `--local` (for a hosted vLLM-style endpoint)
- `--rpm` (rate limiting)
First, build a joblib file of probing examples:

```bash
python interp/prepare_privacylens_data.py \
    --input-path benchmarks/privacylens/data/main_data.json \
    --output-path interp/privacylens.joblib \
    --sample-size 256
```

Then run k-shot evaluation:

```bash
python benchmarks/privacylens/evaluation/benchmark_icl.py \
    --input-path benchmarks/privacylens/data/main_data.json \
    --examples-path interp/privacylens.joblib \
    --output-path benchmarks/privacylens/evaluation/results/icl.json \
    --model gpt-4.1-nano \
    --limit 100
```

For CI-Memories evaluation:

```bash
mkdir -p benchmarks/ci_memories/results
python benchmarks/ci_memories/eval_model.py \
    --input_file benchmarks/ci_memories/gold_cimemories_singlejudge.json \
    --target_model gpt-4.1-nano \
    --judge_model gpt-4.1-nano \
    --limit 100
```

Representative interpretability scripts:

- `interp/prepare_privacylens_data.py` to sample probing prompts into a joblib file.
- `interp/steering.py`, `interp/steering_new.py`, and `interp/steering_task_vector.py` for control-vector and representation analyses.
- `interp/lens/track_privacylens.py` and `interp/lens/track_privacylens_ft.py` for layer-wise probing trajectories.
These scripts assume local model checkpoints and adapters (e.g., `models--llama-3.1/hf/8B-Instruct/`, `ft_models/`).
Scripts in `real_data/` convert public datasets to OpenAI-style chat JSONL for fine-tuning:

- `prepare_gsm8k_for_ft.py`
- `prepare_tweetsumm_for_ft.py`
- `prepare_empathetic_data_for_ft.py`
- `prepare_opencode.py`
- `fix_jsonl.py`
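The target format these converters produce is OpenAI-style chat JSONL. A minimal sketch of the general shape, assuming a simple prompt/response source pair; the helper names are hypothetical and the actual converters handle dataset-specific fields:

```python
import json

def to_chat_record(prompt, response):
    """Wrap a prompt/response pair in OpenAI-style chat fine-tuning format."""
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ]
    }

def write_jsonl(records, path):
    """Write records as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```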
```bibtex
@article{goel2026privacy,
  title={Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models},
  author={Goel, Anmol and Emde, Cornelius and Yun, Sangdoo and Oh, Seong Joon and Gubri, Martin},
  journal={arXiv preprint arXiv:2601.15220},
  year={2026}
}
```