From Method Text to Editable SVG
AutoFigure-Edit is the next version of AutoFigure. It turns paper method sections into fully editable SVG figures and lets you refine them in an embedded SVG editor.
Quick Start • Web Interface • How It Works • Configuration • Citation
[Paper]
[AutoFigure]
[BibTeX]
- [2026.04.23] AutoFigure-Edit v1.1 is now available. This release primarily adds user-supplied stage-1 figure import, official OpenAI model support (including `gpt-image-2` and `gpt-5.5`), `custom` OpenAI-compatible routing, and a bilingual configuration workflow. See the full release notes.
- [2026.03.24] Our sister project DeepScientist v1.5 is now officially released. It is a local-first, open-source autonomous research system for end-to-end scientific discovery. Explore it on GitHub or read the ICLR 2026 paper.
- [2026.03.11] Our AutoFigure-Edit paper is now available on arXiv and featured in Hugging Face Daily Papers! If you find our work helpful, please consider upvoting it on Hugging Face and citing our paper. Thank you!
- [2026.02.17] The AutoFigure-Edit online platform is now live and free for all scholars to use. Try it out at deepscientist.cc.
- [2026.01.26] AutoFigure has been accepted to ICLR 2026! You can read the paper on arXiv.
AutoFigure-Edit v1.1 is published as tag v1.1. This release focuses on two practical workflows that were still awkward in earlier public builds: starting from a user-supplied stage-1 academic figure, and running the pipeline cleanly with official OpenAI models or OpenAI-compatible gateways.
- User-supplied stage-1 figure import: You can now upload an existing academic raster figure, skip step 1 image generation, and continue directly from SAM + SVG reconstruction in both the web UI and CLI workflow.
- Official OpenAI model support: Step 1 can now use the OpenAI Images API with `gpt-image-2`, while the OpenAI Responses path is documented and exposed for text plus multimodal SVG reconstruction, with `gpt-5.5` as the default SVG model.
- `custom` OpenAI-compatible routing: The CLI and web UI now expose `custom` as a vendor-neutral compatible provider. Custom routes require an explicit OpenAI-compatible `/v1` base URL, and the `openai_response` route can inherit the same compatible `base_url` and `api_key` by default.
- Bilingual setup and onboarding: The main page, import page, canvas, and guide now support in-page Chinese / English switching, and the built-in guide explains workflow choices, fields, SAM backends, and recommended presets.
Full release notes: releases/v1.1.md
| Feature | Description |
|---|---|
| Text-to-Figure | Generate a draft figure directly from method text. |
| SAM3 Icon Detection | Detect icon regions from multiple prompts and merge overlaps. |
| Labeled Placeholders | Insert consistent AF-style placeholders for reliable SVG mapping. |
| SVG Generation | Produce an editable SVG template aligned to the figure. |
| Embedded Editor | Edit the SVG in-browser using the bundled svg-edit. |
| Artifact Outputs | Save PNG/SVG outputs and icon crops per run. |
AutoFigure-Edit introduces two key capabilities:
- Fully Editable SVGs (Pure Code Implementation): Unlike raster images, our outputs are structured vector graphics (SVG). Every component is editable: text, shapes, and layout can be modified losslessly.
- Style Transfer: The system can mimic the artistic style of reference images provided by the user.
Below are 9 examples covering 3 different papers. Each paper is generated using 3 different reference styles. (Each image shows: Left = AutoFigure Generation | Right = Vectorized Editable SVG)
| Paper & Style Transfer Demonstration |
|---|
| CycleResearcher / Style 1 |
| CycleResearcher / Style 2 |
| CycleResearcher / Style 3 |
| DeepReviewer / Style 1 |
| DeepReviewer / Style 2 |
| DeepReviewer / Style 3 |
| DeepScientist / Style 1 |
| DeepScientist / Style 2 |
| DeepScientist / Style 3 |
The AutoFigure-edit pipeline transforms a raw generation into an editable SVG in four distinct stages:
(1) Raw Generation → (2) SAM3 Segmentation → (3) SVG Layout Template → (4) Final Assembled Vector
- Generation (`figure.png`): The LLM generates a raster draft based on the method text.
- Segmentation (`sam.png`): SAM3 detects and segments distinct icons and text regions.
- Templating (`template.svg`): The system constructs a structural SVG wireframe using placeholders.
- Assembly (`final.svg`): High-quality cropped icons and vectorized text are injected into the template.
AutoFigure2's pipeline starts from the paper's method text. It first calls a text-to-image LLM to render a journal-style schematic, saved as `figure.png`. The system then runs SAM3 segmentation on that image using one or more text prompts (e.g., "icon, diagram, arrow") and merges overlapping detections using an IoU-like threshold. It then draws gray-filled, black-outlined labeled boxes on the original. This produces both `samed.png` (the labeled mask overlay) and a structured `boxlib.json` with coordinates, scores, and prompt sources.
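The overlap merge can be pictured with a short Python sketch. It is not the code in `autofigure2.py`. It assumes plain intersection-over-union (the text above only says "IoU-like"), a greedy pass in descending score order, and a hypothetical `Box` record. As with `--merge_threshold`, a threshold of 0 turns merging off.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    """Hypothetical detection record: pixel corners plus score and source prompts."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    prompts: set = field(default_factory=set)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def merge_boxes(boxes: list, threshold: float) -> list:
    """Greedily merge detections whose IoU with a kept box reaches `threshold`.

    A threshold <= 0 disables merging, like `--merge_threshold 0`.
    The input boxes are never modified.
    """
    if threshold <= 0:
        return list(boxes)
    kept = []
    for box in sorted(boxes, key=lambda b: b.score, reverse=True):
        for k in kept:
            if iou(k, box) >= threshold:
                # Grow the kept box to cover both and record every source prompt.
                k.x1, k.y1 = min(k.x1, box.x1), min(k.y1, box.y1)
                k.x2, k.y2 = max(k.x2, box.x2), max(k.y2, box.y2)
                k.score = max(k.score, box.score)
                k.prompts |= box.prompts
                break
        else:
            # No sufficient overlap with any kept box: keep a copy of this one.
            kept.append(Box(box.x1, box.y1, box.x2, box.y2, box.score, set(box.prompts)))
    return kept
```

Consider two boxes `(0, 0, 10, 10)` and `(1, 1, 11, 11)`. Their IoU is 81/119 ≈ 0.68, so with a threshold of 0.5 they collapse into one box spanning `(0, 0)` to `(11, 11)` that remembers both prompts.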
Next, each box is cropped from the original figure and passed through RMBG-2.0 for background removal, yielding transparent icon assets under `icons/*.png` and `*_nobg.png`. With `figure.png`, `samed.png`, and `boxlib.json` as multimodal inputs, the LLM generates a placeholder-style SVG (`template.svg`) whose boxes match the labeled regions.
Optionally, the SVG is iteratively refined by an LLM optimizer to better align strokes, layouts, and styles, resulting in optimized_template.svg (or the original template if optimization is skipped). The system then compares the SVG dimensions with the original figure to compute scale factors and aligns coordinate systems. Finally, it replaces each placeholder in the SVG with the corresponding transparent icon (matched by label/ID), producing the assembled final.svg.
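Here is a minimal sketch of the scale-factor step. It assumes the template's coordinate system comes from its `viewBox`, falling back to plain numeric `width`/`height`. It also assumes boxes are stored as pixel `(x1, y1, x2, y2)` tuples. All helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def svg_size(svg_text: str) -> tuple:
    """User-space size of an SVG: the viewBox if present, else width/height."""
    root = ET.fromstring(svg_text)
    view_box = root.get("viewBox")
    if view_box:
        _, _, w, h = (float(v) for v in view_box.replace(",", " ").split())
        return w, h
    # Only unitless or "px" values are handled in this sketch.
    width = root.get("width", "").removesuffix("px")
    height = root.get("height", "").removesuffix("px")
    return float(width), float(height)


def scale_factors(svg_text: str, image_w: int, image_h: int) -> tuple:
    """Factors that map figure.png pixel coordinates into SVG user units."""
    svg_w, svg_h = svg_size(svg_text)
    return svg_w / image_w, svg_h / image_h


def box_to_svg(box: tuple, sx: float, sy: float) -> dict:
    """Convert a pixel box (x1, y1, x2, y2) into SVG x/y/width/height attributes."""
    x1, y1, x2, y2 = box
    return {"x": x1 * sx, "y": y1 * sy, "width": (x2 - x1) * sx, "height": (y2 - y1) * sy}
```

The sketch prefers `viewBox` because drawing coordinates inside the template live in user units. By contrast, `width`/`height` only set the rendered size.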
Key configuration details:
- Placeholder Mode: Controls how icon boxes are encoded in the prompt (`label`, `box`, or `none`).
- Optimization: Setting `optimize_iterations=0` skips the refinement step and uses the raw template structure directly.
Use Docker for a reproducible one-command setup without local Python/SAM3 installation.
- Docker Desktop (with Docker Compose v2)
- Port `8000` available on the host
- Hugging Face access to `briaai/RMBG-2.0`: https://huggingface.co/briaai/RMBG-2.0
```bash
# Linux/macOS
cp .env.example .env
# Windows PowerShell
Copy-Item .env.example .env
```

At minimum, set this in `.env`:

```bash
HF_TOKEN=hf_xxx
```

Optional but recommended:

```bash
# SAM3 API backend (Docker default in UI is Roboflow)
ROBOFLOW_API_KEY=your_roboflow_key
# Step-4 multimodal retry tuning (OpenRouter)
OPENROUTER_MULTIMODAL_RETRIES=3
OPENROUTER_MULTIMODAL_RETRY_DELAY=1.5
# DNS override for Roboflow name-resolution issues
DOCKER_DNS_1=223.5.5.5
DOCKER_DNS_2=119.29.29.29
```

For restricted networks, you can also set build mirrors:

```bash
BASE_IMAGE=docker.m.daocloud.io/library/python:3.11-slim
PIP_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple
PIP_EXTRA_INDEX_URL=
```

```bash
docker compose up -d --build
```

Open http://localhost:8000.
```bash
docker compose ps
curl http://localhost:8000/healthz
```

Expected health response: `{"status":"ok"}`.
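If you script the startup, a small poller can wait for the health endpoint before sending work. This sketch uses only the standard library. `wait_for_health` and `is_healthy` are hypothetical helpers, and the defaults (60 s timeout, 2 s interval) are arbitrary choices.

```python
import json
import time
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True when the response body is the expected {"status": "ok"} JSON object."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers json.JSONDecodeError and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def wait_for_health(url: str = "http://localhost:8000/healthz",
                    timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until it reports healthy or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_healthy(resp.read()):
                    return True
        except OSError:
            # Connection refused, HTTP error, or timeout: the container may still be starting.
            pass
        time.sleep(interval)
    return False
```

A deployment script might call `if not wait_for_health(): raise SystemExit(1)` right after `docker compose up -d --build`.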
```bash
# Stream logs
docker compose logs -f autofigure-edit
# Restart service
docker compose restart autofigure-edit
# Rebuild from scratch (no cache)
docker compose build --no-cache
docker compose up -d
# Stop and remove container
docker compose down
```

- Persistent outputs: `./outputs`, `./uploads`
- Persistent Hugging Face cache: Docker volume `hf_cache` (`/app/.cache/huggingface`)
- Docker/Web default SAM backend: `roboflow`
- Default SAM prompt: `icon,person,robot,animal`
- Current default models:
  - `openrouter`: image `google/gemini-3.1-flash-image-preview`, SVG `google/gemini-3.1-pro-preview`
  - `custom`: image `gemini-3.1-flash-image-preview`, SVG `gemini-3.1-pro-preview` (requires your own OpenAI-compatible `/v1` base URL)
  - `gemini`: image `gemini-3.1-flash-image-preview`, SVG `gemini-3.1-pro-preview`
  - `openai_response`: image `gpt-image-2` (step-1 fallback), SVG `gpt-5.5` via the Responses API
- Optional step-1 override:
  - `--image_provider openai`: image `gpt-image-2` via the official OpenAI Images API
- `Temporary failure in name resolution` (Roboflow): set `DOCKER_DNS_1`/`DOCKER_DNS_2` in `.env`, then run `docker compose up -d --build`.
- Cannot reach Docker Hub auth (`auth.docker.io`): set the `BASE_IMAGE` and `PIP_INDEX_URL` mirrors in `.env`.
- Optional Roboflow endpoint override:
  - `ROBOFLOW_API_URL=<your_reachable_roboflow_endpoint>`
  - `ROBOFLOW_API_FALLBACK_URLS=<comma_separated_backup_endpoints>`
```bash
# 1) Install dependencies
pip install -r requirements.txt

# 2) Install SAM3 separately (not vendored in this repo)
git clone https://github.com/facebookresearch/sam3.git
cd sam3
pip install -e .
```

Run:
```bash
python autofigure2.py \
  --method_file paper.txt \
  --output_dir outputs/demo \
  --provider custom \
  --base_url https://your-provider.example/v1 \
  --api_key YOUR_KEY
```

Use OpenAI only for step-1 image generation while keeping SVG reconstruction on the original provider:
```bash
python autofigure2.py \
  --method_file paper.txt \
  --output_dir outputs/demo \
  --provider gemini \
  --api_key GEMINI_KEY \
  --image_provider openai \
  --image_api_key OPENAI_KEY \
  --image_model gpt-image-2
```

Use the OpenAI Responses API for text + multimodal SVG reconstruction:
```bash
python autofigure2.py \
  --method_file paper.txt \
  --output_dir outputs/demo \
  --provider openai_response \
  --api_key OPENAI_KEY
```

Continue from an existing stage-1 figure and skip image generation:
```bash
python autofigure2.py \
  --input_figure_path ./my_stage1_figure.png \
  --output_dir outputs/import_demo \
  --provider openai_response \
  --api_key OPENAI_KEY \
  --svg_model gpt-5.5
```

To use the web interface instead, start the server:

```bash
python server.py
```

Then open http://localhost:8000.
AutoFigure-edit provides a web interface for generating figures and editing them directly in the browser.
On the start page, paste your paper's method text on the left. On the right, configure your generation settings:
- Provider: Select your LLM provider (OpenRouter, Custom, Gemini, or OpenAI Responses).
- Image Provider: Optionally override step 1 only to use OpenAI GPT-Image.
- Optimize: Set the number of SVG template refinement iterations (`0` is recommended for standard use).
- Image Size: Available when the effective step-1 image provider is Gemini. Choose `1K`, `2K`, or `4K`.
- Auto Upscale: Enabled by default. Upscales `figure.png` to a 4K long edge (3840 px) while preserving the aspect ratio.
- Reference Image: Upload a target image to enable style transfer.
- SAM3 Backend: Choose local SAM3 or an API backend (fal.ai or Roboflow); the API key field is optional.
If you already have the first-stage raster figure, use the black button in the top-right corner:
- I already have the stage-1 figure: Opens a dedicated import page where you upload an existing academic figure and continue directly from SAM + SVG reconstruction.
The generation result loads directly into an integrated SVG-Edit canvas, allowing for full vector editing.
- Status & Logs: Check real-time progress (top-left) and view detailed execution logs (top-right button).
- Artifacts Drawer: Click the floating button (bottom-right) to expand the Artifacts Panel. This contains all intermediate outputs (icons, SVG templates, etc.). You can drag and drop any artifact directly onto the canvas for custom composition.
AutoFigure-edit depends on SAM3 but does not vendor it. Please follow the official SAM3 installation guide and prerequisites. The upstream repo currently targets Python 3.12+, PyTorch 2.7+, and CUDA 12.6 for GPU builds.
SAM3 checkpoints are hosted on Hugging Face and may require you to request
access and authenticate (e.g., huggingface-cli login) before download.
- SAM3 repo: https://github.com/facebookresearch/sam3
- SAM3 Hugging Face: https://huggingface.co/facebook/sam3
If you prefer not to install SAM3 locally, you can use an API backend (also supported in the Web demo). We recommend using Roboflow as it is free to use.
Option A: fal.ai

```bash
export FAL_KEY="your-fal-key"
python autofigure2.py \
  --method_file paper.txt \
  --output_dir outputs/demo \
  --provider custom \
  --base_url https://your-provider.example/v1 \
  --api_key YOUR_KEY \
  --sam_backend fal
```

Option B: Roboflow
```bash
export ROBOFLOW_API_KEY="your-roboflow-key"
python autofigure2.py \
  --method_file paper.txt \
  --output_dir outputs/demo \
  --provider custom \
  --base_url https://your-provider.example/v1 \
  --api_key YOUR_KEY \
  --sam_backend roboflow
```

Optional CLI flags (API):
- `--sam_api_key` (overrides `FAL_KEY`/`ROBOFLOW_API_KEY`)
- `--sam_max_masks` (default: 32, fal.ai only)
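The key override can be sketched as follows. The mapping of `fal` to `FAL_KEY` and `roboflow` to `ROBOFLOW_API_KEY` follows the flags above. The function name, and the choice to return `None` when no key is found, are assumptions. The generic `api` backend is not modeled here.

```python
import os
from typing import Mapping, Optional

# Environment variable each API backend falls back to, per the flags above.
SAM_KEY_ENV = {"fal": "FAL_KEY", "roboflow": "ROBOFLOW_API_KEY"}


def resolve_sam_api_key(backend: str, cli_key: Optional[str] = None,
                        env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Pick the SAM API key: --sam_api_key first, then the backend's env var."""
    if backend == "local":
        return None  # local SAM3 runs on your machine and needs no API key
    if cli_key:
        return cli_key
    env = os.environ if env is None else env
    var = SAM_KEY_ENV.get(backend)
    if var is None:
        return None  # backend not modeled in this sketch
    # None means "no key found"; the caller decides whether that is an error.
    return env.get(var) or None
```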
| Provider | Base URL | Notes |
|---|---|---|
| OpenRouter | `openrouter.ai/api/v1` | Supports Gemini/Claude/others |
| Custom | `<your-compatible-endpoint>/v1` (required) | Vendor-neutral OpenAI-compatible API |
| Gemini (Google) | `generativelanguage.googleapis.com/v1beta` | Official Google Gemini API (`google-genai`) |
| OpenAI Responses | `api.openai.com/v1` | Uses the official OpenAI Responses API for text + multimodal |
Common CLI flags:
- `--method_text`, `--method_file`, or `--input_figure_path`
- `--provider` (`openrouter` | `custom` | `gemini` | `openai_response`)
- `--image_provider` (`openrouter` | `custom` | `gemini` | `openai`, optional step-1 override)
- `--image_api_key`, `--image_base_url`
- `--image_model`, `--svg_model`
- `--image_size` (`1K` | `2K` | `4K`, Gemini only)
- `--disable_auto_upscale` (disable the default 4K aspect-ratio-preserving upscale after step 1)
- `--sam_prompt` (comma-separated prompts)
- `--sam_backend` (`local` | `fal` | `roboflow` | `api`)
- `--sam_api_key` (API key override; falls back to `FAL_KEY` or `ROBOFLOW_API_KEY`)
- `--sam_max_masks` (fal.ai max masks, default 32)
- `--merge_threshold` (0 disables merging)
- `--optimize_iterations` (0 disables optimization)
- `--reference_image_path` (optional)
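When driving the CLI from Python, a dict of options can be turned into these flags. This is a sketch. It assumes boolean switches such as `--disable_auto_upscale` take no value, and that the script is run from the repository root. `to_cli_args` and `run_pipeline` are hypothetical helpers.

```python
import subprocess
import sys

PROVIDERS = {"openrouter", "custom", "gemini", "openai_response"}


def to_cli_args(options: dict) -> list:
    """Turn {"method_file": "paper.txt", ...} into ["--method_file", "paper.txt", ...]."""
    args = []
    for name, value in options.items():
        if value is None or value is False:
            continue  # option left unset
        if value is True:
            args.append(f"--{name}")  # bare switch, e.g. --disable_auto_upscale
        else:
            # Identity checks above keep numeric zeros such as optimize_iterations=0.
            args += [f"--{name}", str(value)]
    return args


def run_pipeline(options: dict) -> None:
    """Run autofigure2.py from the repository root with the given options."""
    if options.get("provider") not in PROVIDERS:
        raise ValueError(f"unknown provider: {options.get('provider')!r}")
    subprocess.run([sys.executable, "autofigure2.py", *to_cli_args(options)], check=True)
```

The sketch checks `value is False` rather than falsiness. Otherwise `--optimize_iterations 0` or `--merge_threshold 0`, which are meaningful settings, would be silently dropped.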
If you want to use a self-hosted or third-party OpenAI-compatible endpoint, use:
- `--provider custom`
- `--base_url <your_openai_compatible_v1_root>`
- `--image_model <image_model_id>`
- `--svg_model <svg_model_id>`
You can also set AUTOFIGURE_CUSTOM_BASE_URL instead of passing --base_url every time.
base_url must be the OpenAI-compatible /v1 root:
https://your-provider.example/v1
Do not pass a concrete endpoint path such as:
https://your-provider.example/v1/chat/completions
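A quick check catches the most common mistake: passing a full endpoint path instead of the `/v1` root. The helper is hypothetical. It accepts only `http`/`https` URLs whose path ends in `/v1`, which also admits roots such as `https://openrouter.ai/api/v1`.

```python
from urllib.parse import urlparse


def check_base_url(base_url: str) -> str:
    """Return base_url without a trailing slash, or raise if it is not a /v1 root."""
    url = base_url.rstrip("/")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"base_url must be an absolute http(s) URL, got {base_url!r}")
    if not parsed.path.endswith("/v1"):
        # Catches full endpoint paths such as .../v1/chat/completions.
        raise ValueError(f"base_url must end at the /v1 root, got {base_url!r}")
    return url
```

The request URL is then simply `check_base_url(base_url) + "/chat/completions"`.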
For text reasoning and SVG reconstruction, the Custom route calls:
```http
POST /chat/completions
Authorization: Bearer <api_key>
```

Text-only requests use the normal Chat Completions message shape:

```json
{
  "model": "your-text-or-svg-model",
  "messages": [{ "role": "user", "content": "..." }],
  "max_tokens": 16000,
  "temperature": 0.7
}
```

Multimodal SVG reconstruction must support OpenAI-style `image_url` data URIs:
```json
{
  "role": "user",
  "content": [
    { "type": "text", "text": "..." },
    {
      "type": "image_url",
      "image_url": { "url": "data:image/png;base64,..." }
    }
  ]
}
```

The response must return content in the standard shape:
```json
{
  "choices": [
    { "message": { "content": "<svg ...>...</svg>" } }
  ]
}
```

The SVG may be returned as raw `<svg>...</svg>` or inside a markdown code block.
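The request and response shapes above can be exercised with a short sketch. One helper packs a PNG into an `image_url` data URI message. The other pulls the SVG out of a Chat Completions response, whether the SVG is raw or fenced. The regex-based extraction is an assumption about how a client might do this, not the repository's parser.

```python
import base64
import re


def image_message(prompt: str, png_bytes: bytes) -> dict:
    """A user message with a text part and a PNG passed as an image_url data URI."""
    data_uri = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }


# Greedy match from the first "<svg" to the last "</svg>", so nested <svg> elements survive.
_SVG_RE = re.compile(r"<svg\b.*</svg>", re.DOTALL | re.IGNORECASE)


def extract_svg(response: dict) -> str:
    """Pull the SVG markup out of a Chat Completions response (raw or fenced)."""
    content = response["choices"][0]["message"]["content"]
    match = _SVG_RE.search(content)
    if match is None:
        raise ValueError("no <svg>...</svg> element in the response content")
    return match.group(0)
```

Because the pattern only looks for the `<svg ...>...</svg>` element, any surrounding markdown fence or chatter in the reply is ignored.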
For step-1 image generation with --image_provider custom (or when --provider custom is linked to step 1), this repo currently calls /chat/completions and expects the returned message content to contain a base64 image data URI:

or:
data:image/png;base64,...
If your provider only exposes an OpenAI Images-compatible /images/generations route, use --image_provider openai for the official OpenAI Images API, or keep image generation on another supported route.
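For this step-1 contract, a client has to find and decode the data URI inside the message text, whether the URI is bare or wrapped in other text. This sketch assumes the base64 payload contains no whitespace and that PNG, JPEG, or WebP are the formats of interest. `decode_image_data_uri` is a hypothetical helper.

```python
import base64
import re

# A base64 image data URI; this sketch requires the payload to contain no whitespace.
_DATA_URI_RE = re.compile(r"data:image/(png|jpeg|jpg|webp);base64,([A-Za-z0-9+/]+=*)")


def decode_image_data_uri(content: str) -> tuple:
    """Find the first base64 image data URI in `content`; return (format, bytes)."""
    match = _DATA_URI_RE.search(content)
    if match is None:
        raise ValueError("message content has no base64 image data URI")
    return match.group(1), base64.b64decode(match.group(2))
```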
As of April 23, 2026, OpenAI's official Images API supports images.generate and images.edit for GPT-Image models. In this repo, --image_provider openai uses the OpenAI Images API for step 1 only:
- No reference image: `images.generate`
- With reference image: `images.edit`
- Default model: `gpt-image-2` (override with `--image_model`)
- API key precedence: `--image_api_key` -> `OPENAI_API_KEY` -> `--api_key`
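The routing rules in this list can be sketched in Python. The key precedence and the `generate`/`edit` split follow the list. The `generate_stage1` wrapper is hypothetical. It uses the official `openai` SDK (`client.images.generate` / `client.images.edit`) and assumes the response carries base64 data in `b64_json`, as `gpt-image-1` does.

```python
import base64
import os
from typing import Mapping, Optional


def resolve_image_api_key(image_api_key: Optional[str] = None,
                          api_key: Optional[str] = None,
                          env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Key precedence: --image_api_key, then OPENAI_API_KEY, then --api_key."""
    env = os.environ if env is None else env
    return image_api_key or env.get("OPENAI_API_KEY") or api_key


def images_operation(reference_image_path: Optional[str]) -> str:
    """Which Images API call step 1 maps to."""
    return "images.edit" if reference_image_path else "images.generate"


def generate_stage1(prompt: str, model: str = "gpt-image-2",
                    reference_image_path: Optional[str] = None,
                    api_key: Optional[str] = None) -> bytes:
    """Hypothetical wrapper around the official SDK; returns the image bytes."""
    from openai import OpenAI  # lazy import so the helpers above work without the SDK

    client = OpenAI(api_key=api_key)
    if reference_image_path:
        with open(reference_image_path, "rb") as ref:
            result = client.images.edit(model=model, image=ref, prompt=prompt)
    else:
        result = client.images.generate(model=model, prompt=prompt)
    return base64.b64decode(result.data[0].b64_json)
```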
After step 1, the generated figure.png is upscaled by default so its long edge reaches 3840px while preserving the original aspect ratio. If the generated image is already at or above a 4K long edge, the upscale step is skipped automatically.
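The resize rule reduces to a small size computation. This sketch assumes rounding to the nearest pixel. `upscale_size` is a hypothetical helper, and the actual resampling (for example, Pillow's `Image.resize` with a Lanczos filter) is left out.

```python
TARGET_LONG_EDGE = 3840  # "4K" long edge used by the auto-upscale step


def upscale_size(width: int, height: int, target: int = TARGET_LONG_EDGE) -> tuple:
    """New (width, height) with the long edge at `target`, aspect ratio preserved.

    Images whose long edge already reaches the target are returned unchanged.
    """
    long_edge = max(width, height)
    if long_edge >= target:
        return width, height
    scale = target / long_edge
    return round(width * scale), round(height * scale)
```

For example, a 1024×768 draft becomes 3840×2880, while a 4096×2304 image is left as is.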
Disable the automatic upscale with `--disable_auto_upscale`.

As of April 23, 2026, OpenAI's official Responses API supports text output plus multimodal input with `input_text` and `input_image`. In this repo, `--provider openai_response` means:
- Text calls use `client.responses.create(...)`.
- Multimodal SVG reconstruction also uses `client.responses.create(...)`.
- Step-1 image generation falls back to the official OpenAI Images API unless `--image_provider` is explicitly set.
- Default SVG model: `gpt-5.5` (override with `--svg_model`).
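Here is a sketch of the Responses-style input this route implies: an `input_text` part and an `input_image` part, with the figure passed as a base64 data URI. `responses_input` and `reconstruct_svg` are hypothetical helpers. The call uses the official SDK's `client.responses.create` and its `output_text` convenience property.

```python
import base64
from typing import Optional


def responses_input(prompt: str, png_bytes: Optional[bytes] = None) -> list:
    """Build Responses API `input`: an input_text part plus an optional input_image."""
    content = [{"type": "input_text", "text": prompt}]
    if png_bytes is not None:
        data_uri = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
        content.append({"type": "input_image", "image_url": data_uri})
    return [{"role": "user", "content": content}]


def reconstruct_svg(prompt: str, png_bytes: bytes, model: str = "gpt-5.5",
                    api_key: Optional[str] = None) -> str:
    """Hypothetical multimodal call; returns the model's concatenated text output."""
    from openai import OpenAI  # lazy import so responses_input works without the SDK

    client = OpenAI(api_key=api_key)
    response = client.responses.create(model=model, input=responses_input(prompt, png_bytes))
    return response.output_text
```

Unlike the Chat Completions route, `input_image` takes the data URI directly as a string rather than nested under `{"url": ...}`.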
If you already have the academic raster figure from step 1, use --input_figure_path to skip image generation entirely. The pipeline will normalize the imported image into figure.png, optionally apply the default 4K aspect-ratio-preserving upscale, and then continue from SAM segmentation and SVG reconstruction.
```text
AutoFigure-edit/
├── autofigure2.py          # Main pipeline
├── server.py               # FastAPI backend
├── requirements.txt
├── web/                    # Static frontend
│   ├── index.html
│   ├── canvas.html
│   ├── styles.css
│   ├── app.js
│   └── vendor/svg-edit/    # Embedded SVG editor
└── img/                    # README assets
```
WeChat Discussion Group
Scan the QR code to join our community. If the code is expired, please add WeChat ID nauhcutnil or contact tuchuan@mail.hfut.edu.cn.
If you find AutoFigure, AutoFigure-Edit, or FigureBench helpful, please cite:
```bibtex
@inproceedings{zhu2026autofigure,
  title={AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations},
  author={Minjun Zhu and Zhen Lin and Yixuan Weng and Panzhong Lu and Qiujie Xie and Yifan Wei and Sifan Liu and Qiyao Sun and Yue Zhang},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=5N3z9JQJKq}
}

@misc{lin2026autofigureeditgeneratingeditablescientific,
  title={AutoFigure-Edit: Generating Editable Scientific Illustration},
  author={Zhen Lin and Qiujie Xie and Minjun Zhu and Shichen Li and Qiyao Sun and Enhao Gu and Yiran Ding and Ke Sun and Fang Guo and Panzhong Lu and Zhiyuan Ning and Yixuan Weng and Yue Zhang},
  year={2026},
  eprint={2603.06674},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.06674}
}

@dataset{figurebench2025,
  title = {FigureBench: A Benchmark for Automated Scientific Illustration Generation},
  author = {WestlakeNLP},
  year = {2025},
  url = {https://huggingface.co/datasets/WestlakeNLP/FigureBench}
}
```

Repository metadata and usage guidance:
We would like to thank the Linux.do community for their support.
This project is licensed under the MIT License - see LICENSE for details.
Name and logo usage are covered separately in TRADEMARK.md.
Explore more open-source research tools from ResearAI:
| Project | What it does |
|---|---|
| DeepScientist | autonomous scientific discovery system |
| AutoFigure | generate paper-ready figures |
| DeepReviewer-v2 | review papers and drafts |
| Awesome-AI-Scientist | curated AI scientist landscape |
The optimal configuration for this project uses gemini-3.1-flash-image-preview from Google AI Studio [https://aistudio.google.com/] as the image generation model and gemini-3.1-pro-preview as the SVG conversion model. Each run costs approximately $0.50, consumes about 30,000 tokens, and takes around 20 minutes. It is strongly recommended to use the 4K option for optimal performance, as using 1K or 2K resolutions will result in the final generated SVG being unusually blurry.
[Mainland China Notice] Gemini's Terms of Service do not permit access or usage by users in mainland China. If OpenRouter throws an error, it is often because an account registered in mainland China lacks the necessary permissions to use Gemini. It is recommended to use an OpenRouter account registered in the United States or Europe and to ensure compliant usage.












