Break free from static formats. Our platform empowers you to transform fixed content into fully manipulatable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram details and logical relationships.
👆 Click above or https://editbanana.anxin6.cn/ to try Edit Banana online! Upload an image to get editable DrawIO (XML) in seconds. Please note: Our GitHub repository currently trails behind our web-based service. For the most up-to-date features and performance, we recommend using our web platform.
Welcome to join our WeChat group to discuss and exchange ideas! Scan the QR code below to join:
Scan to join the Edit Banana community
💡 If the QR code has expired, please submit an Issue to request an updated one.
To demonstrate the high-fidelity conversion, we provide one-to-one comparisons between the original static formats and the editable reconstruction results across three scenarios. All elements can be individually dragged, styled, and modified.
✨ Conversion Highlights:
- Preserves the layout logic, color matching, and element hierarchy of the original diagram
- 1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness)
- Accurate text recognition, supporting direct subsequent editing and format adjustment
- All elements are independently selectable, supporting native DrawIO template replacement and layout optimization
- Advanced Segmentation: Using our fine-tuned SAM 3 (Segment Anything Model 3) for segmentation of diagram elements.
- Fixed Multi-Round VLM Scanning: An extraction process guided by Multimodal LLMs (Qwen-VL/GPT-4V).
- Text Recognition:
  - Local OCR (Tesseract) for text localization; easy to install (`pip install pytesseract` plus the system `tesseract-ocr` package) and runs offline.
  - Pix2Text for mathematical formula recognition and LaTeX conversion (e.g. `$\int f(x) dx$`).
  - Crop-Guided Strategy: extracts text/formula regions and sends high-resolution crops to the formula engine.
- User System:
- Registration: New users receive 10 free credits.
- Credit System: Pay-per-use model prevents resource abuse.
- Multi-User Concurrency: Built-in support for concurrent user sessions using a Global Lock mechanism for thread-safe GPU access and an LRU Cache (Least Recently Used) to persist image embeddings across requests, ensuring high performance and stability.
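The concurrency design above combines a global lock for thread-safe GPU access with an LRU cache for image embeddings. A minimal sketch of that pattern, assuming nothing about the project's internals (the class and method names here are illustrative, not the actual API):

```python
from collections import OrderedDict
from threading import Lock

class EmbeddingLRUCache:
    """Thread-safe LRU cache, e.g. for per-image SAM embeddings.

    Hypothetical sketch: names are illustrative, not the project's API.
    """

    def __init__(self, capacity: int = 8):
        self._capacity = capacity
        self._store: OrderedDict[str, object] = OrderedDict()
        self._lock = Lock()  # serializes access, mirroring a global GPU lock

    def get(self, key: str):
        with self._lock:
            if key not in self._store:
                return None
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]

    def put(self, key: str, value) -> None:
        with self._lock:
            if key in self._store:
                self._store.move_to_end(key)
            self._store[key] = value
            if len(self._store) > self._capacity:
                self._store.popitem(last=False)  # evict least recently used

cache = EmbeddingLRUCache(capacity=2)
cache.put("a.png", [0.1])
cache.put("b.png", [0.2])
cache.get("a.png")         # touch "a.png" so "b.png" becomes the oldest entry
cache.put("c.png", [0.3])  # exceeds capacity: evicts "b.png"
print(cache.get("b.png"))  # None
```

Persisting embeddings this way means repeated requests for the same image skip the expensive encoder pass while the lock keeps concurrent sessions from interleaving GPU work.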
- Input: Image (PNG/JPG/BMP/TIFF/WebP).
- Segmentation (SAM3): Using our fine-tuned SAM3 mask decoder.
- Text Extraction (Parallel):
- Local OCR (Tesseract) detects text bounding boxes.
- High-res crops of text/formula regions are sent to Pix2Text for LaTeX conversion.
- DrawIO XML Generation: Merging spatial data from SAM3 and text OCR results.
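The final merge step emits DrawIO's `mxGraphModel` XML. A minimal sketch of what producing that format from merged geometry + text looks like (the element names follow the DrawIO/mxGraph file format; the input dictionaries are hypothetical, not the project's internal data structures):

```python
import xml.etree.ElementTree as ET

def boxes_to_drawio(shapes):
    """Build a minimal DrawIO (mxGraphModel) XML string from
    {label, x, y, w, h} dicts, e.g. merged SAM masks + OCR text."""
    model = ET.Element("mxGraphModel")
    root = ET.SubElement(model, "root")
    ET.SubElement(root, "mxCell", id="0")              # DrawIO's required root cells
    ET.SubElement(root, "mxCell", id="1", parent="0")
    for i, s in enumerate(shapes, start=2):
        cell = ET.SubElement(
            root, "mxCell", id=str(i), parent="1", vertex="1",
            value=s["label"], style="rounded=0;whiteSpace=wrap;html=1;",
        )
        ET.SubElement(
            cell, "mxGeometry", x=str(s["x"]), y=str(s["y"]),
            width=str(s["w"]), height=str(s["h"]),
            **{"as": "geometry"},                      # DrawIO expects an "as" attribute
        )
    return ET.tostring(model, encoding="unicode")

xml_str = boxes_to_drawio([{"label": "Start", "x": 40, "y": 40, "w": 120, "h": 60}])
print(xml_str)
```

Pasting such a string into draw.io (Extras → Edit Diagram) renders each box as an individually editable vertex.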
Edit-Banana/
├── config/ # Configuration files (copy config.yaml.example → config.yaml)
├── flowchart_text/ # OCR & Text Extraction Module (standalone entry)
│ ├── src/
│ └── main.py # OCR-only entry point
├── input/ # [Manual] Input images directory
├── models/ # [Manual] Model weights (SAM3) and optional BPE vocab
├── output/ # [Manual] Results directory
├── sam3/ # SAM3 library (see Installation: install from facebookresearch/sam3)
├── sam3_service/ # SAM3 HTTP service (optional, for multi-process deployment)
├── scripts/
│ ├── setup_sam3.sh # Install SAM3 lib and copy BPE to models/
│ ├── setup_rmbg.py # Download RMBG model from ModelScope to models/rmbg/
│ └── merge_xml.py # XML merge utilities
├── main.py # CLI entry (modular pipeline)
├── server_pa.py # FastAPI backend server
└── requirements.txt # Python dependencies
Follow these steps to set up the project locally.
- Python 3.10+
- CUDA-capable GPU (Highly recommended)
```bash
git clone https://github.com/BIT-DataLab/Edit-Banana.git
cd Edit-Banana
```

After cloning, you must manually create the following resource directories (ignored by Git):
```bash
# Create input/output directories
mkdir -p input
mkdir -p output
mkdir -p sam3_output
```

The following large files are not included in this repository. Download them yourself and place them in the paths below. The repo uses `.gitignore` to exclude `models/`, `sam3_src/`, etc. Do not commit these files to Git.
| Asset | Description | Target path | How to get |
|---|---|---|---|
| SAM3 weights | Segmentation checkpoint (must be `.pt` format) | `models/sam3_ms/sam3.pt` or as in config | ModelScope (recommended) or Hugging Face |
| BPE vocab | SAM3 text encoder vocabulary | `models/bpe_simple_vocab_16e6.txt.gz` | Copied when you run `scripts/setup_sam3.sh` from the cloned `sam3_src`; or from facebookresearch/sam3 repo assets |
| RMBG model (optional) | Background removal for icons/arrows | `models/rmbg/model.onnx` | `pip install modelscope && python scripts/setup_rmbg.py`, or download from ModelScope RMBG-2.0 |
See sections 5. Install SAM3 library, 6. Download model weights, and Optional — RMBG below for step-by-step instructions.
Install PyTorch with CUDA support (recommended) or CPU-only. Example for CUDA 11.8:
```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```

For other CUDA versions or CPU builds, see pytorch.org.
This project uses the SAM3 Python API; the code is not in this repo. Detailed steps: docs/SETUP_SAM3.md.
Quick path (from repo root, with venv activated):
```bash
bash scripts/setup_sam3.sh
```

This clones facebookresearch/sam3 into `sam3_src`, runs `pip install -e sam3_src`, and copies the BPE vocab to `models/bpe_simple_vocab_16e6.txt.gz`.
Verify: python -c "from sam3.model_builder import build_sam3_image_model; print('OK')"
Get the SAM 3 checkpoint and place it under models/:
- ModelScope (recommended, no access request): modelscope.cn/models/facebook/sam3
- Hugging Face: facebook/sam3 — request access first.
See docs/SETUP_SAM3.md for download commands and config.yaml setup.
Backend (required):
```bash
pip install -r requirements.txt
```

Tesseract (the default text OCR; install one of Tesseract or PaddleOCR): install the Tesseract engine on your system. Example on Ubuntu:
```bash
sudo apt install tesseract-ocr tesseract-ocr-chi-sim
```

If you use PaddleOCR (`ocr.engine: "paddleocr"`), Tesseract is optional but recommended as a fallback.
Optional — PaddleOCR (better for mixed Chinese/English text): use PaddlePaddle 3.2.x + PaddleOCR 3.x (3.2.2 recommended; 3.3.0+ has a CPU oneDNN bug and will auto-fallback to Tesseract):

```bash
pip uninstall paddleocr paddlepaddle paddlepaddle-gpu paddlex -y
pip install paddlepaddle==3.2.2 paddleocr  # CPU; avoids 3.3.0 oneDNN bug
# GPU: pip install paddlepaddle-gpu==3.2.2 paddleocr
```

Then in `config/config.yaml` set `ocr.engine: "paddleocr"`.
Optional — formula recognition (Pix2Text): for LaTeX formula recognition, install:

```bash
pip install pix2text
# GPU: pip install onnxruntime-gpu
```

Optional — RMBG (background removal for icons/arrows): for IconPictureProcessor:

- Install the runtime: `pip install onnxruntime` (or `onnxruntime-gpu`).
- Download the RMBG-2.0 model to `models/rmbg/model.onnx`:

  ```bash
  pip install modelscope
  python scripts/setup_rmbg.py
  ```

  Or manually: download `model.onnx` from ModelScope RMBG-2.0 into `models/rmbg/`.
- Config file (required before first run):

  ```bash
  cp config/config.yaml.example config/config.yaml
  ```

  Edit `config/config.yaml`: set `sam3.checkpoint_path` and `sam3.bpe_path` to your `models/` paths. Optionally set `ocr.engine: "paddleocr"` to use PaddleOCR for text.
- Environment variables (optional): create a `.env` file in the project root if you use API keys or custom endpoints.
Recommended versions
| Component | Version | Notes |
|---|---|---|
| Python | 3.10+ | Must be compatible with PyTorch and Paddle |
| PyTorch | 2.x + CUDA to match GPU | Newer GPUs (e.g. Blackwell sm_120) may need cu128; or set `sam3.device: "cpu"` |
| SAM3 weights | `sam3.pt` (not safetensors) | Set `config.sam3.checkpoint_path` to e.g. `models/sam3_ms/sam3.pt` |
| PaddleOCR | PaddlePaddle 3.2.2 + PaddleOCR 3.x | 3.3.0+ has a CPU oneDNN bug; the pipeline will auto-fallback to Tesseract |
| Tesseract | System install | Ubuntu: `sudo apt install tesseract-ocr tesseract-ocr-chi-sim` |
| RMBG | onnxruntime + `models/rmbg/model.onnx` | Optional; use `scripts/setup_rmbg.py` or ModelScope to download |
Before first run
- Copy `config/config.yaml.example` to `config/config.yaml` and set `sam3.checkpoint_path`, `sam3.bpe_path`
- Place SAM3 weights (e.g. `models/sam3_ms/sam3.pt`) and BPE vocab (`models/bpe_simple_vocab_16e6.txt.gz`) under `models/`
- Run `scripts/setup_sam3.sh` or follow docs/SETUP_SAM3.md to install the SAM3 library
- Install Tesseract system-wide, or install PaddleOCR and set `ocr.engine: "paddleocr"`
Common issues
- "no kernel image is available for execution on the device" — the GPU architecture does not match the PyTorch CUDA build. Set `sam3.device: "cpu"` in `config.yaml`, or upgrade PyTorch to a matching CUDA build (e.g. cu128).
- "Model file not found at .../models/rmbg/model.onnx" — RMBG is optional; safe to ignore if you do not need background removal. To enable it: `pip install modelscope && python scripts/setup_rmbg.py`, or download from ModelScope RMBG-2.0 into `models/rmbg/model.onnx`.
- "PaddleOCR inference failed…fallback to Tesseract" — Paddle/oneDNN incompatibility. Use `paddlepaddle==3.2.2` + `paddleocr`, or set `ocr.engine: "tesseract"`.
- "Please install PaddleOCR" / "pytesseract not installed" — install the corresponding OCR stack; for Tesseract only, install the system `tesseract-ocr` package and `pip install pytesseract`.
- "Checking connectivity to the model hosters" hangs — `main.py` sets `PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK=True` by default; if it still appears, run `export PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK=True` before starting.
Supports image files (PNG, JPG, BMP, TIFF, WebP). To process a single image:
```bash
python main.py -i input/test_diagram.png
```

The output XML will be saved in the `output/` directory. For batch processing, put images in `input/` and run `python main.py` without `-i`.
- One-time setup

  ```bash
  git clone https://github.com/BIT-DataLab/Edit-Banana.git && cd Edit-Banana
  python3 -m venv .venv && source .venv/bin/activate  # Linux/macOS; Windows: .venv\Scripts\activate
  pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118  # or CPU build
  pip install -r requirements.txt
  sudo apt install tesseract-ocr tesseract-ocr-chi-sim  # OCR (or equivalent on your OS)
  ```

  Install the SAM3 library (see Install SAM3 library) and download the model weights + BPE vocab. Then:

  ```bash
  mkdir -p input output
  cp config/config.yaml.example config/config.yaml
  # Edit config/config.yaml: set sam3.checkpoint_path and sam3.bpe_path to your models/ paths
  ```

- Test with the CLI

  ```bash
  # Put a diagram image in input/, e.g. input/test.png
  python main.py -i input/test.png
  # Output appears under output/<image_stem>/ (DrawIO XML and intermediates)
  ```

- Optional: test the web API

  ```bash
  python server_pa.py
  # In another terminal:
  curl -X POST http://localhost:8000/convert -F "file=@input/test.png"
  # Or open http://localhost:8000/docs and use the /convert endpoint with a file upload
  ```
Customize the pipeline behavior in config/config.yaml:
- sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, max iteration loops.
- paths: Set input/output directories.
- dominant_color: Fine-tune color extraction sensitivity.
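Options like `sam3.checkpoint_path` and `ocr.engine` in this README are dotted paths into the nested YAML structure of `config/config.yaml`. A small sketch of how such keys resolve against the loaded config (the dictionary values and the `get_option` helper are illustrative, not the project's actual loader):

```python
from functools import reduce

# Illustrative stand-in for the parsed config/config.yaml; key names follow
# this README, values are examples only.
config = {
    "sam3": {
        "checkpoint_path": "models/sam3_ms/sam3.pt",
        "bpe_path": "models/bpe_simple_vocab_16e6.txt.gz",
        "device": "cuda",
    },
    "paths": {"input_dir": "input", "output_dir": "output"},
    "ocr": {"engine": "tesseract"},
}

def get_option(cfg: dict, dotted_key: str, default=None):
    """Resolve a dotted key like 'sam3.checkpoint_path' in a nested dict."""
    try:
        return reduce(lambda d, k: d[k], dotted_key.split("."), cfg)
    except (KeyError, TypeError):
        return default

print(get_option(config, "sam3.checkpoint_path"))  # models/sam3_ms/sam3.pt
print(get_option(config, "ocr.engine"))            # tesseract
```

So setting `ocr.engine: "paddleocr"` in the YAML simply changes the nested value under `ocr:` → `engine:`.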
| Feature Module | Status | Description |
|---|---|---|
| Core Conversion Pipeline | ✅ Completed | Full pipeline of segmentation, reconstruction and OCR |
| Intelligent Arrow Connection | | Automatically associate arrows with target shapes |
| DrawIO Template Adaptation | 📍 Planned | Support custom template import |
| Batch Export Optimization | 📍 Planned | Batch export to DrawIO files (.drawio) |
| Local LLM Adaptation | 📍 Planned | Support local VLM deployment, independent of APIs |
Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):
- Fork this repository
- Create a feature branch (`git checkout -b feature/xxx`)
- Commit your changes (`git commit -m 'feat: add xxx'`)
- Push to the branch (`git push origin feature/xxx`)
- Open a Pull Request
- Bug Reports: Issues
- Feature Suggestions: Discussions
Thanks to all the developers who have contributed to the project and driven its evolution!
| Name/ID | Email |
|---|---|
| Chai Chengliang | ccl@bit.edu.cn |
| Zhang Chi | zc315@bit.edu.cn |
| Deng Qiyan | |
| Rao Sijing | |
| Yi Xiangjian | |
| Li Jianhui | |
| Shen Chaoyuan | |
| Zhang Junkai | |
| Han Junyi | |
| You Zirui | |
| Xu Haochen | |
| An Minghao | |
| Yu Mingjie | |
| Yu Xinjiang | |
| Chen Zhuofan | |
| Li Xiangkun |
This project is open-source under the Apache License 2.0, allowing commercial use and secondary development (with copyright notice retained).
🌟 If this project helps you, please star it to show your support!








