Official implementation of VIRST, a video-instructed reasoning framework for spatiotemporal segmentation.
- release model code
- release checkpoint
- release data code
- release utility scripts
- release eval script
- release training scripts
- release demo script
This repository contains the core training and evaluation code for VIRST, including:
- model definition in `model/`
- training entrypoints in `train.py` and `train_stage3.py`
- RVOS evaluation in `eval.py`
- dataset handling in `data/`
- utility code in `utils/`
```bash
git clone https://github.com/AIDASLab/VIRST
cd VIRST
conda create -n virst python=3.10 -y
conda activate virst
pip install -r requirements.txt
```
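As a quick sanity check that the new environment is active (assuming `python` resolves to the interpreter of the `virst` env created above):

```shell
# Should report the Python 3.10 interpreter from the `virst` env
python --version
```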
Pretrained checkpoint: Google Drive
- Download Ref-DAVIS, Ref-YouTube-VOS, MeViS, and ReVOS.
- By default, `data/dataset_config.py` resolves dataset paths to absolute paths under `<repo>/dataset/`.
- You can override the defaults with `VIRST_LISA_ROOT`, `VIRST_RVOS_ROOT`, `VIRST_CHATUNIVI_ROOT`, and `VIRST_VQA_VIDEO_ROOT`.
- Store the datasets in the following directory layout:
```
RVOS_ROOT
├── ReVOS
│   ├── JPEGImages
│   ├── mask_dict.json
│   ├── mask_dict_foreground.json
│   ├── meta_expressions_train_.json
│   └── meta_expressions_valid_.json
├── lvvis
│   └── train
│       ├── JPEGImages
│       ├── mask_dict.json
│       └── meta_expressions.json
├── Ref-Youtube-VOS
│   ├── meta_expressions
│   │   ├── train/meta_expressions.json
│   │   └── valid/meta_expressions.json
│   ├── train
│   │   ├── JPEGImages
│   │   └── mask_dict.pkl
│   └── valid
│       └── JPEGImages
├── davis17
│   ├── meta_expressions
│   │   ├── train/meta_expressions.json
│   │   └── valid/meta_expressions.json
│   ├── train
│   │   ├── JPEGImages
│   │   └── mask_dict.pkl
│   └── valid
│       ├── JPEGImages
│       └── mask_dict.pkl
└── mevis
```
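If the tree above lives outside `<repo>/dataset/`, the environment variables mentioned earlier redirect the lookup. A minimal example (the path below is a placeholder, not a required location):

```shell
# Point VIRST at a dataset tree outside the repo (example path)
export VIRST_RVOS_ROOT=/data/RVOS_ROOT
echo "$VIRST_RVOS_ROOT"
```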
Run MeViS evaluation with:

```bash
MODEL_CHECKPOINT=/path/to/checkpoint \
bash scripts/eval_mevis.sh mevis_valid
```

If your dataset is not stored under the default `<repo>/dataset/RVOS_ROOT`, set `RVOS_ROOT` explicitly:

```bash
MODEL_CHECKPOINT=/path/to/checkpoint \
RVOS_ROOT=/path/to/RVOS_ROOT \
bash scripts/eval_mevis.sh mevis_valid
```

Supported dataset names for the script are `mevis_valid` and `mevis_test`.
Note:
- Predictions are saved under `./eval_results/mevis_valid/` by default.
To compute the MeViS metric after inference:
```bash
python -m utils.evaluation.eval_rvos ./eval_results/mevis_valid/<run_name> --dataset mevis_valid
```

- The project page will be updated as the release is polished further.
This project builds upon prior work, including VISA, LISA, VideoChat-Flash, and SAM2.
We thank the authors for releasing their code and models.