This repository contains the official implementation of GLMask, a structural image representation designed for semantic-to-instance segmentation with minimal manual annotation. The method was introduced in our paper:
From Semantic To Instance: A Semi-Self-Supervised Learning Approach
GLMask replaces standard RGB inputs with a three-channel structural representation composed of Grayscale (G), CIELAB Lightness (L), and a semantic mask (M). This design encourages the model to focus on shape, texture, and structural cues rather than color, improving generalization across diverse acquisition domains.
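As a rough illustration of how the three channels can be assembled (a minimal NumPy sketch assuming sRGB inputs and a binary semantic mask; the repository's wheathead-rgb2glm tool is the canonical implementation, and `rgb_to_glmask` is a hypothetical helper name):

```python
import numpy as np

def rgb_to_glmask(rgb, mask):
    """Stack Grayscale, CIELAB L*, and a semantic mask into 3 channels.

    rgb  : uint8 array of shape (H, W, 3), sRGB.
    mask : uint8 array of shape (H, W), values in {0, 255}.
    Simplified sketch; not the repository's exact conversion.
    """
    rgb_f = rgb.astype(np.float64) / 255.0
    # Channel 1 (G): luma-weighted grayscale (ITU-R BT.601 coefficients).
    gray = rgb_f @ np.array([0.299, 0.587, 0.114])
    # Channel 2 (L): CIELAB lightness L* from linearized sRGB luminance Y.
    linear = np.where(rgb_f <= 0.04045, rgb_f / 12.92,
                      ((rgb_f + 0.055) / 1.055) ** 2.4)
    y = linear @ np.array([0.2126, 0.7152, 0.0722])
    eps = 216 / 24389
    f = np.where(y > eps, np.cbrt(y), (24389 / 27 * y + 16) / 116)
    lightness = 116 * f - 16  # L* in [0, 100]
    # Channel 3 (M): the semantic mask produced by the segmentation stage.
    glm = np.stack([gray * 255, lightness / 100 * 255,
                    mask.astype(np.float64)], axis=-1)
    return np.clip(glm, 0, 255).astype(np.uint8)
```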
Our framework combines:
- Synthetic data generation via cut-and-paste simulation
- GLMask-based structural representation learning
- Rotation-based domain adaptation
- YOLOv9 instance segmentation
The approach achieves:
- 98.5% mAP@50 on wheat head instance segmentation
- Consistent cross-domain robustness across 18 acquisition environments
- Significant performance gains on the MS COCO dataset
Figure G1: Overall pipeline: semantic mask generation, GLMask construction,
and instance segmentation using GLMask with synthetic pretraining.
Table G1: Per-domain mAP@50 comparison between the RGB-based BaseModel
and the proposed RoAModel across the 18 acquisition domains of the GHDte test
set. The BaseModel exhibits substantial performance variability across domains.
Figure G2: Prediction performance of RoAModel across the 18 acquisition domains.
Figure G3: Prediction performance of the COCO models on the Microsoft
COCO 2017 dataset. Our proposed GLMask approach consistently achieved superior
segmentation performance (columns B, C, D, and E) and obtained higher
confidence scores for detected objects (columns A through F).
In some cases both the RGB and ColorMap models failed to detect objects of
interest, e.g., the truck in column D and the people in column F. In
addition, near-perfect segmentation was observed in cases such as column A.
- Python 3.11+
- CUDA-enabled GPU (for training and inference)
Clone the repository:

```bash
git clone https://github.com/your-username/glmask-semantic2instance.git
cd glmask-semantic2instance
```

Create a virtual environment (recommended):

```bash
python -m venv venv
source venv/bin/activate
```

Install dependencies:

```bash
pip install -e .
```
Place your datasets in the data/ directory following the structure outlined in the configuration files.
Sample metadata files are provided in the data/ directory
(background_videos_metadata.csv, foreground_videos_metadata.csv, and
segmented_samples_metadata.csv) to illustrate the expected input format
for running the data synthesis pipeline.
Training and evaluation datasets must be organized in the standard YOLO
format (image files with corresponding label .txt files following the
`class x_center y_center width height` convention).
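The normalized label convention above can be decoded as follows (a small sketch; `parse_yolo_label` is a hypothetical helper, not part of the repository):

```python
def parse_yolo_label(line, img_w, img_h):
    """Convert one normalized YOLO label line to absolute corner coordinates.

    Each line is 'class x_center y_center width height', with all four
    geometry values normalized to [0, 1] relative to the image size.
    """
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x1 = (xc - w / 2) * img_w  # left edge in pixels
    y1 = (yc - h / 2) * img_h  # top edge in pixels
    x2 = (xc + w / 2) * img_w  # right edge in pixels
    y2 = (yc + h / 2) * img_h  # bottom edge in pixels
    return int(cls), (x1, y1, x2, y2)
```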
This stage generates synthetic training data by overlaying extracted objects onto backgrounds.
- Extract Frames from Videos:
wheathead-sim-frames --config configs/simulator/frames_extractor.yaml
- Extract Objects from Images:
wheathead-sim-objects --config configs/simulator/objects_extractor.yaml
- Run Simulator:
wheathead-simulator --config configs/simulator/simulator.yaml
- Visualize Simulated Data:
wheathead-sim-visualizer --config configs/simulator/visualizer.yaml
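The cut-and-paste idea behind the simulator can be sketched as a masked composite (a simplified NumPy version; the actual wheathead-simulator additionally handles scaling, rotation, and blending, and `paste_object` is a hypothetical helper):

```python
import numpy as np

def paste_object(background, obj, obj_mask, top, left):
    """Composite a masked foreground crop onto a background image.

    background : uint8 array (H, W, 3)
    obj        : uint8 array (h, w, 3), the extracted object crop
    obj_mask   : uint8 array (h, w), nonzero where the object is present
    """
    out = background.copy()
    h, w = obj.shape[:2]
    region = out[top:top + h, left:left + w]
    # Keep object pixels where the mask is set, background elsewhere.
    m = (obj_mask > 0)[..., None]
    out[top:top + h, left:left + w] = np.where(m, obj, region)
    return out
```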
The semantic mask channel (M) in the GLMask representation is produced by this stage.
Place pretrained weights at model_weights/S5Seg_Best.pt before running.
Pretrained weights can be downloaded from: Download Weights
- Train (optional — skip if using pretrained weights):
wheathead-s5seg-train --config configs/s5seg/train.yaml
- Evaluate:
wheathead-s5seg-eval --config configs/s5seg/eval.yaml
- Predict (generates semantic masks required for GLMask construction):
wheathead-s5seg-predict --config configs/s5seg/predict.yaml
Update data_path in the config to point to your image metadata CSV, and set
predict.prediction_dir to the directory where masks should be written.
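For orientation, the relevant part of a predict config might look like the following sketch (only `data_path` and `predict.prediction_dir` are named above; the paths are illustrative, not the repository's exact schema):

```yaml
data_path: data/image_metadata.csv        # CSV listing the images to segment (illustrative path)
predict:
  prediction_dir: outputs/semantic_masks  # directory where predicted masks are written (illustrative path)
```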
GLMask replaces RGB input with a three-channel structural representation (Grayscale, LAB-L, Mask) to reduce color dependency.
- Convert RGB to GLMask:
wheathead-rgb2glm --config configs/yolo/process_confs/rgb2glm.yaml
- Convert Masks to Contours:
wheathead-mask2contour --config configs/yolo/process_confs/mask2contour.yaml
- Rotate Images (Domain Adaptation):
wheathead-rotator --config configs/yolo/process_confs/rotator.yaml
- Visualize Annotations:
wheathead-visualizer --config configs/yolo/process_confs/visualizer.yaml
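The rotation step can be illustrated with a 90° example that rotates the image and remaps its normalized labels (a simplified sketch; the wheathead-rotator tool is the canonical implementation, and `rotate90_with_labels` is a hypothetical helper):

```python
import numpy as np

def rotate90_with_labels(image, boxes):
    """Rotate an image 90 degrees counter-clockwise and remap YOLO boxes.

    boxes is a list of (cls, xc, yc, w, h) with normalized coordinates.
    Under a 90-degree CCW rotation: new xc = yc, new yc = 1 - xc, and the
    normalized width/height swap because the image dimensions swap.
    """
    rotated = np.rot90(image)  # CCW rotation in the image plane
    new_boxes = [(cls, yc, 1.0 - xc, h, w) for cls, xc, yc, w, h in boxes]
    return rotated, new_boxes
```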
To train a new model, use the wheathead-train script with a configuration file:
- Modify configs/yolo/train.yaml
- Modify configs/yolo/model_confs/simulated.yaml
wheathead-train --config configs/yolo/train.yaml

All training hyperparameters, including model weights, data paths, and learning rates, are defined in the YAML configuration file.
To evaluate a trained model, use the wheathead-eval script:
- Modify configs/yolo/eval.yaml
- Modify configs/yolo/model_confs/gwhd_centers.yaml
wheathead-eval --config configs/yolo/eval.yaml

To run inference with a trained model, use the wheathead-pred script:
- Modify configs/yolo/pred.yaml
wheathead-pred --config configs/yolo/predict.yaml

All scripts (train, eval, predict) are controlled by YAML configuration
files in the configs/yolo/ directory. Modify these files to change
hyperparameters, paths, and other settings.
Key Configuration Fields:
- `model_weights`: Path to the model weights file (relative to model_weights/).
- `data`: Path to the data configuration file.
- All other YOLOv9 hyperparameters.
Refer to the provided *.yaml files for examples.
@article{najafian2025semantic,
title={From Semantic To Instance: A Semi-Self-Supervised Learning Approach},
author={Najafian, Keyhan and Maleki, Farhad and Jin, Lingling and Stavness, Ian},
journal={arXiv preprint arXiv:2506.16563},
year={2025}
}