
GLMask Semantic2Instance

This repository contains the official implementation of GLMask, a structural image representation designed for semantic-to-instance segmentation with minimal manual annotation. The method was introduced in our paper:

From Semantic To Instance: A Semi-Self-Supervised Learning Approach

GLMask replaces standard RGB inputs with a three-channel structural representation composed of Grayscale (G), CIELAB Lightness (L), and a semantic mask (M). This design encourages the model to focus on shape, texture, and structural cues rather than color, improving generalization across diverse acquisition domains.
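
As an illustration, the GLMask construction can be sketched in a few lines of NumPy. This is a simplified sketch, not the repository's implementation: the function names, the [0, 1] float input convention, and the binary {0, 1} mask are our assumptions.

```python
import numpy as np

def srgb_to_lightness(rgb: np.ndarray) -> np.ndarray:
    """CIELAB L* (0-100) from an HxWx3 sRGB image with values in [0, 1]."""
    c = rgb.astype(np.float64)
    # Undo sRGB gamma, then compute relative luminance Y (D65 weights).
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    y = lin @ np.array([0.2126, 0.7152, 0.0722])
    # Standard CIE f(Y) with the linear segment near black.
    f = np.where(y > (6 / 29) ** 3, np.cbrt(y), y / (3 * (6 / 29) ** 2) + 4 / 29)
    return 116.0 * f - 16.0

def build_glmask(rgb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stack Grayscale (G), CIELAB Lightness (L), and a semantic mask (M)
    into a single 3-channel uint8 image, replacing the RGB input."""
    gray = rgb @ np.array([0.299, 0.587, 0.114])   # luma grayscale in [0, 1]
    l_star = srgb_to_lightness(rgb) / 100.0        # rescale L* to [0, 1]
    glm = np.dstack([gray, l_star, mask.astype(np.float64)])
    return (np.clip(glm, 0.0, 1.0) * 255).round().astype(np.uint8)
```

The channel order mirrors the name: G, then L, then M. Because grayscale and L* both discard hue, the only color-dependent signal the model sees is what survives in these two achromatic channels.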

Our framework combines:

  • Synthetic data generation via cut-and-paste simulation
  • GLMask-based structural representation learning
  • Rotation-based domain adaptation
  • YOLOv9 instance segmentation

The approach achieves:

  • 98.5% mAP@50 on wheat head instance segmentation
  • Consistent cross-domain robustness across 18 acquisition environments
  • Significant performance gains on the MS COCO dataset

Method Overview

Framework Overview
Figure G1: Overall pipeline, from semantic mask generation to GLMask construction and instance segmentation, using GLMask inputs and synthetic pretraining.


Domain-Based Performance Analysis ($GHD_{te}$)

Table G1 – Per-Domain mAP@50 Comparison
Table G1: Per-domain mAP@50 comparison between the RGB-based BaseModel and the proposed RoAModel across the 18 acquisition domains of the $GHD_{te}$ test set. The BaseModel exhibits substantial performance variability, with mAP@50 ranging from 8.7% to 71.1%, reflecting its sensitivity to domain shifts such as growth-stage variation, illumination changes, and object density. In contrast, RoAModel consistently achieves high performance across all domains, with mAP@50 ranging from 95.8% to 99.5%. The largest gains occur in visually challenging environments (e.g., domains where BaseModel performance drops below 30%), indicating that the GLMask representation and rotation-based domain adaptation effectively mitigate cross-domain instability. These results demonstrate that structural semantic-to-instance transfer generalizes robustly in dense agricultural imagery, substantially reducing performance variance across heterogeneous acquisition conditions.


Quality Assessment

GWHD Dataset Quality Assessment
Figure G2: Prediction performance of RoAModel across the 18 domains of the $GHD_{te}$ test set.

COCO Dataset Quality Assessment
Figure G3: Prediction performance of the COCO models on the Microsoft COCO 2017 dataset. Our proposed GLMask approach consistently achieved superior segmentation performance (columns B, C, D, and E) and higher confidence scores for detected objects (columns A through F). In some cases, both the RGB and ColorMap models failed to detect objects of interest; examples include the truck in column D and the people in column F. Near-perfect segmentation was also observed in cases such as column A.


Installation

Prerequisites

  • Python 3.11+
  • CUDA-enabled GPU (for training and inference)

Setup

  1. Clone the repository:

    git clone https://github.com/your-username/glmask-semantic2instance.git
    cd glmask-semantic2instance
  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate
  3. Install dependencies:

    pip install -e .

Usage

Data Preparation

Place your datasets in the data/ directory following the structure outlined in the configuration files. Sample metadata files are provided in the data/ directory (background_videos_metadata.csv, foreground_videos_metadata.csv, and segmented_samples_metadata.csv) to illustrate the expected input format for running the data synthesis pipeline.

Training and evaluation datasets must be organized in the standard YOLO format (image files with corresponding label .txt files following the class x_center y_center width height convention).
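
For reference, a label line in this convention can be decoded back to pixel coordinates as sketched below. The function name is ours, and coordinates are assumed normalized to [0, 1], as is standard for YOLO labels.

```python
def yolo_to_pixel_box(line: str, img_w: int, img_h: int):
    """Parse one 'class x_center y_center width height' label line
    (normalized coordinates) into (class_id, x_min, y_min, x_max, y_max) pixels."""
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    return (int(cls),
            round((xc - w / 2) * img_w), round((yc - h / 2) * img_h),
            round((xc + w / 2) * img_w), round((yc + h / 2) * img_h))
```

For example, the line "0 0.5 0.5 0.5 0.5" on a 100x100 image decodes to a box covering the central quarter of the image, (0, 25, 25, 75, 75).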

Synthetic Data Generation

This stage generates synthetic training data by overlaying extracted objects onto backgrounds.

  • Extract Frames from Videos:
    wheathead-sim-frames --config configs/simulator/frames_extractor.yaml
  • Extract Objects from Images:
    wheathead-sim-objects --config configs/simulator/objects_extractor.yaml
  • Run Simulator:
    wheathead-simulator --config configs/simulator/simulator.yaml
  • Visualize Simulated Data:
    wheathead-sim-visualizer --config configs/simulator/visualizer.yaml
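
The cut-and-paste idea behind the simulator can be illustrated with a minimal compositing step: a foreground object, cut out by its mask, is pasted onto a background frame. This is a sketch under our own assumptions (uint8 arrays, binary masks), not the repository's simulator code, which additionally handles placement sampling and blending.

```python
import numpy as np

def paste_object(background: np.ndarray, obj: np.ndarray,
                 obj_mask: np.ndarray, top: int, left: int) -> np.ndarray:
    """Composite a cut-out object onto a background at (top, left).
    background: HxWx3 uint8, obj: hxwx3 uint8, obj_mask: hxw binary {0, 1}."""
    out = background.copy()
    h, w = obj_mask.shape
    region = out[top:top + h, left:left + w]
    m = obj_mask[..., None].astype(bool)           # broadcast mask over channels
    # Keep object pixels where the mask is set, background pixels elsewhere.
    out[top:top + h, left:left + w] = np.where(m, obj, region)
    return out
```

Repeating this for many objects, backgrounds, and positions yields a synthetic training set whose instance labels come for free from the pasted masks.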

Semantic Segmentation (s5seg)

The semantic mask channel (M) in the GLMask representation is produced by this stage. Place pretrained weights at model_weights/S5Seg_Best.pt before running.

Pretrained weights can be downloaded from: Download Weights

  • Train (optional — skip if using pretrained weights):
    wheathead-s5seg-train --config configs/s5seg/train.yaml
  • Evaluate:
    wheathead-s5seg-eval --config configs/s5seg/eval.yaml
  • Predict (generates semantic masks required for GLMask construction):
    wheathead-s5seg-predict --config configs/s5seg/predict.yaml

Update data_path in the config to point to your image metadata CSV, and set predict.prediction_dir to the directory where masks should be written.

Preprocessing (GLMask & Utilities)

GLMask replaces RGB input with a three-channel structural representation (Grayscale, LAB-L, Mask) to reduce color dependency.

  • Convert RGB to GLMask:
    wheathead-rgb2glm --config configs/yolo/process_confs/rgb2glm.yaml
  • Convert Masks to Contours:
    wheathead-mask2contour --config configs/yolo/process_confs/mask2contour.yaml
  • Rotate Images (Domain Adaptation):
    wheathead-rotator --config configs/yolo/process_confs/rotator.yaml
  • Visualize Annotations:
    wheathead-visualizer --config configs/yolo/process_confs/visualizer.yaml
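
Rotation-based domain adaptation pairs a rotated image with correspondingly remapped labels. For a 90° counter-clockwise rotation, normalized YOLO boxes transform as sketched below; this is our own illustration of the geometry, not the wheathead-rotator implementation.

```python
import numpy as np

def rotate90_sample(image: np.ndarray, boxes):
    """Rotate an image 90 degrees counter-clockwise and remap its YOLO boxes.
    boxes: iterable of (cls, xc, yc, w, h) with coordinates normalized to [0, 1]."""
    rotated = np.rot90(image)
    # Under a 90-degree CCW rotation: x' = yc, y' = 1 - xc, and width/height swap.
    new_boxes = [(c, yc, 1.0 - xc, h, w) for c, xc, yc, w, h in boxes]
    return rotated, new_boxes
```

Because the coordinates are normalized, the same remapping applies regardless of image size, and applying the function four times returns each box to its original position.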

Training

To train a new model, modify the configuration files, then run the wheathead-train script:

  • Modify configs/yolo/train.yaml
  • Modify configs/yolo/model_confs/simulated.yaml

    wheathead-train --config configs/yolo/train.yaml

All training hyperparameters, including model weights, data paths, and learning rates, are defined in the YAML configuration file.

Evaluation

To evaluate a trained model, modify the configuration files, then run the wheathead-eval script:

  • Modify configs/yolo/eval.yaml
  • Modify configs/yolo/model_confs/gwhd_centers.yaml

    wheathead-eval --config configs/yolo/eval.yaml

Prediction

To run inference with a trained model, modify the configuration file, then run the wheathead-pred script:

  • Modify configs/yolo/pred.yaml

    wheathead-pred --config configs/yolo/pred.yaml

Configuration

All scripts (train, eval, predict) are controlled by YAML configuration files in the configs/yolo/ directory. Modify these files to change hyperparameters, paths, and other settings.

Key Configuration Fields:

  • model_weights: Path to the model weights file (relative to model_weights/).
  • data: Path to the data configuration file.
  • All other YOLOv9 hyperparameters.

Refer to the provided *.yaml files for examples.

Citation

@article{najafian2025semantic,
  title={From Semantic To Instance: A Semi-Self-Supervised Learning Approach},
  author={Najafian, Keyhan and Maleki, Farhad and Jin, Lingling and Stavness, Ian},
  journal={arXiv preprint arXiv:2506.16563},
  year={2025}
}
