Challenge Website: Link
My solution follows a two-phase methodology for geological image analysis. In the first phase, I develop a specialized neural network architecture that leverages state-of-the-art Vision Foundation Models and is trained on both the labeled and unlabeled datasets. The second phase is a prediction refinement pipeline that integrates the Segment Anything Model 2 (SAM2) to improve segmentation accuracy and robustness.
- Create a new conda environment and install the required packages:

```bash
conda create -n cvgq python=3.11
conda activate cvgq
pip3 install torch torchvision torchaudio
```

- Clone and install Segment Anything Model 2 (SAM2) from the official repository:

```bash
git clone https://github.com/facebookresearch/sam2
cd sam2
pip install -e .
```

- SAM2 expects its configs to be placed in the parent `sam2` folder. Inside the parent `sam2` folder, run the following command:

```bash
cp -r sam2/configs/ .
```

- Download the SAM2 checkpoints:

```bash
cd checkpoints
bash download_ckpts.sh
cd ../..  # Return to the root folder
```

- Download the pretrained weights from the release page. Select the file `pretrained_models.zip` and place the checkpoints in the `checkpoints` and `final_checkpoints` folders.
The repository is structured as follows:
```
cvs
│
├───data
│   ├───core-values-test-data
│   ├───train
│   ├───train_unlabeled
├───checkpoints
├───final_checkpoints
├───src
│   ├───model.py
│   ├───data.py
│   ├───transforms.py
│   ├───utils.py
│   ├───my_utils.py
├───pred_vis
├───train.py
├───train_semi.py
├───predict.py
├───solution.ipynb
└───README.md
```
- `data`: contains the dataset folders. Download the dataset from the official challenge website and unzip the files. Note that the file `image001269.png` from the unlabeled dataset is corrupted and has been removed.
- `checkpoints`: folder to save the trained model checkpoints.
- `final_checkpoints`: contains the 5 trained model checkpoints used for the final submission.
- `src`: contains the source code files.
  - `model.py`: contains the neural network architecture.
  - `utils.py`: utility functions provided by the challenge organizers' GitHub repository.
  - `my_utils.py`: utility functions implemented by me.
- `train.py`: script to train the model on the labeled dataset.
- `train_semi.py`: script to train the model on both the labeled and unlabeled datasets.
- `predict.py`: script to make predictions on the test dataset.
- `solution.ipynb`: Jupyter notebook containing the solution report.
- `pred_vis`: folder to save the prediction visualizations (I ran the prediction script on the public test dataset and saved the predictions).
- `README.md`: this file.
To understand and reproduce my solution, please refer to the `solution.ipynb` notebook. It provides a detailed explanation of the methodology, implementation, insights, and results.
To document the incremental improvements, the table below lists the experiments conducted during development, with the experiment id, description, public Dice score, and checkpoint file.
| Experiment Id | Description | Public Dice Score | Checkpoint |
|---|---|---|---|
| 1 | Supervised Only (10 epochs, CE loss) | 0.5569 | model_supervised.pth |
| 2 | Semi-Supervised (25 epochs, CE loss) | 0.5910 | --- |
| 3 | Semi-Supervised (50 epochs, CE loss) | 0.6058 | --- |
| 4 | Semi-Supervised (100 epochs, CE loss) | 0.6413 | --- |
| 5 | Semi-Supervised (150 epochs, CE loss) | 0.6422 | model_semi_blr2em5.pth |
| 6 | Semi-Supervised (150 epochs, blr 4e-5, CE loss) | 0.6441 | model_semi.pth |
| 7 | Semi-Supervised (100 epochs, blr 4e-5, Ohem loss) | 0.6445 | model_semi_100eps_ohem.pth |
| 8 | Semi-Supervised (150 epochs, blr 4e-5, Consistency Reg) | 0.6473 | model_semi_creg.pth |
| 9 | Semi-Supervised (150 epochs, blr 4e-5, Combined loss) | 0.6486 | model_semi_combloss.pth |
| 10 | Ensemble of 6, 8, 9 | 0.6599 | --- |
| 11 | Ensemble of 6, 7, 8, 9 | 0.6642 | --- |
| 12 | Ensemble of 5, 6, 7, 8, 9 | 0.6655 | --- |
| 13 | Ensemble of 5, 6, 7, 8, 9, 11 + SAM2 AutoMaskGen Refine | 0.6718 | --- |
| 14 | Ensemble of 5, 6, 7, 8, 9, 11 + SAM2 AutoMaskGen Refine + Replace Confident(thresh 0.8) | 0.6868 | --- |
| 15 | Ensemble of 5, 6, 7, 8, 9, 11 + SAM2 AutoMaskGen Refine + Replace Confident(thresh 0.625) | 0.6951 | --- |
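As a rough illustration of the refinement step in experiments 13-15, here is a minimal sketch of replacing labels inside confidently predicted regions with the region's majority class. This is my reconstruction under stated assumptions, not the actual pipeline: the function name, the use of mean max-probability as region confidence, and the majority-vote rule are all assumptions; `regions` stands in for the boolean masks an automatic mask generator such as SAM2's would produce.

```python
import numpy as np

def refine_with_regions(probs, regions, conf_thresh=0.625):
    """Replace labels inside confident regions with the region's majority class.

    probs   : (C, H, W) ensemble-averaged class probability map.
    regions : list of (H, W) boolean masks (e.g. from an automatic mask generator).
    """
    labels = probs.argmax(axis=0)       # per-pixel predicted class
    confidence = probs.max(axis=0)      # per-pixel max probability
    refined = labels.copy()
    for mask in regions:
        if not mask.any():
            continue
        # Only touch regions where the ensemble is confident on average.
        if confidence[mask].mean() > conf_thresh:
            vals, counts = np.unique(labels[mask], return_counts=True)
            refined[mask] = vals[counts.argmax()]  # majority vote in the region
    return refined
```

With a higher threshold, fewer regions qualify and the ensemble labels pass through unchanged, which matches the sensitivity to the threshold seen between experiments 14 and 15.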
- Fastest GPU Inference. My algorithm runs inference on the 70 images of the public test dataset in about 4 minutes.
- Fastest CPU Inference. Not tested.
- Best custom model that does not use a highly abstracted API. The main model architecture is implemented in PyTorch; only the prediction refinement step uses the `SAM2AutomaticMaskGenerator`.
- Most innovative use of unlabelled data. I use state-of-the-art semi-supervised learning techniques to leverage the unlabeled data.
- Best documentation. The solution report is detailed and provides a clear explanation of the methodology, implementation, and results.
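The actual semi-supervised training code lives in `train_semi.py`; the general idea behind pseudo-label-based approaches can be sketched as below. Everything here is an illustrative assumption rather than the author's method: the function name, the confidence threshold `tau`, and the shortcut of reusing the same unlabeled view (a FixMatch-style recipe would apply a strong augmentation for the second forward pass).

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, x_lab, y_lab, x_unlab, tau=0.9):
    # Supervised loss on the labeled batch (per-pixel cross-entropy).
    loss_sup = F.cross_entropy(model(x_lab), y_lab)

    # Pseudo-labels: the model's own confident predictions on unlabeled data.
    with torch.no_grad():
        probs = model(x_unlab).softmax(dim=1)   # (N, C, H, W)
        conf, pseudo = probs.max(dim=1)         # (N, H, W)

    # Unsupervised loss, kept only where confidence exceeds tau.
    # (A FixMatch-style variant would feed a strongly augmented view here.)
    loss_unsup = F.cross_entropy(model(x_unlab), pseudo, reduction="none")
    loss_unsup = (loss_unsup * (conf > tau)).mean()
    return loss_sup + loss_unsup
```

The confidence mask keeps noisy pseudo-labels from dominating early training, which is one reason such methods scale well with more unlabeled images.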