Unsupervised pipeline for transforming raw indoor Terrestrial Laser Scanning (TLS) data into structured smart point clouds with object segmentation, scene graphs, and semantic labeling.
SmartPointClouds converts raw indoor TLS scans into structured, object-aware, and semantically enriched point cloud representations without requiring manually labeled training data.
The project combines structural plane extraction, self-supervised feature learning, superpoint clustering, object-level refinement, zero-shot semantic labeling, and scene graph construction. The main goal is to move beyond raw geometric point clouds toward smart point clouds that are easier to inspect, interpret, and use in downstream applications.
This project is developed as part of a research and thesis workflow on indoor point cloud understanding.
- Unsupervised or weakly supervised workflow for indoor TLS scenes, with no manually labeled training data
- Structural-first processing for separating floors, walls, ceilings, and clutter
- Hybrid point features combining geometry, normals, RGB, and spatial descriptors
- Self-supervised scene-level embeddings for improved grouping of clutter points
- Superpoint partitioning and region growing for initial object discovery
- Object refinement and instance assembly for fragmented furniture and indoor objects
- Zero-shot semantic labeling using CLIP-style image-language inference
- Scene graph construction for object relationships and spatial reasoning
- Human-readable evaluation outputs such as bounding boxes, label anchors, CSV predictions, and HTML viewers
Raw TLS scans are geometrically accurate but difficult to interpret directly. They contain millions of points, noise, structural surfaces, furniture, clutter, partial objects, and fragmented geometry.
SmartPointClouds aims to create richer outputs such as:
| Output Layer | Purpose |
|---|---|
| Structural planes | Identify major scene surfaces such as floor, walls, and ceiling |
| Object instances | Group clutter points into object-like regions |
| Semantic labels | Suggest possible object names using zero-shot labeling |
| Scene graph | Represent objects and spatial relationships as nodes and edges |
| Visual inspection files | Support human evaluation through .ply, .csv, and .html outputs |
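The scene graph layer in the table above can be pictured as a small node-and-edge structure. The sketch below is illustrative only: the bounding-box input, the `on` and `near` rules, the thresholds, and the JSON field names are all assumptions, not the format actually written to `scene_graph.json`.

```python
import json
import math
from itertools import permutations

def centroid(obj):
    """Centre of an axis-aligned bounding box given as min/max corners."""
    return [(lo + hi) / 2 for lo, hi in zip(obj["min"], obj["max"])]

def xy_overlap(a, b):
    """True if the two boxes overlap when projected onto the floor plane."""
    return all(a["min"][k] < b["max"][k] and b["min"][k] < a["max"][k] for k in (0, 1))

def build_scene_graph(objects, on_tol=0.05, near_dist=1.0):
    """Hypothetical rules: 'on' = bottom of A touches top of B, 'near' = close centroids."""
    nodes = [{"id": o["id"], "centroid": centroid(o)} for o in objects]
    edges = []
    # Directed support relation: check every ordered pair.
    for a, b in permutations(objects, 2):
        gap = a["min"][2] - b["max"][2]
        if abs(gap) <= on_tol and xy_overlap(a, b):
            edges.append({"source": a["id"], "target": b["id"], "relation": "on"})
    # Undirected proximity relation: store each unordered pair once.
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            if math.dist(centroid(a), centroid(b)) <= near_dist:
                edges.append({"source": a["id"], "target": b["id"], "relation": "near"})
    return {"nodes": nodes, "edges": edges}

# Toy scene in metres (made-up boxes, not real scan data).
objects = [
    {"id": "table", "min": [0.0, 0.0, 0.0], "max": [1.0, 1.0, 0.75]},
    {"id": "monitor", "min": [0.3, 0.3, 0.75], "max": [0.7, 0.5, 1.1]},
    {"id": "chair", "min": [3.0, 3.0, 0.0], "max": [3.5, 3.5, 0.9]},
]
graph = build_scene_graph(objects)
print(json.dumps(graph["edges"], indent=2))
```

In this toy scene the monitor ends up `on` the table and `near` it, while the chair stays unconnected because it is several metres away.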
Potential application areas include indoor mapping, digital twins, facility management, BIM support, robotics, and 3D scene understanding research.
The pipeline is organized into three main layers:
- Structural Processing: Raw TLS input is denoised, normalized, and separated into structural planes and clutter points.
- Clutter Segmentation and Object Assembly: Clutter points are represented using hybrid features, embedded into a learned latent space, partitioned into superpoints, and refined into object instances.
- Relational and Semantic Layer: Object instances are enriched with zero-shot semantic labels and spatial relationships in a scene graph.
smartpointclouds/
├── data/
│ ├── raw/ # Local raw scans; not recommended for Git
│ └── processed/ # Local generated outputs; not recommended for Git
│
├── docs/
│ └── assets/ # README images and diagrams
│
├── results/
│ ├── logs/ # Optional logs
│ ├── models/ # Optional trained/cached models
│ └── visualizations/ # Optional generated figures
│
├── src/
│ ├── app/
│ │ ├── pipeline.py # Main pipeline orchestration
│ │ └── archive/ # Older pipeline versions
│ │
│ ├── preprocessing/
│ │ ├── normalize.py # Normalization and scene preparation
│ │ ├── denoise.py # Statistical outlier removal and cleanup
│ │ ├── utils_io.py # I/O helper functions
│ │ ├── scene_frame.py # Scene frame utilities
│ │ └── archive/
│ │
│ ├── primitive/
│ │ ├── planes.py # Structural plane extraction
│ │ └── archive/
│ │
│ ├── clustering/
│ │ ├── partition_from_spt.py # Superpoints, region growing, clustering
│ │ └── archive/
│ │
│ ├── embedding/
│ │ ├── model.py # Self-supervised embedding model
│ │ └── archive/
│ │
│ ├── graph/
│ │ ├── scene_graph.py # Scene graph construction
│ │ ├── object_graph.py # Object graph utilities
│ │ └── archive/
│ │
│ └── tests/
│ └── run_pipeline.py # Main experimental entry point
│
├── config/ # Optional configuration files
├── README.md
├── LICENSE
└── .gitignore
Clone the repository:
git clone https://github.com/SolomonEmbafrash/smartpointclouds.git
cd smartpointclouds
Create a Python environment:
conda create -n smartpc python=3.10
conda activate smartpc
Install core dependencies:
pip install numpy scipy scikit-learn open3d laspy torch torchvision transformers pillow networkx matplotlib tqdm
If an environment.yml file is maintained, you can alternatively use:
conda env create -f environment.yml
conda activate smartpc
Place raw input scans in:
data/raw/
Example:
data/raw/3rdflorcoffee.las
data/raw/fablabinterior.las
Run the pipeline from the repository root:
python src/tests/run_pipeline.py
On Windows PowerShell:
python .\src\tests\run_pipeline.py
When running from Spyder:
%runfile D:/SmartPointClouds/smartpc/src/tests/run_pipeline.py --wdir
Outputs are typically written to a timestamped folder under:
data/processed/
A typical run can generate:
| File | Description |
|---|---|
| segmented.ply | Point cloud after structural and segmentation processing |
| object_segmented.ply | Object-level segmentation result |
| scene_graph.json | Scene graph with objects and relationships |
| scenegraph_viewer.html | Browser-based scene graph visualization |
| clip_predictions.csv | Zero-shot top-N label predictions |
| bounding_boxes_topN.ply | Bounding boxes for selected predicted objects |
| label_anchor_points.ply | Anchor points for visual label placement |
| run_log.txt | Optional processing log and parameter summary |
The structural stage removes noise, prepares the scan, extracts dominant planes, classifies floor/wall/ceiling candidates, and separates structural points from clutter points.
Main ideas:
- statistical outlier removal and cleanup
- normalization or scene-frame preparation
- multi-RANSAC-style structural plane extraction
- classification of floor, wall, and ceiling candidates
- structural-clutter separation
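The steps above can be sketched in plain NumPy on a synthetic room. This is a minimal illustration under stated assumptions, not the code in `denoise.py` or `planes.py`: the function names, thresholds, and the normal-based floor/wall/ceiling rule are all hypothetical, and the brute-force neighbour search only suits small demos.

```python
import numpy as np

def statistical_outlier_mask(points, k=8, std_ratio=2.0):
    """Keep points whose mean distance to their k nearest neighbours is not unusually large."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    knn = np.sort(d, axis=1)[:, 1:k + 1]          # column 0 is the point itself
    mean_d = knn.mean(axis=1)
    return mean_d <= mean_d.mean() + std_ratio * mean_d.std()

def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Return (unit normal n, offset d, inlier mask) for the best plane n.x + d = 0."""
    rng = np.random.default_rng(seed)
    best = (None, None, np.zeros(len(points), dtype=bool))
    for _ in range(iterations):
        a, b, c = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(b - a, c - a)
        norm = np.linalg.norm(n)
        if norm < 1e-12:                            # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n @ a
        inliers = np.abs(points @ n + d) < threshold
        if inliers.sum() > best[2].sum():
            best = (n, d, inliers)
    return best

def classify_plane(normal, inlier_points, z_mid):
    """Assumed rule: near-vertical normal is floor or ceiling, near-horizontal normal is wall."""
    if abs(normal[2]) > 0.9:
        return "floor" if inlier_points[:, 2].mean() < z_mid else "ceiling"
    if abs(normal[2]) < 0.1:
        return "wall"
    return "other"

# Synthetic scene: a noisy floor, a box of clutter, and one far-away outlier.
rng = np.random.default_rng(42)
floor = np.column_stack([rng.uniform(0, 4, 400), rng.uniform(0, 4, 400), rng.normal(0, 0.003, 400)])
clutter = rng.uniform([1, 1, 0.3], [2, 2, 1.0], size=(60, 3))
cloud = np.vstack([floor, clutter, [[50.0, 50.0, 50.0]]])

keep = statistical_outlier_mask(cloud)
clean = cloud[keep]
normal, d, inliers = ransac_plane(clean)
z_mid = (clean[:, 2].min() + clean[:, 2].max()) / 2
label = classify_plane(normal, clean[inliers], z_mid)
clutter_points = clean[~inliers]                    # structural-clutter separation
print(label, int(inliers.sum()), len(clutter_points))
```

A multi-plane variant would repeat `ransac_plane` on the remaining points after removing each plane's inliers, stopping once the best plane has too few inliers.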
Each clutter point is represented using a feature vector that combines geometry and appearance.
Typical feature groups include:
- local PCA geometry descriptors
- RGB appearance values
- normal vectors
- XYZ position values
This produces a feature matrix used for embedding, superpoint partitioning, and clustering.
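As a rough illustration of such a feature matrix, the sketch below stacks eigenvalue-based shape descriptors, RGB, a PCA normal, and XYZ into 12 columns per point. The specific descriptors (linearity, planarity, scattering), the neighbourhood size `k`, and the column order are assumptions chosen for the example; the project's actual feature set may differ.

```python
import numpy as np

def hybrid_features(xyz, rgb, k=10):
    """Return an (N, 12) matrix: 3 PCA shape descriptors, RGB, normal, XYZ."""
    n = len(xyz)
    # Brute-force k nearest neighbours (includes the point itself); demo-scale only.
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=2)
    nbr_idx = np.argsort(d, axis=1)[:, :k]
    feats = np.empty((n, 12))
    for i in range(n):
        cov = np.cov(xyz[nbr_idx[i]].T)             # 3x3 local covariance
        evals, evecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
        l3, l2, l1 = np.maximum(evals, 1e-12)       # l1 >= l2 >= l3, clamped above zero
        linearity = (l1 - l2) / l1
        planarity = (l2 - l3) / l1
        scattering = l3 / l1
        normal = evecs[:, 0]                        # direction of least variance
        if normal[2] < 0:                           # fix the sign ambiguity of PCA normals
            normal = -normal
        feats[i] = np.concatenate([[linearity, planarity, scattering], rgb[i], normal, xyz[i]])
    return feats

# Toy input: a flat patch with random colours.
rng = np.random.default_rng(0)
xyz = np.column_stack([rng.uniform(0, 1, 80), rng.uniform(0, 1, 80), np.zeros(80)])
rgb = rng.uniform(0, 1, (80, 3))
F = hybrid_features(xyz, rgb)
print(F.shape)  # (80, 12)
```

In practice the columns would usually be rescaled before clustering, since XYZ in metres and RGB in [0, 1] live on different scales.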
The embedding stage learns scene-specific latent representations from clutter features. These embeddings are used to improve grouping of points that belong to similar local structures or object regions.
The collapsible diagram keeps the README readable while still documenting the learning component.
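This README does not state which training objective the embedding model uses. Purely as an illustration of what "self-supervised" can mean here, the sketch below computes a contrastive InfoNCE loss between two jittered views of the same clutter features. The loss choice, the jitter augmentation, and the random matrix `W` standing in for an encoder are all assumptions.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Contrastive loss: row i of z_a should match row i of z_b and no other row."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature              # cosine similarities, scaled
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # positives sit on the diagonal

rng = np.random.default_rng(0)
features = rng.normal(size=(64, 12))                # stand-in for hybrid point features
W = rng.normal(size=(12, 8))                        # stand-in for a learned encoder

# Two augmented views of the same points (small Gaussian jitter).
view_a = (features + rng.normal(scale=0.01, size=features.shape)) @ W
view_b = (features + rng.normal(scale=0.01, size=features.shape)) @ W

loss_matched = info_nce(view_a, view_b)
loss_shuffled = info_nce(view_a, np.roll(view_b, 1, axis=0))   # break the pairing
print(loss_matched < loss_shuffled)
```

Training would minimise this loss with respect to the encoder parameters, which in the real project would be done in PyTorch rather than NumPy.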
The clutter cloud is partitioned into local superpoints and grouped using graph-based region growing.
The region growing stage can use:
- spatial proximity
- embedding similarity
- geometric similarity
- color similarity
- height consistency
- neighborhood graph connectivity
The output is a set of initial object fragments.
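A minimal sketch of graph-based region growing, using only two of the criteria listed above (spatial proximity and embedding similarity). The function name, the radius, and the cosine threshold are hypothetical, and the sketch grows over individual points rather than superpoints to stay short.

```python
import numpy as np
from collections import deque

def region_grow(xyz, emb, radius=0.15, min_cos=0.9):
    """Label points by breadth-first growth over a radius graph, gated by embedding similarity."""
    n = len(xyz)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=2)
    labels = np.full(n, -1)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue                                # already part of a region
        labels[seed] = current
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            # Unlabelled spatial neighbours of point i.
            for j in np.flatnonzero((dist[i] <= radius) & (labels == -1)):
                if emb[i] @ emb[j] >= min_cos:      # similar enough to join the region
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels

# Three runs of points along the x axis:
#   A and B touch spatially but have different embeddings,
#   C has the same embedding as A but lies far away.
x = np.concatenate([np.arange(10) * 0.05, 0.5 + np.arange(10) * 0.05, 2.0 + np.arange(10) * 0.05])
xyz = np.column_stack([x, np.zeros(30), np.zeros(30)])
emb = np.array([[1.0, 0.0]] * 10 + [[0.0, 1.0]] * 10 + [[1.0, 0.0]] * 10)
labels = region_grow(xyz, emb)
print(labels.tolist())
```

The toy data yields three fragments: A and B are split by embedding dissimilarity despite touching, and C is split from A by distance despite matching embeddings.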
Indoor objects are often fragmented into many pieces. The refinement stage attempts to merge related fragments into coherent object instances.
Example refinement operations:
- tiny fragment absorption
- local boundary repair
- planar consolidation
- closed-body assembly
- support-aware merging
- chair/table completion
- near-plane halo handling
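Of the operations above, tiny fragment absorption is the simplest to illustrate. The sketch below merges any fragment with fewer than `min_points` points into the nearest large fragment, provided the centroids are close enough. The centroid-distance rule and both thresholds are assumptions made for the example; a real implementation would more likely test boundary contact than centroid distance.

```python
import numpy as np

def absorb_tiny_fragments(xyz, labels, min_points=20, max_dist=0.3):
    """Reassign small fragments to the nearest large fragment within max_dist (centroid distance)."""
    labels = labels.copy()
    ids, counts = np.unique(labels, return_counts=True)
    large = ids[counts >= min_points]
    small = ids[counts < min_points]
    if len(large) == 0:
        return labels                               # nothing to absorb into
    centroids = np.array([xyz[labels == i].mean(axis=0) for i in large])
    for s in small:
        mask = labels == s
        d = np.linalg.norm(centroids - xyz[mask].mean(axis=0), axis=1)
        j = d.argmin()
        if d[j] <= max_dist:                        # far-away fragments are left alone
            labels[mask] = large[j]
    return labels

# Toy fragments: one large blob, one tiny neighbour, one tiny far-away piece.
rng = np.random.default_rng(1)
big = rng.normal(0.0, 0.02, (50, 3))
near = rng.normal(0.0, 0.02, (5, 3)) + [0.1, 0.0, 0.0]
far = rng.normal(0.0, 0.02, (5, 3)) + [5.0, 0.0, 0.0]
xyz = np.vstack([big, near, far])
labels = np.array([0] * 50 + [1] * 5 + [2] * 5)

merged = absorb_tiny_fragments(xyz, labels)
print(np.unique(merged, return_counts=True))
```

Here fragment 1 is absorbed into fragment 0, while fragment 2 keeps its own label because it lies about five metres away.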
This layer enriches detected object instances with semantic and relational information.
It includes:
- object node creation
- spatial relationship prediction
- graph assembly
- PCA-aligned multi-view rendering
- CLIP-style zero-shot label matching
- top-N label export and confidence filtering
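The label matching and top-N filtering steps can be sketched without loading a vision-language model, by treating image and text embeddings as given arrays. Everything here is illustrative: the toy 4-dimensional embeddings stand in for real CLIP outputs, and the function name, view averaging, temperature, and `min_conf` threshold are assumptions.

```python
import numpy as np

def top_n_labels(image_emb, text_emb, labels, n=3, temperature=0.01, min_conf=0.2):
    """Rank candidate labels for one object from its rendered-view embeddings."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sims = (img @ txt.T).mean(axis=0)               # average cosine similarity over views
    logits = sims / temperature
    probs = np.exp(logits - logits.max())           # softmax over candidate labels
    probs = probs / probs.sum()
    order = np.argsort(probs)[::-1][:n]             # indices of the n most probable labels
    # Confidence filtering: drop labels below min_conf.
    return [(labels[i], float(probs[i])) for i in order if probs[i] >= min_conf]

labels = ["chair", "table", "monitor", "plant"]
text_emb = np.eye(4)                                # toy text embeddings, one per label
image_emb = np.array([                              # two rendered views of one object
    [0.9, 0.3, 0.0, 0.1],
    [0.8, 0.4, 0.1, 0.0],
])
result = top_n_labels(image_emb, text_emb, labels)
print(result)
```

With real embeddings, `image_emb` would come from encoding the multi-view renderings and `text_emb` from encoding prompts built around the candidate label list.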
Recommended Git workflow:
git status
git add -A
git commit -m "Describe the change clearly"
git pull --rebase origin main
git push origin main
Before making the repository public, check that raw scans and generated outputs are not accidentally committed:
git ls-files data/raw
git ls-files data/processed
This project is released under the MIT License.
Solomon Kiflom
Aalto Fablab / Arcada University of Applied Sciences
Master’s Degree Programme in Big Data Analytics






