A personal project exploring the application of machine learning for reconstructing particle trajectories from the TrackML dataset.
This repository implements clustering algorithms to group 3D hit coordinates in particle detectors — bridging particle physics and data science.
In experimental particle physics, charged particles leave traces ("hits") as they move through layered detector systems.
The challenge is to group these hits into individual particle tracks, a process known as track reconstruction.
This project tackles that challenge using unsupervised learning algorithms to cluster 3D hit data into meaningful trajectories.
- Exploratory Data Analysis (EDA) — Analyze hit distributions and detector geometry.
- Machine Learning Models — Implement K-Means and DBSCAN clustering using Scikit-learn.
- 3D Visualization — Render hit data and clustering outputs with Matplotlib.
- Modular Codebase — Clean, extensible structure for quick experimentation and future upgrades.
This project uses the TrackML Particle Identification dataset from Kaggle,
which contains simulated detector hits from a collider experiment.
Each event contains:
- 3D hit coordinates
- Detector layer information
- Ground-truth particle IDs (for benchmarking)
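As a rough illustration of how these pieces fit together, the sketch below joins ground-truth particle IDs onto a hit table with pandas. The two small DataFrames are made-up stand-ins for one event, and the column names (`hit_id`, `x`, `y`, `z`, `volume_id`, `layer_id`, `particle_id`) are assumed to match the TrackML CSV layout; `attach_truth` is a hypothetical helper, not a function from this repository.

```python
import pandas as pd

# Toy stand-ins for one event's hit and truth tables.
# Column names are assumed to follow the TrackML CSV layout; values are made up.
hits = pd.DataFrame({
    "hit_id": [1, 2, 3, 4],
    "x": [10.0, 20.0, -5.0, -10.0],
    "y": [0.5, 1.0, 8.0, 16.0],
    "z": [3.0, 6.0, -2.0, -4.0],
    "volume_id": [8, 8, 8, 8],
    "layer_id": [2, 4, 2, 4],
})
truth = pd.DataFrame({
    "hit_id": [1, 2, 3, 4],
    "particle_id": [101, 101, 202, 202],
})


def attach_truth(hits, truth):
    """Join ground-truth particle IDs onto the hit table by hit_id."""
    return hits.merge(truth[["hit_id", "particle_id"]], on="hit_id", how="left")


event = attach_truth(hits, truth)

# Number of hits left by each particle, useful as a quick sanity check.
hits_per_particle = event.groupby("particle_id")["hit_id"].count()
print(event[["x", "y", "z", "layer_id", "particle_id"]])
print(hits_per_particle)
```

With real data, the two DataFrames would presumably come from `pd.read_csv` on an event's hit and truth files instead of being built by hand.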
Open the Jupyter Notebook to explore the full workflow:
trackml_clustering_analysis.ipynb

- Data loading and preprocessing
- Exploratory data analysis (EDA)
- Clustering with K-Means and DBSCAN
- 3D visualization of particle tracks
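The steps above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not the notebook's actual code: the hits are synthetic straight-line "tracks" rather than TrackML data, and the parameter values (`n_clusters=3`, `eps=0.3`, `min_samples=3`) are arbitrary choices for this toy input.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for real hits: three straight "tracks" with small noise.
directions = np.array([[1.0, 0.2, 0.5], [-0.3, 1.0, -0.4], [0.1, -0.8, 1.0]])
radii = np.linspace(5.0, 50.0, 30)
hits = np.concatenate([np.outer(radii, d) for d in directions])
hits += rng.normal(scale=0.1, size=hits.shape)

# Preprocessing: put x, y, z on a comparable scale before clustering.
X = StandardScaler().fit_transform(hits)

# Clustering: K-Means needs the cluster count, DBSCAN needs a neighborhood radius.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3, min_samples=3).fit_predict(X)

# 3D visualization: one panel per algorithm, hits colored by cluster label.
fig = plt.figure(figsize=(10, 4))
panels = [("K-Means", kmeans_labels), ("DBSCAN", dbscan_labels)]
for i, (title, labels) in enumerate(panels, start=1):
    ax = fig.add_subplot(1, 2, i, projection="3d")
    ax.scatter(hits[:, 0], hits[:, 1], hits[:, 2], c=labels, s=8)
    ax.set_title(title)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("z")
```

In a notebook, `plt.show()` displays the figure; DBSCAN marks hits it considers noise with the label `-1`, which shows up as its own color.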
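K-Means is simple and fast, but it assumes compact, roughly spherical clusters and needs the number of clusters in advance, so it breaks down on the elongated, curved shapes that tracks form. DBSCAN is slower, but its density-based grouping follows chains of neighboring hits and aligns better with the physics of track formation. Together, these insights lay the foundation for graph-based and ML-driven approaches in future iterations.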
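A toy comparison makes the difference concrete. The sketch below is an illustration on made-up geometry, not a result from the TrackML data: two parallel, closely spaced "tracks" are clustered with both algorithms and scored against the known track IDs with scikit-learn's adjusted Rand index. The spacing, `eps`, and `min_samples` values are assumptions chosen for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import adjusted_rand_score

# Toy geometry: two parallel "tracks" of 20 hits each, 3 units apart,
# with hits spaced 1 unit apart along z.
z = np.arange(20, dtype=float)
track_a = np.column_stack([np.zeros(20), np.zeros(20), z])
track_b = np.column_stack([np.full(20, 3.0), np.zeros(20), z])
hits = np.vstack([track_a, track_b])
true_ids = np.repeat([0, 1], 20)

# K-Means minimizes within-cluster variance, which favors compact blobs.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(hits)

# DBSCAN links hits closer than eps, so it can chain along each track
# without jumping the 3-unit gap between them.
dbscan_labels = DBSCAN(eps=1.5, min_samples=3).fit_predict(hits)

# Adjusted Rand index: 1.0 means the clustering matches the true track IDs.
ari_kmeans = adjusted_rand_score(true_ids, kmeans_labels)
ari_dbscan = adjusted_rand_score(true_ids, dbscan_labels)
print(f"K-Means ARI: {ari_kmeans:.2f}")
print(f"DBSCAN  ARI: {ari_dbscan:.2f}")
```

On this geometry DBSCAN recovers both tracks because neighboring hits sit within `eps` of each other while the tracks do not. K-Means is likely to cut across the tracks instead, since splitting along z gives a lower within-cluster variance than splitting by track. Real events are far denser, so `eps` would need tuning there.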
Planned improvements include:
- Feature Engineering — Incorporate radial, angular, and detector-layer features.
- Graph-Based Methods — Apply Graph Neural Networks (GNNs) for track association.
- Physics-Inspired Algorithms — Explore Kalman filters and Hough transforms.
- Benchmarking — Evaluate performance using ground-truth particle IDs.
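As a possible starting point for the feature-engineering item, the sketch below converts Cartesian hit positions to cylindrical features. It assumes the z-axis runs along the beam line, so `r` is the transverse distance from the beam and `phi` the azimuthal angle; `cylindrical_features` is a hypothetical helper, not part of the current codebase.

```python
import numpy as np


def cylindrical_features(xyz):
    """Return columns (r, phi, z) computed from Cartesian (x, y, z) hits."""
    xyz = np.asarray(xyz, dtype=float)
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    r = np.hypot(x, y)       # transverse distance from the (assumed) beam axis
    phi = np.arctan2(y, x)   # azimuthal angle in radians, within [-pi, pi]
    return np.column_stack([r, phi, z])


# Three made-up hits to show the mapping.
hits = np.array([
    [1.0, 0.0, 5.0],
    [0.0, 2.0, -1.0],
    [-3.0, 0.0, 0.0],
])
features = cylindrical_features(hits)
print(features)
```

Because `phi` wraps around at ±π, feeding `cos(phi)` and `sin(phi)` to a distance-based clusterer may behave better than the raw angle.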
This project is licensed under the MIT License — see the LICENSE.md file for details.
Thanks to the TrackML collaboration for providing the dataset and research problem, and to CERN and the global particle physics community for inspiring this intersection of science and machine learning.