A project exploring agentic retrieval + 3D geometry embeddings.
The idea: train a lightweight PointNet-style model on ModelNet40, extract embeddings for shapes, store them in a vector database (ChromaDB), then use a LangChain tool-calling agent (Mistral) to answer questions like:
- “Find shapes most similar to this chair.”
- “Compare these two objects and explain why they differ.”
- “Show me the embedding space and highlight neighbors.”
Most 3D ML demos stop at “train a classifier” or “plot a point cloud”. I wanted something more interactive:
- Make 3D embeddings searchable (nearest neighbors, category browsing)
- Wrap the workflow in a tool-using agent that can chain steps (search → inspect → compare → visualize)
- Keep everything runnable from notebooks, without enterprise scaffolding
This project has two main parts:
Part 1 — implemented in the `pointnet_playground.ipynb` notebook.
Workflow:
- Acquire the dataset (ModelNet40 via `kagglehub`)
- Explore & visualize sample meshes/point clouds (Open3D + Matplotlib)
- Preprocess to point clouds (uniform sampling)
- Train a PointNet-style model (PyTorch Geometric `PointNetConv` + pooling)
- Evaluate on the test split
- Extract embeddings from the network and save them under `checkpoints/embeddings/`
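The uniform-sampling step above can be sketched roughly like this. This is a minimal numpy-only version for intuition — the function name `sample_points` and the target count of 1024 are illustrative assumptions, and the notebook samples mesh surfaces via Open3D rather than raw vertices:

```python
import numpy as np

def sample_points(vertices: np.ndarray, n_points: int = 1024) -> np.ndarray:
    """Sample a fixed-size point cloud from a mesh's vertex array, then
    center it and scale it into the unit sphere.

    Simplified sketch: samples vertices directly (with replacement when the
    mesh has fewer vertices than n_points) to keep the example dependency-free.
    """
    idx = np.random.choice(len(vertices), size=n_points,
                           replace=len(vertices) < n_points)
    pts = vertices[idx].astype(np.float64)
    pts -= pts.mean(axis=0)                   # center at the origin
    pts /= np.linalg.norm(pts, axis=1).max()  # scale into the unit sphere
    return pts
```

Normalizing every cloud to the unit sphere is what lets one model train across ModelNet40 objects of wildly different physical sizes.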
Part 2 — implemented in the `shapes_agent.ipynb` notebook.
Workflow:
- Load an enriched JSON containing embeddings + metadata: `data/json/enriched_data_sample_with_embeddings.json`
- Create and populate a ChromaDB collection (cosine similarity)
- Define a set of tools (search, inspection, comparison, visualization, geometric proxies)
- Create a LangChain agent (Mistral) that chooses tools automatically
- Run demo scenarios / custom questions
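Under the hood, the cosine-similarity retrieval step boils down to something like the following hand-rolled numpy sketch. It is for intuition only — ChromaDB answers queries through an approximate HNSW index rather than this brute-force scan:

```python
import numpy as np

def nearest_neighbors(query: np.ndarray, embeddings: np.ndarray, top_k: int = 5):
    """Return indices of the top_k rows of `embeddings` most cosine-similar
    to `query`. Brute-force stand-in for the vector store's cosine search."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q                        # cosine similarity per stored object
    return np.argsort(-sims)[:top_k]    # highest similarity first
```

A query object's own embedding always comes back as its top hit, which is a handy sanity check after populating the collection.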
With the system initialized, you can:
- Retrieve nearest-neighbor shapes by embedding similarity
- Inspect a specific object’s metadata + embedding stats
- Compare two objects with cosine similarity + metadata differences
- See dataset-level distributions across categories
- Visualize the embedding space (PCA 2D) and highlight neighbors
- Run simple “geometric proxy” stats computed in embedding space (bounding-box span, proxy volume, aspect ratio)
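The PCA-to-2D projection behind the embedding-space plot can be sketched without scikit-learn (the notebook uses `sklearn.decomposition.PCA`; this numpy SVD version is equivalent for intuition):

```python
import numpy as np

def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project high-dimensional embeddings to 2D along the top-2 principal
    components, via SVD of the centered data matrix."""
    centered = embeddings - embeddings.mean(axis=0)
    # rows of vt are principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```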
It’s not meant to be physically accurate CAD analysis — it’s a fun bridge between 3D ML representations and RAG-style retrieval + reasoning.
- Python + Jupyter notebooks
- 3D & data: `open3d`, `numpy`, `pandas`
- ML: `torch`, `torch-geometric`
- Dim-reduction / analysis: `scikit-learn` (PCA/t-SNE), `seaborn`, `matplotlib`
- Vector DB: `chromadb`
- Agent framework: `langchain`, `langgraph`, `langchain-mistralai`
- Dataset tooling: `kagglehub`
Top-level things you’ll likely touch:
- `shapes_agent.ipynb` — the agent + tools + ChromaDB vector store
- `pointnet_playground.ipynb` — dataset → PointNet-style model → embeddings
- `requirements.txt` — pinned Python deps
- `checkpoints/` — saved model + embeddings
- `data/json/` — enriched JSON with embeddings + metadata
- `data/point_clouds/` — local point-cloud folder
If you have this project on GitHub (or anywhere git-accessible):
```bash
git clone <your-repo-url>
cd <repo-folder>
```

If you downloaded a ZIP, just extract it and `cd` into the extracted folder.
Windows (PowerShell):

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

macOS / Linux:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Copy the example file and fill in your key:

```bash
cp .env.example .env
```

Or create a `.env` file in the project root.
Preferred:

```
MISTRAL_API_KEY=your_key_here
```

This project also supports the older variable name used in this repo:

```
mistral_llm=your_key_here
```

Important: don’t commit `.env` (this repo includes a `.gitignore` entry for it).
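Resolving the key with the legacy fallback looks like this — a minimal sketch using only the standard library; the helper name `get_mistral_key` is hypothetical, not the notebook's actual code:

```python
import os

def get_mistral_key() -> "str | None":
    """Return the Mistral API key, preferring MISTRAL_API_KEY and falling
    back to the older `mistral_llm` variable name used in this repo."""
    return os.getenv("MISTRAL_API_KEY") or os.getenv("mistral_llm")
```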
Option A (VS Code): open the .ipynb files and run cells with your .venv interpreter selected.
Option B (Jupyter):

```bash
jupyter lab
```

Open and run `pointnet_playground.ipynb`.
What it covers:
- dataset download (Kagglehub)
- visualization and preprocessing
- PointNet-style model training/eval
- embedding extraction saved into `checkpoints/embeddings/`
Open and run `shapes_agent.ipynb`.
What it covers:
- load enriched JSON samples from `data/json/`
- build a ChromaDB collection (cosine)
- create LangChain agent with tools
- run demo scenarios / custom queries
- Model checkpoint: `checkpoints/pointnet_model.pth`
- Embeddings: `checkpoints/embeddings/modelnet40_train_embeddings.pt`, `checkpoints/embeddings/modelnet40_test_embeddings.pt`
- Normalized / enriched embeddings (extra experiments): `checkpoints/normalised_embeddings/`, `checkpoints/enriched_embeddings/`
- The notebook downloads ModelNet40 via Kaggle: `balraj98/modelnet40-princeton-3d-object-dataset`
- You also have local point-cloud folders checked into the workspace: `data/point_clouds/`
If you’re short on disk space, you can keep either the downloaded Kaggle copy or the local exports — you don’t necessarily need both.
The agent notebook exposes a small toolbox (implemented as LangChain tools):
- `find_similar_objects(object_id, top_k)` — nearest neighbors by embedding similarity
- `inspect_object(object_id)` — metadata + embedding statistics
- `compare_objects(object_id_1, object_id_2)` — cosine similarity + metadata differences
- `get_category_statistics()` — distribution of classes/super-categories
- `search_by_category(category_name, limit)` — filter by class or super-category
- `visualize(object_id=None, plot_type="embedding"|"3d", top_k=5)` — PCA embedding plot or 3D scatter render
- `get_bounding_box(object_id)` — embedding-space span summary
- `estimate_volume(object_id)` — embedding-space proxy volume + norm
- `get_geometric_summary(object_id)` — centroid/aspect-ratio proxies
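As an example of the inspection side, `inspect_object`-style embedding statistics reduce to something like the following numpy-only sketch (the helper `embedding_stats` and the exact set of fields are illustrative assumptions):

```python
import numpy as np

def embedding_stats(embedding: np.ndarray) -> dict:
    """Summary statistics for a single object's embedding vector, mirroring
    what an inspect-style tool might report alongside the metadata."""
    return {
        "dim": int(embedding.shape[0]),
        "mean": float(embedding.mean()),
        "std": float(embedding.std()),
        "l2_norm": float(np.linalg.norm(embedding)),
        "min": float(embedding.min()),
        "max": float(embedding.max()),
    }
```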
A note on the “geometric” tools: they operate in embedding space, not in real XYZ coordinates. They’re useful for relative comparisons and intuition, not for real-world measurements.
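For concreteness, the kind of proxies those tools report can be sketched as below. Everything here is illustrative — the quantities are computed from embedding values, not XYZ coordinates, and the exact formulas are assumptions, not the notebook's code:

```python
import numpy as np

def geometric_proxies(embedding: np.ndarray) -> dict:
    """Toy 'geometric' proxies over an embedding vector: overall value spread,
    a proxy 'volume' from the first three dimensions treated as a fake
    bounding box, and an aspect ratio of that fake box."""
    span = embedding.max() - embedding.min()   # overall value spread
    box = np.abs(embedding[:3]) + 1e-8         # fake 3-dim "extent"
    return {
        "span": float(span),
        "proxy_volume": float(np.prod(box)),
        "aspect_ratio": float(box.max() / box.min()),
    }
```

These numbers are only meaningful relative to other objects in the same embedding space, which is exactly how the agent uses them.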
- `.env` — Mistral key (ignored by git)
- `requirements.txt` — pinned versions
If Open3D visualization fails in notebooks, it’s usually a Jupyter/renderer issue; the point-cloud notebook already attempts to enable WebRTC mode.
- ModelNet / ModelNet40 dataset (Princeton)
- PointNet (Qi et al.) + PointNet-style point cloud pipelines
- PyTorch Geometric `PointNetConv`
- ChromaDB vector database
- LangChain tool calling + agents
- Mistral API (via `langchain-mistralai`)
This repo is intentionally notebook-driven and exploratory. If something looks “prototype-ish” (paths, plotting, quick heuristics), that’s by design — it’s a learning playground.