usman7384/PointBot

Interactive AI agent for 3D point cloud search, comparison, and embedding-based analysis.


Agentic RAG for 3D Shapes (ModelNet40)

A project exploring agentic retrieval + 3D geometry embeddings.

The idea: train a lightweight PointNet-style model on ModelNet40, extract embeddings for shapes, store them in a vector database (ChromaDB), then use a LangChain tool-calling agent (Mistral) to answer questions like:

  • “Find shapes most similar to this chair.”
  • “Compare these two objects and explain why they differ.”
  • “Show me the embedding space and highlight neighbors.”

What problem this project aimed to solve

Most 3D ML demos stop at “train a classifier” or “plot a point cloud”. I wanted something more interactive:

  • Make 3D embeddings searchable (nearest neighbors, category browsing)
  • Wrap the workflow in a tool-using agent that can chain steps (search → inspect → compare → visualize)
  • Keep everything runnable from notebooks, without enterprise scaffolding

Methodology (high level)

This project has two main parts:

1) ModelNet40 → Point clouds → Embeddings

Implemented in pointnet_playground.ipynb.

Workflow:

  1. Acquire dataset (ModelNet40 via kagglehub)
  2. Explore & visualize sample meshes/point clouds (Open3D + Matplotlib)
  3. Preprocess to point clouds (uniform sampling)
  4. Train a PointNet-style model (PyTorch Geometric PointNetConv + pooling)
  5. Evaluate on the test split
  6. Extract embeddings from the network and save them under checkpoints/embeddings/
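Step 3's uniform sampling can be sketched in NumPy for intuition (the notebook uses Open3D for this; here `verts` and `faces` are hypothetical mesh arrays, and the sampling is area-weighted so points land uniformly on the surface):

```python
import numpy as np

def sample_point_cloud(verts, faces, n_points=1024, seed=0):
    """Area-weighted uniform sampling of points on a triangle mesh."""
    rng = np.random.default_rng(seed)
    tris = verts[faces]                      # (F, 3, 3) triangle corners
    a, b, c = tris[:, 0], tris[:, 1], tris[:, 2]
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    # Pick triangles proportionally to their area, then sample
    # barycentric coordinates uniformly within each chosen triangle.
    idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    u, v = rng.random(n_points), rng.random(n_points)
    flip = u + v > 1.0                       # reflect points outside the triangle
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    return a[idx] + u[:, None] * (b - a)[idx] + v[:, None] * (c - a)[idx]
```

Open3D's `sample_points_uniformly` does the same thing in one call; the hand-rolled version just makes the area weighting explicit.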

2) Embeddings → Vector DB → Agentic tool-calling

Implemented in shapes_agent.ipynb.

Workflow:

  1. Load an enriched JSON containing embeddings + metadata:
    • data/json/enriched_data_sample_with_embeddings.json
  2. Create and populate a ChromaDB collection (cosine similarity)
  3. Define a set of tools (search, inspection, comparison, visualization, geometric proxies)
  4. Create a LangChain agent (Mistral) that chooses tools automatically
  5. Run demo scenarios / custom questions
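Loading the enriched JSON into arrays ready for a vector store might look like the following sketch. The record fields (`id`, `category`, `embedding`) are assumptions for illustration; the real schema in data/json/ may differ:

```python
import json
import numpy as np

def load_enriched(path):
    """Load enriched records into parallel id/metadata/embedding arrays.

    Assumes each record looks like {"id": ..., "category": ...,
    "embedding": [...]}; adjust to the actual JSON schema.
    """
    with open(path) as f:
        records = json.load(f)
    ids = [r["id"] for r in records]
    metas = [{"category": r["category"]} for r in records]
    embs = np.array([r["embedding"] for r in records], dtype=np.float32)
    # L2-normalize so dot products equal cosine similarity downstream.
    embs /= np.linalg.norm(embs, axis=1, keepdims=True)
    return ids, metas, embs
```

The `ids`/`metas`/`embs` triples map directly onto the `ids=`, `metadatas=`, and `embeddings=` arguments of a ChromaDB `collection.add(...)` call.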

Outcome (what you can do)

With the system initialized, you can:

  • Retrieve nearest-neighbor shapes by embedding similarity
  • Inspect a specific object’s metadata + embedding stats
  • Compare two objects with cosine similarity + metadata differences
  • See dataset-level distributions across categories
  • Visualize the embedding space (PCA 2D) and highlight neighbors
  • Run simple “geometric proxy” stats computed in embedding space (bounding-box span, proxy volume, aspect ratio)
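The nearest-neighbor retrieval behind the first three bullets is a cosine search; ChromaDB handles it in the notebook, but conceptually it reduces to this NumPy sketch:

```python
import numpy as np

def nearest_neighbors(query, embeddings, top_k=5):
    """Cosine-similarity search over a matrix of shape embeddings,
    mirroring what the ChromaDB collection does for similarity queries."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ q                       # cosine similarity per row
    order = np.argsort(-sims)[:top_k]  # best matches first
    return order, sims[order]
```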

It’s not meant to be physically accurate CAD analysis — it’s a fun bridge between 3D ML representations and RAG-style retrieval + reasoning.

Tech stack

  • Python + Jupyter notebooks
  • 3D & data: open3d, numpy, pandas
  • ML: torch, torch-geometric
  • Dim-reduction / analysis: scikit-learn (PCA/TSNE), seaborn, matplotlib
  • Vector DB: chromadb
  • Agent framework: langchain, langgraph, langchain-mistralai
  • Dataset tooling: kagglehub
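The notebooks use scikit-learn's `PCA` for the 2D embedding plots; for intuition, the same projection can be written with a NumPy SVD (equivalent up to component sign):

```python
import numpy as np

def pca_2d(embs):
    """Project embeddings to 2D, matching sklearn PCA(n_components=2)
    up to the sign of each component."""
    centered = embs - embs.mean(axis=0)
    # Right singular vectors of the centered matrix are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```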

Repo layout

Top-level things you’ll likely touch:

  • shapes_agent.ipynb — the agent + tools + ChromaDB vector store
  • pointnet_playground.ipynb — dataset → PointNet-style model → embeddings
  • requirements.txt — pinned Python deps
  • checkpoints/ — saved model + embeddings
  • data/json/ — enriched JSON with embeddings + metadata
  • data/point_clouds/ — local point-cloud folder

Setup

0) Clone the repo

If you have this project on GitHub (or anywhere git-accessible):

git clone <your-repo-url>
cd <repo-folder>

If you downloaded a ZIP, just extract it and cd into the extracted folder.

1) Create a Python environment

Windows (PowerShell):

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

macOS / Linux:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2) Configure the Mistral API key

Copy the example file and fill in your key:

cp .env.example .env

Or create a .env file in the project root.

Preferred variable name:

MISTRAL_API_KEY=your_key_here

This project also supports the older variable name used in this repo:

mistral_llm=your_key_here

Important: don’t commit .env (this repo includes a .gitignore entry for it).
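A stdlib-only sketch of reading the key with the legacy fallback (the notebook itself may load .env via python-dotenv first):

```python
import os

def get_mistral_key():
    """Prefer MISTRAL_API_KEY, fall back to the legacy mistral_llm name."""
    key = os.getenv("MISTRAL_API_KEY") or os.getenv("mistral_llm")
    if not key:
        raise RuntimeError(
            "Set MISTRAL_API_KEY in .env before running the agent notebook.")
    return key
```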

3) Open the notebooks

Option A (VS Code): open the .ipynb files and run cells with your .venv interpreter selected.

Option B (Jupyter):

jupyter lab

How to run (notebook order)

Notebook 1 — Model & embeddings

Open and run pointnet_playground.ipynb.

What it covers:

  • dataset download (Kagglehub)
  • visualization and preprocessing
  • PointNet-style model training/eval
  • embedding extraction saved into checkpoints/embeddings/

Notebook 2 — Agentic retrieval + analysis

Open and run shapes_agent.ipynb.

What it covers:

  • load enriched JSON samples from data/json/
  • build ChromaDB collection (cosine)
  • create LangChain agent with tools
  • run demo scenarios / custom queries

Models & artifacts

  • Model checkpoint:
    • checkpoints/pointnet_model.pth
  • Embeddings:
    • checkpoints/embeddings/modelnet40_train_embeddings.pt
    • checkpoints/embeddings/modelnet40_test_embeddings.pt
  • Normalized / enriched embeddings (extra experiments):
    • checkpoints/normalised_embeddings/
    • checkpoints/enriched_embeddings/

Dataset notes

  • The notebook downloads ModelNet40 via Kaggle:
    • balraj98/modelnet40-princeton-3d-object-dataset
  • Local point-cloud folders are also checked into the repo:
    • data/point_clouds/

If you’re short on disk space, you can keep either the downloaded Kaggle copy or the local exports — you don’t necessarily need both.

Agent & tools

The agent notebook exposes a small toolbox (implemented as LangChain tools):

  • find_similar_objects(object_id, top_k) — nearest neighbors by embedding similarity
  • inspect_object(object_id) — metadata + embedding statistics
  • compare_objects(object_id_1, object_id_2) — cosine similarity + differences
  • get_category_statistics() — distribution of classes/super-categories
  • search_by_category(category_name, limit) — filter by class or super-category
  • visualize(object_id=None, plot_type="embedding"|"3d", top_k=5) — PCA embedding plot or 3D scatter render
  • get_bounding_box(object_id) — embedding-space span summary
  • estimate_volume(object_id) — embedding-space proxy volume + norm
  • get_geometric_summary(object_id) — centroid/aspect ratio proxies

A note on the “geometric” tools: they operate in embedding space, not in real XYZ coordinates. They’re useful for relative comparisons and intuition, not for real-world measurements.
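In that spirit, the proxy tools boil down to simple statistics over the embedding vector. This is a hedged sketch, not the notebook's exact implementation; in particular the three-component "volume" is a hypothetical choice made for illustration:

```python
import numpy as np

def geometric_proxies(emb):
    """Embedding-space 'geometric' stats in the style of the proxy tools.

    These are relative measures over the embedding vector, useful for
    comparing two shapes, not real XYZ measurements.
    """
    span = float(emb.max() - emb.min())         # bounding-box-like span
    volume = float(np.prod(np.abs(emb[:3])))    # crude proxy volume (hypothetical)
    aspect = float(np.abs(emb).max() / (np.abs(emb).min() + 1e-9))
    return {"span": span, "proxy_volume": volume,
            "aspect_ratio": aspect, "norm": float(np.linalg.norm(emb))}
```

Because every stat is computed in the same embedding space, ratios between two objects are meaningful even though the absolute numbers are not.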

Config

  • .env — Mistral key (ignored by git)
  • requirements.txt — pinned versions

If Open3D visualization fails in notebooks, it’s usually a Jupyter/renderer issue; the point-cloud notebook already attempts to enable WebRTC mode.

References

  • ModelNet / ModelNet40 dataset (Princeton)
  • PointNet (Qi et al.) + PointNet-style point cloud pipelines
  • PyTorch Geometric PointNetConv
  • ChromaDB vector database
  • LangChain tool calling + agents
  • Mistral API (via langchain-mistralai)

Hobby-project disclaimer

This repo is intentionally notebook-driven and exploratory. If something looks “prototype-ish” (paths, plotting, quick heuristics), that’s by design — it’s a learning playground.
