A citation-backed RAG pipeline for NMR structure elucidation — retrieves first, reasons second, and shows its work.
Live demo: available on request — DM me on LinkedIn and I'll send the access link.
I have a Master's in Chemistry. The first time I sat through an NMR structure-elucidation session (really sat through it), I struggled. Reading peaks, mapping shifts to fragments, working backwards to a structure. It was hard. Not the kind of hard you fake your way through.
I transitioned into AI after that. But the Chemistry never quite left.
I've been a mentor at RJSF since 2021. In 2025, at one of the Chemistry Research Drive sessions, a professor was walking the group through NMR structure elucidation — peak by peak, candidate by candidate, slowly building toward a structure with reasoning every chemist in the room could follow. And it hit me:
This can be automated with an LLM. I know it's hard. But it's possible.
That was the spark. I don't claim Curie is perfect. I claim it's scalable — and that each iteration makes it better. The pull keeps growing, because I happen to sit in a rare intersection: domain knowledge in both spaces.
The name is half-joke, half-tribute. Marie Curie identified unknown substances by their fingerprints — radioactive emissions then, NMR peaks now. And curious is the only honest word for why I keep going.
You give it a 1H + 13C NMR spectrum and a molecular formula hint. It returns:
- A ranked list of candidate structures with confidence scores
- Per-peak interpretation — which atom in the structure each peak comes from, grounded in retrieved analogues with citations
- Interactive RDKit visualisation — hover a peak, the corresponding atom lights up; hover an atom, its peaks light up
- A "why this, not that" explanation — ruled-out candidates with reasons
- Ambiguity warnings — when the signal alone isn't enough and 2D NMR is needed
The whole point is that Curie shows its work. No black-box "your molecule is X." Every conclusion is traceable.
Retrieve first. Reason second.
That's the entire discipline. The LLM never invents a structure — every claim it makes is grounded in a candidate that Layer 1 actually retrieved, then double-checked by an RDKit substructure agent before Layer 3 closes the loop with forward prediction.
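One way to picture the grounding rule is a small guard that sits between the LLM output and the response: any per-peak claim that cites a candidate Layer 1 never returned gets flagged. This is a minimal sketch, not Curie's actual code; `PeakClaim`, `candidate_id`, and `ungrounded_claims` are hypothetical names, and the IDs are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeakClaim:
    """One per-peak statement from the LLM (hypothetical schema)."""
    shift_ppm: float
    fragment: str
    candidate_id: str  # ID of the retrieved analogue the claim cites


def ungrounded_claims(claims, retrieved_ids):
    """Return the claims whose citation is not among the retrieved candidates."""
    allowed = set(retrieved_ids)
    return [c for c in claims if c.candidate_id not in allowed]


retrieved = ["cand-001", "cand-007"]  # made-up IDs for illustration
claims = [
    PeakClaim(7.95, "aromatic CH ortho to C=O", "cand-001"),
    PeakClaim(2.60, "acetyl CH3", "cand-042"),  # cites something never retrieved
]
bad = ungrounded_claims(claims, retrieved)
print([c.fragment for c in bad])  # ['acetyl CH3']
```

Anything in `bad` would be dropped or sent back for re-interpretation rather than shown to the user.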
| Stage | What happens | Why it matters |
|---|---|---|
| 1 · FAISS retrieval | Embed query peaks → top-K candidates from textbook NMR corpus | LLMs hallucinate molecules. Vector retrieval pins the reasoning to known chemistry. |
| 2 · Grounded peak interpretation | Per-peak LLM reasoning → fragment mapping with provenance citations | Every conclusion traces back to a retrieved analogue. No black-box verdicts. |
| Agent · RDKit substructure check | Validates each fragment claim against the molecule's bonds and topology | Catches LLM outputs that sound chemically reasonable but aren't. |
| 3 · Forward NMR prediction | Simulate spectra for top-3 → compare to input → match/mismatch verdict | Closes the loop. Final confidence is empirical, not just retrieval similarity. |
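The compare step in stage 3 can be sketched as tolerance-based peak matching. The sketch below assumes predicted shifts already exist (the prediction method itself is not shown) and uses a hypothetical 0.2 ppm tolerance; `match_peaks` and `match_fraction` are illustrative names, and the shift values are invented.

```python
def match_peaks(observed, predicted, tol_ppm):
    """Greedily pair each observed shift with the nearest unused predicted shift.

    Returns (pairs, unmatched_observed). A pair is kept only if the two
    shifts differ by at most tol_ppm.
    """
    remaining = sorted(predicted)
    pairs, unmatched = [], []
    for obs in sorted(observed):
        if not remaining:
            unmatched.append(obs)
            continue
        nearest = min(remaining, key=lambda p: abs(p - obs))
        if abs(nearest - obs) <= tol_ppm:
            pairs.append((obs, nearest))
            remaining.remove(nearest)
        else:
            unmatched.append(obs)
    return pairs, unmatched


def match_fraction(observed, predicted, tol_ppm=0.2):
    """Fraction of observed peaks explained by the predicted spectrum."""
    if not observed:
        return 0.0
    pairs, _ = match_peaks(observed, predicted, tol_ppm)
    return len(pairs) / len(observed)


observed_1h = [7.95, 7.55, 7.45, 2.60]   # illustrative shifts, ppm
predicted_1h = [7.98, 7.52, 7.47, 2.58]  # made-up prediction for one candidate
score = match_fraction(observed_1h, predicted_1h)
print(score)  # 1.0
```

A greedy match is the simplest choice here; an optimal assignment would be more robust for crowded spectra, at the cost of more code.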
Benchmarked at 60% top-1 · 100% top-5 on the textbook validation set. The 60% is intentional, not a ceiling — Curie is calibrated to not over-rank close analogues as exact matches, because in pharma research the gap between a 95% structural match and a 100% one can mean a different molecule entirely.
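For readers unfamiliar with the metric, top-k accuracy is the share of queries whose true compound appears among the first k ranked candidates. A small sketch on toy data (not the validation set; `top_k_accuracy` is an illustrative helper):

```python
def top_k_accuracy(ranked_predictions, truths, k):
    """Share of queries whose true compound appears in the first k candidates."""
    if len(ranked_predictions) != len(truths):
        raise ValueError("one ranked list is needed per ground-truth label")
    if not truths:
        return 0.0
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_predictions, truths))
    return hits / len(truths)


# Toy data, not Curie's validation set.
ranked = [
    ["ibuprofen", "naproxen", "ketoprofen"],
    ["propiophenone", "acetophenone", "benzaldehyde"],
]
truth = ["ibuprofen", "acetophenone"]
print(top_k_accuracy(ranked, truth, k=1))  # 0.5
print(top_k_accuracy(ranked, truth, k=5))  # 1.0
```

The second query shows the pattern described above: a close analogue ranks first, and the exact compound still lands inside the top 5.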
Curie does step 2 of NMR elucidation — substructure inference + analog retrieval. It does not do combinatorial structure assembly.
| Case | What you get |
|---|---|
| A — Known compound | Exact match returned with full provenance |
| B — Close analog | Top-ranked analog + per-peak grounding + confidence |
| C — Novel scaffold | Substructure profile + "needs 2D NMR (HSQC/HMBC)" guidance |
I'd rather have a tool that's honest about case C than one that hallucinates a structure to look impressive.
Three layers, two deployments:
| Layer | What it does | Where it runs |
|---|---|---|
| Client | React + Vite + Tailwind + RDKit-JS — interactive spectrum and structure viewer | Vercel |
| Backend | FastAPI + FAISS index + LangChain ChatService + RDKit substructure agent + SSE event bus | Hugging Face Spaces |
| LLM Providers | Google Gemini 2.0 Flash (primary) · Groq Llama (fallback via LangChain with_fallbacks) | External APIs |
Curie's reasoning is the kind of work a researcher would re-run if it failed once. So fallback isn't "redundancy theatre" — it's the difference between a usable tool and one that 503s in front of a recruiter. LangChain's with_fallbacks lets the primary be the best model (Gemini 2.0 Flash for free-tier reasoning quality) while keeping a fast Groq backup if Gemini hits a quota wall.
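The fallback pattern itself is simple: try providers in order and return the first answer that comes back. The sketch below shows that pattern in plain Python so it runs without API keys. It is not LangChain's API and not Curie's ChatService; `ProviderError`, `call_with_fallbacks`, and both model functions are stand-ins invented for illustration.

```python
class ProviderError(RuntimeError):
    """Raised by a provider call that cannot answer (quota, timeout, 5xx)."""


def call_with_fallbacks(prompt, providers):
    """Try each (name, callable) in order; return (name, answer) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stand-ins for real model clients; no network calls are made here.
def primary_model(prompt):
    raise ProviderError("quota exceeded")


def backup_model(prompt):
    return f"interpretation for: {prompt}"


used, answer = call_with_fallbacks(
    "assign the peak at 2.60 ppm",
    [("primary", primary_model), ("backup", backup_model)],
)
print(used)  # backup
```

Returning the provider name alongside the answer makes it easy to log which model actually produced a given interpretation.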
An LLM alone, given peaks, will hallucinate a structure. With FAISS retrieval anchoring it to known analogs, the LLM is constrained to reason about what's actually plausible — not invent. This is the difference between a tool a chemist trusts and one they laugh at.
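To make the retrieval idea concrete, here is a dependency-free sketch: peaks are binned into a fixed-length vector and candidates are ranked by cosine similarity. Two loud assumptions: the binning featurisation is my illustration (how Curie actually embeds peaks is not described here), and the linear scan stands in for the FAISS index. The shift values are approximate and the helper names are hypothetical.

```python
import math


def peaks_to_vector(shifts_ppm, lo=0.0, hi=220.0, bin_width=5.0):
    """Histogram 13C shifts into fixed-width bins and L2-normalise the counts."""
    n_bins = int((hi - lo) / bin_width)
    vec = [0.0] * n_bins
    for shift in shifts_ppm:
        if lo <= shift < hi:
            vec[int((shift - lo) / bin_width)] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def top_k(query_vec, corpus, k=3):
    """Rank corpus entries by cosine similarity (dot product of unit vectors)."""
    scored = [
        (sum(q * c for q, c in zip(query_vec, vec)), name)
        for name, vec in corpus.items()
    ]
    scored.sort(reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:k]]


# Approximate 13C shifts, for illustration only.
corpus = {
    "acetophenone": peaks_to_vector([198.1, 137.1, 133.0, 128.5, 128.2, 26.5]),
    "ethanol": peaks_to_vector([57.8, 18.1]),
}
query = peaks_to_vector([197.9, 137.0, 133.1, 128.6, 128.3, 26.6])
hits = top_k(query, corpus, k=2)
print(hits[0][0])  # acetophenone
```

Whatever the embedding, the point is the same: the LLM only ever sees candidates that came out of a ranking like `hits`.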
Drop a .csv, .xlsx, or .jdx file, or type peaks in directly. Four tier-1 presets are wired up (Ibuprofen, Acetophenone, Vanillin, 4-MeO-cinnamate) so anyone can see the pipeline run end-to-end without their own data.
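For the .csv path, a peak list could be parsed along these lines. The column names (`nucleus`, `shift_ppm`, `multiplicity`) are assumptions made for this sketch, not a documented Curie input schema, and `parse_peak_csv` is a hypothetical helper.

```python
import csv
import io


def parse_peak_csv(text):
    """Parse rows of nucleus, shift, and optional multiplicity into dicts.

    Column names are assumed for this sketch. Rows with an unknown nucleus
    or a non-numeric shift raise ValueError with the offending line number.
    """
    peaks = []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        line_no = reader.line_num
        nucleus = (row.get("nucleus") or "").strip().upper()
        if nucleus not in {"1H", "13C"}:
            raise ValueError(f"line {line_no}: unknown nucleus {nucleus!r}")
        try:
            shift = float(row["shift_ppm"])
        except (KeyError, TypeError, ValueError):
            raise ValueError(f"line {line_no}: bad shift_ppm") from None
        peaks.append({
            "nucleus": nucleus,
            "shift_ppm": shift,
            "multiplicity": (row.get("multiplicity") or "").strip() or None,
        })
    return peaks


sample = "nucleus,shift_ppm,multiplicity\n1H,7.95,d\n1H,2.60,s\n13C,198.1,\n"
peaks = parse_peak_csv(sample)
print(len(peaks))  # 3
```

Failing loudly on a bad row seems safer than silently dropping a peak, since a missing peak changes the elucidation.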
The architecture from the diagram above, made visible: Features → Retrieval → Interpret → Scoring → Prediction → Complete. Streamed over Server-Sent Events so you watch the reasoning happen, not just the final answer. The little "E geometry" tag is the RDKit substructure agent surfacing a constraint it already found from the peak list.
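On the wire, a Server-Sent Event is just text: an `event:` line, a `data:` line, and a blank line. The sketch below formats one frame per pipeline stage. The payload fields (`run_id`, `stage`, `step`) and the function names are assumptions for illustration, not Curie's actual event schema.

```python
import json

STAGES = ["features", "retrieval", "interpret", "scoring", "prediction", "complete"]


def sse_event(event, data):
    """Format one Server-Sent Event: an event line, a data line, a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def pipeline_events(run_id):
    """Yield one SSE frame per pipeline stage, in order."""
    for index, stage in enumerate(STAGES, start=1):
        yield sse_event("stage", {"run_id": run_id, "stage": stage, "step": index})


frames = list(pipeline_events("demo-1"))
print(frames[0], end="")
```

In a FastAPI app, a generator like this would typically be returned through a `StreamingResponse` with `media_type="text/event-stream"`; in a real pipeline each frame would be yielded as its stage finishes rather than all at once.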
The chosen structure renders with RDKit, with both 1H and 13C peak shifts laid out on either side as a chemist would read them. Molecular formula, MW, and degree of unsaturation surface at the top — the metadata you'd want before trusting the result.
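The degree of unsaturation shown in that header follows from the formula alone: DoU = (2C + 2 + N − H − X) / 2, where X counts halogens. A small sketch, assuming simple formulas without brackets or charges (`parse_formula` and `degree_of_unsaturation` are illustrative helpers, not Curie's code):

```python
import re


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C13H18O2' (no brackets or charges)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def degree_of_unsaturation(formula):
    """DoU = (2C + 2 + N - H - X) / 2, where X counts halogens."""
    atoms = parse_formula(formula)
    halogens = sum(atoms.get(x, 0) for x in ("F", "Cl", "Br", "I"))
    c, h, n = atoms.get("C", 0), atoms.get("H", 0), atoms.get("N", 0)
    return (2 * c + 2 + n - h - halogens) / 2


print(degree_of_unsaturation("C13H18O2"))  # ibuprofen: 5.0
print(degree_of_unsaturation("C8H8O"))     # acetophenone: 5.0
```

For ibuprofen the 5 is accounted for by the benzene ring (4) plus the carboxylic C=O (1), which is the kind of sanity check worth doing before trusting a candidate.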
These two views together are the interaction that makes Curie useful instead of just impressive. Hover a fragment label such as Para-Aromatic Ring or Trans Vinyl (E): the matching atoms in the structure glow and the responsible peaks light up. Reverse it: hover a peak on the side rails and the responsible atoms highlight. It's the explanation a chemist would draw on a whiteboard, made interactive and reproducible.
| Layer | Stack |
|---|---|
| Frontend | React · Vite · Tailwind · RDKit-JS |
| Backend | Python · FastAPI · FAISS · LangChain · RDKit · pdfplumber |
| LLMs | Google Gemini 2.0 Flash (primary) · Groq Llama 3.3 70B (fallback) |
| Streaming | Server-Sent Events (SSE) for live pipeline updates |
| Deployment | Vercel (frontend) · Hugging Face Spaces (backend) |
curie/
├── src/ React UI — file upload + interactive spectrum/structure viewer
├── backend/
│ ├── app/
│ │ ├── core/ FAISS retrieval, event bus (SSE)
│ │ ├── services/ ChatService (LangChain with_fallbacks), Layer 2/3 logic
│ │ ├── routes/ /api/v1/elucidate, /api/v1/molecule, /api/v1/stream
│ │ └── models/ Pydantic schemas
│ ├── prompts/ Layer 2 grounded-reasoning prompt templates
│ ├── scripts/ Data ingestion, embedding builds
│ └── Dockerfile HF Spaces deployment
└── assets/ Architecture + flow illustrations + screenshots
npm install
npm run dev

cd backend
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env # add GOOGLE_API_KEY (primary) + GROQ_API_KEY (fallback)
python run.py

- Novel scaffolds outside the corpus indexed in FAISS can still produce confidently wrong structures. For inputs the index hasn't seen, the pipeline currently skips the intermediate Layer 1 / Layer 2 views and jumps to a structure verdict, losing the provenance signal that makes Curie trustworthy on known compounds.
- No mass-spec or IR cross-validation yet. Single-spectroscopy reasoning has inherent ambiguity at this scope — Curie can't break a tie the spectrum itself can't break.
- Top-1 accuracy ≈ 60% on textbook compounds, 100% top-5. Treat Curie as a retrieval-grounded hypothesis generator, not an oracle. The "What's next" items below directly address the first two.
I'm still digging. The pull keeps growing.
- Combinatorial structure assembly — case C (novel scaffolds) gets a real candidate set, not just guidance
- 2D NMR (HSQC, HMBC) ingestion — collapses ambiguity that 1D alone can't resolve
- Wider compound library — current FAISS index is textbook-scope; production targets pharma-scale
- Reasoning trace export — researchers want the full audit log as a downloadable PDF for IP records
This isn't a finished tool. It's a tool that knows what it doesn't know yet — and that knows where to grow.
Sindhuja Sivaraman · MSc Chemistry · MS Data Science → Senior Engineer, AI/ML, HTC Global Services
Portfolio · GitHub
I know it's hard. But it's possible.






