Etymology — Cognate Detector

Live at shck.dev/etymology

Find shared Indo-European roots between words across 16 languages, visualized as interactive etymology graphs.

Enter two words → see if they share a common ancestor → explore the etymological chain.

water (English) ← *wódr̥ (Proto-Indo-European) → вода (Russian)

How it works

Data: 1.8M etymology relationships parsed from Wiktionary via etymology-db, stored in SQLite (~250MB)
Algorithm: breadth-first search (BFS) from both words along ancestry edges (inherited_from, derived_from, borrowed_from); the two ancestor sets are intersected, and Proto-Indo-European roots are preferred among the shared ancestors
Visualization: D3.js force-directed graph of the etymological chain, from input words (blue) → intermediates (gray) → common ancestor (gold)
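
The algorithm step can be sketched in a few lines of Python. This is a minimal illustration, not the code in backend/graph.py: the in-memory `EDGES` dict, the `(term, lang)` node tuples, and the function names are assumptions made for the example, and the tie-breaking rule (prefer Proto-Indo-European, then the shortest combined distance) is one plausible reading of "prefers Proto-Indo-European roots".

```python
from collections import deque

# Edge types named in this README; everything else here is illustrative.
ANCESTRY_EDGES = {"inherited_from", "derived_from", "borrowed_from"}

PIE = "Proto-Indo-European"
PIE_WATER = ("*wódr̥", PIE)

# Hand-written toy data, not rows from the real database.
# Maps a (term, lang) node to a list of (relation, parent node) pairs.
EDGES = {
    ("water", "English"): [("inherited_from", ("wæter", "Old English"))],
    ("wæter", "Old English"): [("inherited_from", ("*watōr", "Proto-Germanic"))],
    ("*watōr", "Proto-Germanic"): [("inherited_from", PIE_WATER)],
    ("вода", "Russian"): [("inherited_from", ("*voda", "Proto-Slavic"))],
    ("*voda", "Proto-Slavic"): [("inherited_from", PIE_WATER)],
    ("taxi", "English"): [("has_affix", ("taxi-", "English"))],  # not an ancestry edge
}


def ancestors(start, edges):
    """BFS upward from `start`; returns {node: distance from start}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for relation, parent in edges.get(node, []):
            if relation in ANCESTRY_EDGES and parent not in seen:
                seen[parent] = seen[node] + 1
                queue.append(parent)
    return seen


def common_ancestor(word_a, word_b, edges):
    """Intersect both ancestor sets and pick the preferred shared node."""
    up_a = ancestors(word_a, edges)
    up_b = ancestors(word_b, edges)
    shared = set(up_a) & set(up_b)
    if not shared:
        return None
    # Prefer PIE nodes, then the smallest combined distance, then a stable order.
    return min(shared, key=lambda n: (n[1] != PIE, up_a[n] + up_b[n], n))


if __name__ == "__main__":
    print(common_ancestor(("water", "English"), ("вода", "Russian"), EDGES))
```

Running two independent searches and intersecting the results keeps the sketch short; a real implementation against SQLite would likely also cap the search depth.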

Features

Cognate detection — finds common proto-language ancestors between word pairs
Ancestor translations — shows modern-language reflexes (descendants) on ancestor nodes in the graph
Auto language detection — automatically detects English/Russian based on input script
Non-cognate graphs — displays separate etymology trees even when words aren't related
Autocomplete — prefix search across 1.8M etymology entries
Interactive graph — zoom, pan, drag nodes in the D3.js visualization
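
Script-based language detection is simple enough to show inline. In this project it lives in the frontend (app.js); the Python sketch below only mirrors the idea, and the exact rule (any character in the basic Cyrillic block means Russian, everything else means English) is an assumption rather than the project's actual logic.

```python
def detect_language(term: str) -> str:
    """Guess the language name from the input script (hypothetical helper)."""
    for ch in term:
        # U+0400 to U+04FF is the Unicode "Cyrillic" block.
        if "\u0400" <= ch <= "\u04FF":
            return "Russian"
    return "English"


if __name__ == "__main__":
    for word in ("water", "вода"):
        print(word, detect_language(word))
```

The function returns full language names because the API expects them in that form.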

Example cognate pairs

English	Russian	Common Ancestor	Proto-Language
water	вода	*wódr̥	Proto-Indo-European
mother	мать	*méh₂tēr	Proto-Indo-European
three	три	*tréyes	Proto-Indo-European
night	ночь	*nókʷts	Proto-Indo-European

Stack

Backend: Python, FastAPI, SQLite
Frontend: Vanilla JS, D3.js
Data: etymology-db (Wiktionary parquet dump)
Deployment: GitHub Actions → SSH to Hetzner VPS

Setup

# Install dependencies
uv sync

# Download & build the etymology database (~140MB download → ~250MB SQLite)
uv run python scripts/setup_db.py

# Run
uv run uvicorn backend.main:app --reload --port 8000

Open http://localhost:8000 in a browser.

Project structure

backend/
  main.py        FastAPI app + endpoints
  graph.py       BFS cognate detection against SQLite
  database.py    SQLite queries & reflex lookups
  models.py      Pydantic models
frontend/
  index.html     Single-page UI
  app.js         Form handling, language detection, API calls
  graph.js       D3.js force-directed graph visualization
  style.css      Dark theme styling
scripts/
  setup_db.py    Parquet → SQLite pipeline

API

POST /api/cognates  — { word_a: {term, lang}, word_b: {term, lang} }
GET  /api/search    — ?q=prefix&lang=English

Languages use full names: English, Russian, Proto-Indo-European, Old English, Latin, etc.
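
As a rough sketch, both endpoints can be called from Python with only the standard library. The base URL assumes the local server from Setup, the helper names are made up for this example, and the response format is not documented here, so the sketch stops at building the requests.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local dev server from Setup


def build_cognates_request(term_a, lang_a, term_b, lang_b, base_url=BASE_URL):
    """Build a POST /api/cognates request with the body shape shown above."""
    payload = {
        "word_a": {"term": term_a, "lang": lang_a},
        "word_b": {"term": term_b, "lang": lang_b},
    }
    return urllib.request.Request(
        f"{base_url}/api/cognates",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_search_url(prefix, lang, base_url=BASE_URL):
    """Build the GET /api/search URL; urlencode escapes non-ASCII prefixes."""
    query = urllib.parse.urlencode({"q": prefix, "lang": lang})
    return f"{base_url}/api/search?{query}"


if __name__ == "__main__":
    req = build_cognates_request("water", "English", "вода", "Russian")
    print(req.get_method(), req.full_url)
    print(build_search_url("wat", "English"))
```

With the server running, `urllib.request.urlopen(req)` sends the cognates request, and `urllib.request.urlopen(build_search_url("wat", "English"))` runs a prefix search.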

License

MIT
