Live at shck.dev/etymology
Find shared Indo-European roots between words across 16 languages, visualized as interactive etymology graphs.
Enter two words → see if they share a common ancestor → explore the etymological chain.
water (English) ← *wódr̥ (Proto-Indo-European) → вода (Russian)
- Data — 1.8M etymology relationships parsed from Wiktionary via etymology-db, stored in SQLite (~250MB)
- Algorithm — BFS from both words through ancestry edges (`inherited_from`, `derived_from`, `borrowed_from`), intersects the two ancestor sets, and prefers Proto-Indo-European roots
- Visualization — D3.js force-directed graph showing the etymological chain: input words (blue) → intermediates (gray) → common ancestor (gold)
- Cognate detection — finds common proto-language ancestors between word pairs
- Ancestor translations — shows modern-language reflexes (descendants) on ancestor nodes in the graph
- Auto language detection — picks English or Russian from the input's script (Latin vs. Cyrillic)
- Non-cognate graphs — displays separate etymology trees even when words aren't related
- Autocomplete — prefix search across 1.8M etymology entries
- Interactive graph — zoom, pan, drag nodes in the D3.js visualization
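The cognate search described in the list above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's `graph.py`: nodes are hypothetical `(term, language)` tuples, the edge map is an in-memory dict from a node to its `(relation, parent)` pairs instead of SQLite queries, and ties between shared ancestors are broken by preferring Proto-Indo-European and then the shortest combined path.

```python
from __future__ import annotations

from collections import deque

# Hypothetical node key: (term, language name), e.g. ("water", "English").
Node = tuple[str, str]
# Hypothetical edge map: child node -> list of (relation, parent node).
Edges = dict[Node, list[tuple[str, Node]]]

# Relation types followed as ancestry edges, as listed in the README.
ANCESTRY_RELATIONS = {"inherited_from", "derived_from", "borrowed_from"}


def ancestors(start: Node, edges: Edges) -> dict[Node, int]:
    """BFS from `start` toward its ancestors; maps each reachable node to its hop count."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for relation, parent in edges.get(node, []):
            if relation in ANCESTRY_RELATIONS and parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def common_ancestor(a: Node, b: Node, edges: Edges) -> Node | None:
    """Intersect both ancestor sets; prefer Proto-Indo-European, then the shortest combined path."""
    dist_a = ancestors(a, edges)
    dist_b = ancestors(b, edges)
    shared = dist_a.keys() & dist_b.keys()
    if not shared:
        return None
    return min(
        shared,
        # False sorts before True, so PIE nodes win; then fewer total hops; then name.
        key=lambda n: (n[1] != "Proto-Indo-European", dist_a[n] + dist_b[n], n),
    )


# Toy, hand-written edges (simplified chains, not rows from the real database).
edges: Edges = {
    ("water", "English"): [("inherited_from", ("wæter", "Old English"))],
    ("wæter", "Old English"): [("inherited_from", ("*watōr", "Proto-Germanic"))],
    ("*watōr", "Proto-Germanic"): [("inherited_from", ("*wódr̥", "Proto-Indo-European"))],
    ("вода", "Russian"): [("inherited_from", ("*voda", "Proto-Slavic"))],
    ("*voda", "Proto-Slavic"): [("inherited_from", ("*wódr̥", "Proto-Indo-European"))],
}

print(common_ancestor(("water", "English"), ("вода", "Russian"), edges))
# ('*wódr̥', 'Proto-Indo-European')
```

Running BFS separately from each word and intersecting the results keeps the logic simple. Because whole ancestor sets are collected before ranking, a more distant Proto-Indo-European root can still be chosen over a nearer shared Proto-Germanic form.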
| English | Russian | Common Ancestor | Proto-Language |
|---|---|---|---|
| water | вода | *wódr̥ | Proto-Indo-European |
| mother | мать | *méh₂tēr | Proto-Indo-European |
| three | три | *tréyes | Proto-Indo-European |
| night | ночь | *nókʷts | Proto-Indo-European |
- Backend: Python, FastAPI, SQLite
- Frontend: Vanilla JS, D3.js
- Data: etymology-db (Wiktionary parquet dump)
- Deployment: GitHub Actions → SSH to Hetzner VPS
```bash
# Install dependencies
uv sync

# Download & build the etymology database (~140MB download → ~250MB SQLite)
uv run python scripts/setup_db.py

# Run
uv run uvicorn backend.main:app --reload --port 8000
```

Open localhost:8000.
```
backend/
  main.py      FastAPI app + endpoints
  graph.py     BFS cognate detection against SQLite
  database.py  SQLite queries & reflex lookups
  models.py    Pydantic models
frontend/
  index.html   Single-page UI
  app.js       Form handling, language detection, API calls
  graph.js     D3.js force-directed graph visualization
  style.css    Dark theme styling
scripts/
  setup_db.py  Parquet → SQLite pipeline
```
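The autocomplete feature and the `database.py` / `setup_db.py` layers can be illustrated with Python's built-in `sqlite3`. This is a sketch only: the `etymology` table, its columns, and the `search_prefix` helper are hypothetical (the README does not document the real schema), and the Parquet loading step is replaced by a few inline toy rows.

```python
import sqlite3

# Hypothetical schema for illustration; the project's actual tables may differ.
SCHEMA = """
CREATE TABLE etymology (
    term         TEXT NOT NULL,
    lang         TEXT NOT NULL,
    relation     TEXT NOT NULL,  -- e.g. inherited_from, derived_from, borrowed_from
    related_term TEXT NOT NULL,
    related_lang TEXT NOT NULL
);
-- Composite index so a prefix search within one language can be a range scan.
CREATE INDEX idx_lang_term ON etymology (lang, term);
"""


def search_prefix(conn: sqlite3.Connection, prefix: str, lang: str, limit: int = 10) -> list[str]:
    """Return up to `limit` distinct terms in `lang` that start with `prefix` (hypothetical helper)."""
    # The half-open range [prefix, prefix + U+10FFFF) matches strings beginning
    # with `prefix` under SQLite's default BINARY collation. Unlike a plain
    # LIKE 'prefix%' on a BINARY column, it can use the (lang, term) index.
    upper = prefix + "\U0010ffff"
    rows = conn.execute(
        "SELECT DISTINCT term FROM etymology "
        "WHERE lang = ? AND term >= ? AND term < ? "
        "ORDER BY term LIMIT ?",
        (lang, prefix, upper, limit),
    )
    return [term for (term,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Toy rows standing in for the Parquet → SQLite pipeline.
conn.executemany(
    "INSERT INTO etymology VALUES (?, ?, ?, ?, ?)",
    [
        ("water", "English", "inherited_from", "wæter", "Old English"),
        ("waterfall", "English", "derived_from", "water", "English"),
        ("wave", "English", "inherited_from", "wafian", "Old English"),
    ],
)
print(search_prefix(conn, "wat", "English"))  # ['water', 'waterfall']
```

A range predicate on an indexed column is one common way to keep prefix search fast on a table with millions of rows. Whether the real project uses this, `LIKE`, or a full-text index is not stated.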
- `POST /api/cognates` — body: `{ word_a: {term, lang}, word_b: {term, lang} }`
- `GET /api/search` — query: `?q=prefix&lang=English`

Languages use full names: English, Russian, Proto-Indo-European, Old English, Latin, etc.
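Language detection actually happens in the browser (`app.js`). The Python sketch below mirrors the idea for scripted clients and shows the request shapes for the two endpoints. The `detect_language`, `cognates_payload`, and `search_url` helpers are hypothetical, and the heuristic (any character in the basic Cyrillic block U+0400–U+04FF means Russian, otherwise English) is an assumption about what "based on input script" means. No network call is made.

```python
import json
from urllib.parse import urlencode


def detect_language(term: str) -> str:
    """Hypothetical heuristic: any basic-Cyrillic character means Russian, else English."""
    is_cyrillic = any("\u0400" <= ch <= "\u04ff" for ch in term)
    return "Russian" if is_cyrillic else "English"


def cognates_payload(term_a: str, term_b: str) -> str:
    """JSON body for POST /api/cognates, using full language names."""
    body = {
        "word_a": {"term": term_a, "lang": detect_language(term_a)},
        "word_b": {"term": term_b, "lang": detect_language(term_b)},
    }
    # ensure_ascii=False keeps Cyrillic readable instead of \u-escaping it.
    return json.dumps(body, ensure_ascii=False)


def search_url(prefix: str, lang: str, base: str = "http://localhost:8000") -> str:
    """URL for GET /api/search with the `q` and `lang` query parameters."""
    return f"{base}/api/search?{urlencode({'q': prefix, 'lang': lang})}"


print(cognates_payload("water", "вода"))
# {"word_a": {"term": "water", "lang": "English"}, "word_b": {"term": "вода", "lang": "Russian"}}
print(search_url("wat", "English"))
# http://localhost:8000/api/search?q=wat&lang=English
```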
License: MIT