A private reading room for a small group of friends who take YouTube seriously.
You drop a YouTube video in the group chat. Three friends say they'll watch it. One actually does, a week later, alone, and forgets what they wanted to say. The other two never get around to it.
The video had real signal. A framework you could apply. A story worth discussing. But the knowledge dissolved — into separate browser sessions, half-watched tabs, and messages that got buried.
- The insight lived in your head, not somewhere shareable
- There was no way to read the transcript without leaving the video
- Analysis you'd want to reference later didn't exist
- You watched it once and moved on
Sound familiar?
"I'll send you the timestamp." — said before forgetting the timestamp, the video, and what it was about.
Everyone in the group is curious. Nobody has unlimited time. You need a way to extract signal from a video without treating it like a solo research project.
The transcript is already there. The AI tooling already exists. The only missing piece was a workspace that wired it together — for a specific group of people who already trust each other's taste in content.
Transcript Library is a private internal tool for a small group of friends, built around a shared YouTube playlist.
| Layer | What It Does |
|---|---|
| Catalog | Refreshes a local SQLite catalog from the transcript repo for all browse reads |
| Player | Embeds the YouTube video in-app — no tab switching |
| Analysis | Runs AI synthesis headlessly via claude CLI or codex CLI |
| Knowledge | Stores markdown notes alongside video insights for long-term reference |
This is not a SaaS product. It is a proof of concept for a trusted group that already has access to Claude and ChatGPT tooling.
The workspace: player + analysis on one page
```
Library > Channel > Video Title

[ YouTube player — full width, no chrome ]

Analysis
──────────────────────────────────────────
Summary   Key Takeaways   Action Items
Full report ↓ (rendered inline, no disclosure)

Transcript
──────────────────────────────────────────
Part 1 · 2,400 words    Open ↗
Part 2 · 1,800 words    Open ↗
```
The pipeline: how a video becomes an insight
```
Shared YouTube Playlist(s)
        ↓
GitHub Action (every 4h) — yt-dlp + Python pipeline
        ↓
pipeline/youtube-transcripts/ (committed to repo)
        ↓
Coolify auto-deploy (Docker Compose)
        ↓
docker-entrypoint.sh rebuilds catalog if transcripts changed
        ↓
POST /api/analyze?videoId=...
        ↓
claude CLI or codex CLI (headless, local)
        ↓
data/insights/<videoId>/analysis.md
```
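The "rebuilds catalog if transcripts changed" step boils down to a change check. Here is a hypothetical TypeScript sketch of that check; the real `docker-entrypoint.sh` is a shell script, and the function names below are illustrative only.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of the entrypoint's change detection: fingerprint the
// transcript set, compare against the last recorded fingerprint, and rebuild
// only when it moves. The real entrypoint is shell; this is an illustration.
export function fingerprintTranscripts(files: Record<string, string>): string {
  const hash = createHash("sha256");
  for (const name of Object.keys(files).sort()) {
    // Sort so the fingerprint is stable regardless of directory order.
    hash.update(name).update("\0").update(files[name]).update("\0");
  }
  return hash.digest("hex");
}

export function needsRebuild(current: string, lastKnown: string | null): boolean {
  // No recorded fingerprint (first deploy) also triggers a rebuild.
  return lastKnown === null || current !== lastKnown;
}
```

The payoff is that a deploy with no transcript changes skips the rebuild entirely, so most deploys stay fast.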
| Feature | How It Works | Why It Matters |
|---|---|---|
| Embedded player | YouTube iframe, no redirect | Watch and read without splitting attention |
| Headless analysis | claude-cli or codex-cli via provider abstraction | Run from any machine, swap providers without touching UI |
| Insight artifacts | Canonical analysis.md + run metadata per video | Stable lookup by videoId, human-readable alongside machine paths |
| Live status | SSE stream during analysis run | Know when it's done without refreshing |
| Knowledge base | Markdown folders alongside video insights | Essays and notes in the same editorial workspace |
| Breadcrumb navigation | Library → Channel → Video | Always know where you are, always one click back |
- Node.js 18+ / Bun
- Transcripts are embedded in `pipeline/` — no external repo needed
- `claude` CLI or `codex` CLI (for running analysis)
```shell
git clone https://github.com/AojdevStudio/transcript-library
cd transcript-library
bun install
cp .env.example .env.local
```

```shell
# Optional — local dev override only (transcripts are embedded in pipeline/ by default)
# PLAYLIST_TRANSCRIPTS_REPO=/absolute/path/to/playlist-transcripts

# Optional
ANALYSIS_PROVIDER=claude-cli
INSIGHTS_BASE_DIR=/srv/transcript-library/insights    # hosted deploys
CATALOG_DB_PATH=/srv/transcript-library/catalog/catalog.db
```
```shell
# Hosted deployment (set these when deploying, not for local dev)
HOSTED=true                            # enables preflight validation + hosted guard
CLOUDFLARE_ACCESS_AUD=<cf-access-aud>  # required — trusts browser identity from Cloudflare Access
PRIVATE_API_TOKEN=<strong-random>      # machine token for supported automation entrypoints
SYNC_TOKEN=<webhook-secret>            # recommended — authenticates /api/sync-hook callers
```

Local dev needs zero hosted config. Leave `HOSTED` unset and all API routes work without authentication. The server logs warnings for missing vars but never blocks startup.

Hosted access model: `library.aojdevstudio.me` is the friend-facing Cloudflare Access hostname. Approved friends use browser access there with Cloudflare-managed identity. Do not ship `PRIVATE_API_TOKEN` to the browser or assume bearer-only access is supported on that hostname. Machine access stays on explicit automation paths such as `/api/sync-hook`, same-host cron/systemd jobs, or a dedicated automation/deploy hostname.
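The split between browser identity and machine tokens can be sketched as a route guard. The path list and exact header handling below are assumptions for illustration, not the app's actual middleware; `Cf-Access-Jwt-Assertion` is the header Cloudflare Access injects after a successful browser login.

```typescript
// Hypothetical route guard: browser traffic must carry a Cloudflare Access
// identity, while a short allowlist of automation paths may authenticate
// with the machine token instead. Names here are illustrative assumptions.
const AUTOMATION_PATHS = new Set(["/api/sync-hook"]);

type Decision = "cloudflare-access" | "machine-token" | "reject";

export function chooseAuth(path: string, headers: Record<string, string>): Decision {
  // Bearer tokens are honored only on explicit automation paths,
  // never as a general substitute for browser identity.
  if (AUTOMATION_PATHS.has(path) && headers["authorization"]?.startsWith("Bearer ")) {
    return "machine-token";
  }
  // Cloudflare Access injects this header once the user passes its login.
  if (headers["cf-access-jwt-assertion"]) {
    return "cloudflare-access";
  }
  return "reject";
}
```

A real guard would also verify the Access JWT's signature and `aud` claim against `CLOUDFLARE_ACCESS_AUD`; this sketch only shows the routing decision.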
```shell
just start
# → http://localhost:3939
```

Each analysis lives under a stable videoId path. Local development defaults to `data/insights`, while the canonical hosted path is `/srv/transcript-library/insights` via `INSIGHTS_BASE_DIR`.
```
data/insights/<videoId>/
  analysis.json          ← authoritative structured artifact
  analysis.md            ← human-readable report derived from JSON
  <slugified-title>.md   ← human-readable copy
  video-metadata.json    ← channel, topic, published date
  run.json               ← provider, model, timing
  worker-stdout.txt      ← live log during run
  worker-stderr.txt      ← errors
  status.json            ← idle | running | complete | failed

data/insights/.migration-status.json
  remainingLegacyCount   ← machine-checkable migration window status
```
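A consumer of this layout might map the `status.json` lifecycle to a display state roughly like this. Only the four lifecycle values come from the artifact contract above; the helper and its messages are made up for the sketch.

```typescript
// Illustrative mapping from the status.json lifecycle to a reader-facing
// state. The lifecycle values are from the artifact contract; the messages
// and the helper itself are assumptions, not the app's actual code.
type Lifecycle = "idle" | "running" | "complete" | "failed";

export function describeRun(lifecycle: Lifecycle, hasAnalysisMd: boolean): string {
  switch (lifecycle) {
    case "running":
      return "analysis in progress, follow worker-stdout.txt";
    case "failed":
      return "last run failed, check worker-stderr.txt";
    case "complete":
      // A complete run without its report hints at a reconciliation mismatch.
      return hasAnalysisMd ? "insight ready" : "complete but analysis.md missing, rerun";
    case "idle":
      return "no analysis yet";
  }
}
```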
Legacy markdown-only artifacts are supported only during the one-time migration window. Operators can check migration completion with `node scripts/migrate-legacy-insights-to-json.ts --check` and complete the upgrade by rerunning the script without `--check`.
Browse reads are SQLite-only after Phase 2. The app keeps the live catalog at `data/catalog/catalog.db` by default and writes the latest import report to `data/catalog/last-import-validation.json` unless `CATALOG_DB_PATH` points somewhere else.

```shell
npx tsx scripts/rebuild-catalog.ts
npx tsx scripts/rebuild-catalog.ts --check
```

- `npx tsx scripts/rebuild-catalog.ts` rebuilds a temp SQLite snapshot, validates it, and atomically swaps it into place only when the import passes.
- `npx tsx scripts/rebuild-catalog.ts --check` runs the same validation gate without replacing the live DB, while still updating `last-import-validation.json` for operator review.
- A failed validation leaves the last known-good `catalog.db` in place. The app no longer falls back to `videos.csv` at runtime.
- `POST /api/sync-hook` is retired — it returns 410. Catalog rebuild on deploy is handled by `docker-entrypoint.sh`, which detects transcript changes and triggers a rebuild automatically.
- `scripts/daily-operational-sweep.ts` uses the same refresh authority before reading browse metadata, so unattended automation and the app share one catalog authority.
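The atomic swap is the key safety property here: write the new snapshot beside the live DB, validate it, and only then `rename()` it into place. A rename within one filesystem is atomic, which is what keeps readers on the last known-good `catalog.db` when validation fails. A minimal sketch, with the temp-file convention and function names assumed rather than taken from the script:

```typescript
import { renameSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Sketch of an atomic catalog swap: write the snapshot next to the live DB,
// validate it, and rename() it into place only when validation passes.
// A failed validation returns early and never touches catalog.db.
export function swapCatalog(
  dir: string,
  snapshot: Buffer,
  validate: (db: Buffer) => boolean,
): boolean {
  const tmpPath = join(dir, "catalog.db.tmp");
  writeFileSync(tmpPath, snapshot);
  if (!validate(snapshot)) {
    return false; // last known-good catalog.db stays in place
  }
  renameSync(tmpPath, join(dir, "catalog.db")); // atomic on the same filesystem
  return true;
}
```

Because the swap is all-or-nothing, a reader never observes a half-written catalog, only the old DB or the new one.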
Analysis runs through a thin provider boundary. Swap `ANALYSIS_PROVIDER` to switch between `claude-cli` and `codex-cli` — no UI changes, no redeployment.

```shell
# In .env.local
ANALYSIS_PROVIDER=claude-cli   # default
ANALYSIS_PROVIDER=codex-cli    # alternative
```

Phase 3 keeps the operator story simple and durable:

- `run.json` is the latest durable run record for a videoId, including provider, model, lifecycle, and timing.
- `status.json` is the compatibility artifact that mirrors the current lifecycle for quick reads and older surfaces.
- `worker-stdout.txt` and `worker-stderr.txt` remain the raw evidence trail when a run needs deeper inspection.
- `reconciliation.json` records whether the latest durable run and the expected artifacts still agree, including mismatch reasons and rerun-ready guidance.
- `GET /api/insight` is the status-first snapshot used by the video workspace. It returns lifecycle, stage, retry guidance, reconciliation details, recent log lines, and the current artifact bundle without making operators read raw files first.
- `GET /api/insight/stream` reuses a shared per-video snapshot cache, so concurrent viewers consume the same live status payload instead of polling disk independently. The workspace prioritizes stage, retry guidance, and `recentLogs`; full raw logs stay secondary.
When `reconciliation.json` reports a mismatch, the app treats the latest run as retry-needed instead of quietly presenting it as normal success. The intended operator recovery path is a clean rerun, not manual file repair.
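The provider boundary might look something like the interface below. The type names and command shapes are guesses for illustration: `claude -p` and `codex exec` are the usual headless entrypoints for those CLIs, but the repo's actual invocation is not shown here.

```typescript
// Hypothetical shape of the provider boundary: both CLIs sit behind one
// interface, so ANALYSIS_PROVIDER only selects an implementation and the
// UI never changes. Names and command shapes are illustrative assumptions.
interface AnalysisProvider {
  name: "claude-cli" | "codex-cli";
  buildCommand(transcriptPath: string): string[];
}

const providers: Record<string, AnalysisProvider> = {
  "claude-cli": {
    name: "claude-cli",
    buildCommand: (t) => ["claude", "-p", `Analyze the transcript at ${t}`],
  },
  "codex-cli": {
    name: "codex-cli",
    buildCommand: (t) => ["codex", "exec", `Analyze the transcript at ${t}`],
  },
};

export function selectProvider(env: Record<string, string | undefined>): AnalysisProvider {
  // Unknown or unset values fall back to the documented default.
  return providers[env.ANALYSIS_PROVIDER ?? ""] ?? providers["claude-cli"];
}
```

Because callers only see `AnalysisProvider`, adding a third CLI later means adding one entry to the map, not touching the UI or the run pipeline.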
```
POST /api/analyze?videoId=...          Start headless analysis
GET  /api/analyze/status?videoId=...   Poll run status
GET  /api/insight?videoId=...          Fetch completed insight
GET  /api/insight/stream?videoId=...   SSE stream during run
GET  /api/raw?path=...                 Serve raw transcript chunks
```
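A browser client might consume the stream endpoint like this. The payload shape is inferred from the fields the stream is said to prioritize (stage, retry guidance, `recentLogs`) and is an assumption, not the app's actual contract.

```typescript
// Hypothetical SSE consumer for /api/insight/stream. The payload fields are
// inferred from the README's description, not taken from the app's code.
interface InsightEvent {
  stage?: string;
  retryGuidance?: string;
  recentLogs?: string[];
}

export function parseInsightEvent(data: string): InsightEvent {
  return JSON.parse(data) as InsightEvent;
}

// Browser-side: EventSource reconnects automatically on drops, and the
// caller gets back a function to close the stream when leaving the page.
export function watchInsight(videoId: string, onUpdate: (e: InsightEvent) => void): () => void {
  const source = new EventSource(`/api/insight/stream?videoId=${encodeURIComponent(videoId)}`);
  source.onmessage = (ev) => onUpdate(parseInsightEvent(ev.data));
  return () => source.close();
}
```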
```shell
just start             # Dev server
just prod-start        # Production
just build             # Next.js build
just lint              # ESLint
just typecheck         # tsc --noEmit
just daily-sweep       # Unattended daily sweep: refresh-only ingest + safe repair, no analysis launch
just backfill-insights # Explicit analysis workflow for existing videos
npx tsx scripts/rebuild-catalog.ts --check        # Validate catalog parity without cutover
npx tsx scripts/benchmark-hosted-scale.ts --check # Scale validation (1000-video benchmark)
```

Schedule this command for unattended operation:
```shell
just daily-sweep
# or: node --import tsx scripts/daily-operational-sweep.ts
```

The daily sweep is the unattended default. It refreshes source state, republishes browse state, runs only the conservative historical repair pass, and writes a durable operator record to `data/runtime/daily-operational-sweep/latest.json` by default (or the sibling `runtime/` directory next to `INSIGHTS_BASE_DIR` on hosted installs). Each run also writes an immutable archive record under `data/runtime/daily-operational-sweep/archive/<sweepId>.json`.
When the sweep reports `manualFollowUpVideoIds`, those are rerun-only videos: the sweep left them visible for manual follow-up instead of fabricating `run.json` or starting analysis work. Analysis remains on-demand or explicit.
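An unattended job can check the sweep's durable record before paging anyone. The `manualFollowUpVideoIds` field name comes from the record described above; the rest of the record's shape is assumed for this sketch.

```typescript
// Minimal reader for the sweep's durable record. Only manualFollowUpVideoIds
// is documented above; sweepId and the overall shape are assumptions.
interface SweepRecord {
  sweepId?: string;
  manualFollowUpVideoIds?: string[];
}

export function followUps(record: SweepRecord): string[] {
  return record.manualFollowUpVideoIds ?? [];
}

export function needsOperatorAttention(record: SweepRecord): boolean {
  // Rerun-only videos stay visible; the sweep never starts analysis itself,
  // so any follow-up work is a human decision.
  return followUps(record).length > 0;
}
```

A cron wrapper could parse `latest.json`, call `needsOperatorAttention`, and only notify the group when the list is non-empty.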
This started as a frustration. Our group watches a lot of YouTube — not casually, but deliberately. We share links and say "this one is worth your time." But saying it and actually watching it together are different things.
Transcript data for 243 videos across 91 channels was already being pulled — that pipeline is now merged into this repo under pipeline/, with a GitHub Action syncing every 4 hours and committing the results. The AI tooling already existed. What didn't exist was a workspace that made the signal accessible without a separate workflow for every person in the group.
So this became a reading room. You pick a video, the player loads inline, the analysis runs in the background, and the transcript is there if you want the exact words. The knowledge base holds notes alongside the video insights. Everything is organized by the same videoId key, so nothing ever gets lost.
It's private, it's opinionated, and it's built for exactly one use case: a small group of friends who take ideas seriously.
Built for the group. Kept private. Worth sharing the idea.
