# Transcript Library

*Watch the source. Read the analysis. Keep the signal.*

License: MIT · Next.js · PRs Welcome

A private reading room for a small group of friends who take YouTube seriously.

Library · Knowledge Base · Analysis Runtime


## The Problem With Shared Playlists

You drop a YouTube video in the group chat. Three friends say they'll watch it. One actually does, a week later, alone, and forgets what they wanted to say. The other two never get around to it.

The video had real signal. A framework you could apply. A story worth discussing. But the knowledge dissolved — into separate browser sessions, half-watched tabs, and messages that got buried.

- The insight lived in your head, not somewhere shareable
- There was no way to read the transcript without leaving the video
- Analysis you'd want to reference later didn't exist
- You watched it once and moved on

Sound familiar?

> "I'll send you the timestamp." — said before forgetting the timestamp, the video, and what it was about.


## The Insight

Everyone in the group is curious. Nobody has unlimited time. You need a way to extract signal from a video without treating it like a solo research project.

Watch the video inside the app.

Let the analysis run in the background.

The transcript is already there. The AI tooling already exists. The only missing piece was a workspace that wired it together — for a specific group of people who already trust each other's taste in content.

A reading room for your shared playlist.


## What This Is

Transcript Library is a private internal tool for a small group of friends, built around a shared YouTube playlist.

| Layer | What It Does |
| --- | --- |
| Catalog | Refreshes a local SQLite catalog from the transcript repo for all browse reads |
| Player | Embeds the YouTube video in-app — no tab switching |
| Analysis | Runs AI synthesis headlessly via the `claude` CLI or `codex` CLI |
| Knowledge | Stores markdown notes alongside video insights for long-term reference |

This is not a SaaS product. It is a proof of concept for a trusted group that already has access to Claude and ChatGPT tooling.


## See It In Action

### The workspace: player + analysis on one page

```text
Library > Channel > Video Title

[  YouTube player — full width, no chrome  ]

Analysis
──────────────────────────────────────────
Summary    Key Takeaways    Action Items

Full report ↓ (rendered inline, no disclosure)

Transcript
──────────────────────────────────────────
Part 1  ·  2,400 words         Open ↗
Part 2  ·  1,800 words         Open ↗
```
### The pipeline: how a video becomes an insight

```text
Shared YouTube Playlist(s)
        ↓
GitHub Action (every 4h) — yt-dlp + Python pipeline
        ↓
pipeline/youtube-transcripts/ (committed to repo)
        ↓
Coolify auto-deploy (Docker Compose)
        ↓
docker-entrypoint.sh rebuilds catalog if transcripts changed
        ↓
POST /api/analyze?videoId=...
        ↓
claude CLI or codex CLI (headless, local)
        ↓
data/insights/<videoId>/analysis.md
```

## What You Get

| Feature | How It Works | Why It Matters |
| --- | --- | --- |
| Embedded player | YouTube iframe, no redirect | Watch and read without splitting attention |
| Headless analysis | `claude-cli` or `codex-cli` via provider abstraction | Run from any machine, swap providers without touching the UI |
| Insight artifacts | Canonical `analysis.md` + run metadata per video | Stable lookup by `videoId`, human-readable alongside machine paths |
| Live status | SSE stream during analysis run | Know when it's done without refreshing |
| Knowledge base | Markdown folders alongside video insights | Essays and notes in the same editorial workspace |
| Breadcrumb navigation | Library → Channel → Video | Always know where you are, always one click back |

## Quick Start

### Prerequisites

- Node.js 18+ / Bun
- Transcripts are embedded in `pipeline/` — no external repo needed
- `claude` CLI or `codex` CLI (for running analysis)

### Install

```shell
git clone https://github.com/AojdevStudio/transcript-library
cd transcript-library
bun install
cp .env.example .env.local
```

### Configure

```shell
# Optional — local dev override only (transcripts are embedded in pipeline/ by default)
# PLAYLIST_TRANSCRIPTS_REPO=/absolute/path/to/playlist-transcripts

# Optional
ANALYSIS_PROVIDER=claude-cli
INSIGHTS_BASE_DIR=/srv/transcript-library/insights   # hosted deploys
CATALOG_DB_PATH=/srv/transcript-library/catalog/catalog.db

# Hosted deployment (set these when deploying, not for local dev)
HOSTED=true                           # enables preflight validation + hosted guard
CLOUDFLARE_ACCESS_AUD=<cf-access-aud> # required — trusts browser identity from Cloudflare Access
PRIVATE_API_TOKEN=<strong-random>     # machine token for supported automation entrypoints
SYNC_TOKEN=<webhook-secret>           # recommended — authenticates /api/sync-hook callers
```

Local dev needs zero hosted config. Leave `HOSTED` unset and all API routes work without authentication. The server logs warnings for missing vars but never blocks startup.

**Hosted access model:** `library.aojdevstudio.me` is the friend-facing Cloudflare Access hostname. Approved friends use browser access there with Cloudflare-managed identity. Do not ship `PRIVATE_API_TOKEN` to the browser or assume bearer-only access is supported on that hostname. Machine access stays on explicit automation paths such as `/api/sync-hook`, same-host cron/systemd jobs, or a dedicated automation/deploy hostname.

### Run

```shell
just start
# → http://localhost:3939
```

## How It Works

*Transcript Library architecture diagram*

### Artifact Layout

Each analysis lives under a stable `videoId` path. Local development defaults to `data/insights`, while the canonical hosted path is `/srv/transcript-library/insights` via `INSIGHTS_BASE_DIR`.

```text
data/insights/<videoId>/
  analysis.json            ← authoritative structured artifact
  analysis.md              ← human-readable report derived from JSON
  <slugified-title>.md     ← human-readable copy
  video-metadata.json      ← channel, topic, published date
  run.json                 ← provider, model, timing
  worker-stdout.txt        ← live log during run
  worker-stderr.txt        ← errors
  status.json              ← idle | running | complete | failed

data/insights/.migration-status.json
  remainingLegacyCount     ← machine-checkable migration window status
```

Legacy markdown-only artifacts are supported only during the one-time migration window. Operators can check migration completion with `node scripts/migrate-legacy-insights-to-json.ts --check` and complete the upgrade by rerunning the script without `--check`.
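The layout above maps cleanly to a small path helper. A minimal sketch with assumed helper names (the repo's real module may differ), using the `INSIGHTS_BASE_DIR` fallback described earlier:

```typescript
import { join } from "node:path";

// Hypothetical helper: resolve the artifact bundle for one videoId under the
// insights base directory (INSIGHTS_BASE_DIR, defaulting to data/insights).
function insightPaths(
  videoId: string,
  baseDir: string = process.env.INSIGHTS_BASE_DIR ?? "data/insights",
) {
  const root = join(baseDir, videoId);
  return {
    root,
    analysisJson: join(root, "analysis.json"),   // authoritative structured artifact
    analysisMd: join(root, "analysis.md"),       // human-readable report
    metadata: join(root, "video-metadata.json"), // channel, topic, published date
    run: join(root, "run.json"),                 // provider, model, timing
    status: join(root, "status.json"),           // idle | running | complete | failed
  };
}
```

Because everything is keyed by `videoId`, any surface (UI, API, or script) can reconstruct the same paths without a database lookup.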

### Catalog Refresh Contract

Browse reads are SQLite-only after Phase 2. The app keeps the live catalog at `data/catalog/catalog.db` by default and writes the latest import report to `data/catalog/last-import-validation.json`, unless `CATALOG_DB_PATH` points somewhere else.

```shell
npx tsx scripts/rebuild-catalog.ts
npx tsx scripts/rebuild-catalog.ts --check
```

- `npx tsx scripts/rebuild-catalog.ts` rebuilds a temp SQLite snapshot, validates it, and atomically swaps it into place only when the import passes.
- `npx tsx scripts/rebuild-catalog.ts --check` runs the same validation gate without replacing the live DB, while still updating `last-import-validation.json` for operator review.
- A failed validation leaves the last known-good `catalog.db` in place. The app no longer falls back to `videos.csv` at runtime.
- `POST /api/sync-hook` is retired and returns 410. Catalog rebuild on deploy is handled by `docker-entrypoint.sh`, which detects transcript changes and triggers a rebuild automatically. `scripts/daily-operational-sweep.ts` uses the same refresh authority before reading browse metadata, so unattended automation and the app share one catalog authority.

### Provider Abstraction

Analysis runs through a thin provider boundary. Swap `ANALYSIS_PROVIDER` to switch between `claude-cli` and `codex-cli` — no UI changes, no redeployment.

```shell
# In .env.local
ANALYSIS_PROVIDER=claude-cli    # default
ANALYSIS_PROVIDER=codex-cli     # alternative
```
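The boundary can be pictured as a small interface keyed off `ANALYSIS_PROVIDER` (a hypothetical sketch; the actual provider module and CLI invocations in the repo may differ):

```typescript
// Hypothetical provider boundary: each provider knows how to run one headless
// analysis; the rest of the app only ever sees this interface.
interface AnalysisProvider {
  name: string;
  // The real implementation would spawn the CLI headlessly; these argument
  // lists are illustrative, not the repo's actual invocations.
  buildCommand(videoId: string): string[];
}

const providers: Record<string, AnalysisProvider> = {
  "claude-cli": { name: "claude-cli", buildCommand: (id) => ["claude", "-p", `Analyze video ${id}`] },
  "codex-cli": { name: "codex-cli", buildCommand: (id) => ["codex", "exec", `Analyze video ${id}`] },
};

function selectProvider(envValue = process.env.ANALYSIS_PROVIDER ?? "claude-cli"): AnalysisProvider {
  const provider = providers[envValue];
  if (!provider) throw new Error(`Unknown ANALYSIS_PROVIDER: ${envValue}`);
  return provider;
}
```

Because selection happens behind one function, adding a third provider is a registry entry, not a UI change.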

### Runtime Observability Contract

Phase 3 keeps the operator story simple and durable:

- `run.json` is the latest durable run record for a `videoId`, including provider, model, lifecycle, and timing.
- `status.json` is the compatibility artifact that mirrors the current lifecycle for quick reads and older surfaces.
- `worker-stdout.txt` and `worker-stderr.txt` remain the raw evidence trail when a run needs deeper inspection.
- `reconciliation.json` records whether the latest durable run and the expected artifacts still agree, including mismatch reasons and rerun-ready guidance.
- `GET /api/insight` is the status-first snapshot used by the video workspace. It returns lifecycle, stage, retry guidance, reconciliation details, recent log lines, and the current artifact bundle without making operators read raw files first.
- `GET /api/insight/stream` reuses a shared per-video snapshot cache so concurrent viewers consume the same live status payload instead of polling disk independently. The workspace prioritizes stage, retry guidance, and `recentLogs`; full raw logs stay secondary.

When `reconciliation.json` reports a mismatch, the app treats the latest run as retry-needed instead of quietly presenting it as a normal success. The intended operator recovery path is a clean rerun, not manual file repair.
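The mismatch rule amounts to a small decision function. A minimal sketch, assuming hypothetical field names such as `artifactsMatch` (the real schema in `reconciliation.json` may differ):

```typescript
// Hypothetical shapes for the durable run record and reconciliation artifact.
type Lifecycle = "idle" | "running" | "complete" | "failed";

interface RunRecord { lifecycle: Lifecycle; provider: string; }
interface Reconciliation { artifactsMatch: boolean; mismatchReasons: string[]; }

// A completed run only counts as success when reconciliation still agrees
// with the expected artifacts; otherwise the operator path is a clean rerun.
function effectiveStatus(run: RunRecord, recon: Reconciliation): Lifecycle | "retry-needed" {
  if (run.lifecycle === "complete" && !recon.artifactsMatch) return "retry-needed";
  return run.lifecycle;
}
```

This keeps the UI honest: a run that looks finished on disk but disagrees with its artifacts is surfaced as retry-needed rather than silently shown as done.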

### Core API Routes

```text
POST /api/analyze?videoId=...         Start headless analysis
GET  /api/analyze/status?videoId=...  Poll run status
GET  /api/insight?videoId=...         Fetch completed insight
GET  /api/insight/stream?videoId=...  SSE stream during run
GET  /api/raw?path=...                Serve raw transcript chunks
```
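A client sketch of driving these routes (hypothetical usage; assumes the local dev server on port 3939 from the Run section, with `videoId` carried as a query parameter):

```typescript
// Build the route URLs the workspace uses; URLSearchParams handles encoding.
const BASE = "http://localhost:3939";

function analyzeUrl(videoId: string): string {
  return `${BASE}/api/analyze?${new URLSearchParams({ videoId })}`;
}

function insightUrl(videoId: string): string {
  return `${BASE}/api/insight?${new URLSearchParams({ videoId })}`;
}

// Hypothetical flow: kick off a headless run, then fetch the status-first
// snapshot. In the real app the stream endpoint supplies live updates.
async function runAnalysis(videoId: string): Promise<unknown> {
  await fetch(analyzeUrl(videoId), { method: "POST" });
  const res = await fetch(insightUrl(videoId));
  return res.json();
}
```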

## Commands

```shell
just start              # Dev server
just prod-start         # Production
just build              # Next.js build
just lint               # ESLint
just typecheck          # tsc --noEmit
just daily-sweep        # Unattended daily sweep: refresh-only ingest + safe repair, no analysis launch
just backfill-insights  # Explicit analysis workflow for existing videos
npx tsx scripts/rebuild-catalog.ts --check  # Validate catalog parity without cutover
npx tsx scripts/benchmark-hosted-scale.ts --check  # Scale validation (1000-video benchmark)
```

### Unattended daily sweep

Schedule this command for unattended operation:

```shell
just daily-sweep
# or: node --import tsx scripts/daily-operational-sweep.ts
```

The daily sweep is the unattended default. It refreshes source state, republishes browse state, runs only the conservative historical repair pass, and writes a durable operator record to `data/runtime/daily-operational-sweep/latest.json` by default (or the sibling `runtime/` directory next to `INSIGHTS_BASE_DIR` on hosted installs). Each run also writes an immutable archive record under `data/runtime/daily-operational-sweep/archive/<sweepId>.json`.

When the sweep reports `manualFollowUpVideoIds`, those are rerun-only videos: the sweep left them visible for manual follow-up instead of fabricating `run.json` or starting analysis work. Analysis remains on-demand or explicit.


## The Story

This started as a frustration. Our group watches a lot of YouTube — not casually, but deliberately. We share links and say "this one is worth your time." But saying it and actually watching it together are different things.

Transcript data for 243 videos across 91 channels was already being pulled — that pipeline is now merged into this repo under pipeline/, with a GitHub Action syncing every 4 hours and committing the results. The AI tooling already existed. What didn't exist was a workspace that made the signal accessible without a separate workflow for every person in the group.

So this became a reading room. You pick a video, the player loads inline, the analysis runs in the background, and the transcript is there if you want the exact words. The knowledge base holds notes alongside the video insights. Everything is organized by the same videoId key, so nothing ever gets lost.

It's private, it's opinionated, and it's built for exactly one use case: a small group of friends who take ideas seriously.

The video is the source. The analysis is the shortcut. The discussion is the point.


## Docs


Built for the group. Kept private. Worth sharing the idea.

## About

Browse-first knowledge library for YouTube playlist transcripts and curated insights. Built with Next.js 16, React 19, and Tailwind CSS 4.
