Skip to content

valtrof/curriculum-engine

Repository files navigation

Curriculum Engine

CI Python

Hybrid RAG pipeline that generates structured learning plans with verified resource links. Claude produces the curriculum structure and search queries — never URLs. Live APIs resolve every query to a real, validated link.

Built with Python, FastAPI, and the Anthropic Claude API with prompt caching.

The problem it solves

LLMs hallucinate URLs. A naive approach asks Claude to recommend resources and returns whatever links it generates — most of which are broken or fabricated. Curriculum Engine separates concerns: Claude handles curriculum design (what to learn, in what order, for how long); real APIs handle resource discovery (what actually exists and works).

Architecture

POST /plan
      │
      ▼
┌─────────────────────────────────────────┐
│  planner.py                             │
│  Claude Haiku (prompt-cached)           │
│  → structured curriculum               │
│  → search queries per resource          │
│     (no URLs — hallucination prevented) │
└──────────────┬──────────────────────────┘
               │ search queries
               ▼
┌─────────────────────────────────────────┐
│  retrieval.py  (asyncio.gather)         │
├────────────────────┬────────────────────┤
│  YouTube Data API  │  Serper.dev API    │
│  (video resources) │  (articles & web)  │
└────────────────────┴──────┬─────────────┘
                            │ real URLs
                            ▼
                     URL validation
                     HEAD check → swap 404s
                     → fallback to next result
                            │
                            ▼
                     LearningPlan response

Key technical decisions

No hallucinated URLs Claude is instructed to output only search queries, never URLs. Each query is resolved against a live API (YouTube Data API v3 or Serper.dev) to retrieve a real, current link. This makes the output reliably usable rather than plausible-looking but broken.

Prompt caching on the system prompt The fixed system prompt is cached using Anthropic's cache_control: ephemeral. On repeated calls (warm cache), input token costs drop by up to 90%. In a multi-user scenario this matters significantly — the system prompt is large and reused on every request.

Parallel async resource retrieval All resource lookups for a plan run concurrently via asyncio.gather. A 5-phase plan with 4 resources each (20 API calls) resolves in parallel rather than sequentially — response time scales with the slowest single call, not the total number of calls.

URL validation with automatic fallback Every retrieved URL is HEAD-checked before being included in the response. A 404 or timeout triggers a fallback to the next search result. The service degrades gracefully when optional API keys are absent — resources are returned without enriched URLs rather than failing.

Quick start

pip install -r requirements.txt
cp .env.example .env   # fill in your API keys
uvicorn api:app --reload

API available at http://localhost:8000.

Docker

docker build -t curriculum-engine .
docker run -p 8000:8000 --env-file .env curriculum-engine

Environment variables

Variable Required Purpose
ANTHROPIC_API_KEY Yes Claude API access
YOUTUBE_API_KEY No Video resource lookup
SERPER_API_KEY No Article/web resource lookup

API

POST /plan

curl -X POST http://localhost:8000/plan \
  -H "Content-Type: application/json" \
  -d '{"subject": "Linear algebra", "hours": 20}'

Response:

{
  "subject": "Linear algebra",
  "total_hours": 20,
  "overview": "A structured path from vector fundamentals to eigendecomposition...",
  "phases": [
    {
      "phase": 1,
      "title": "Foundations",
      "hours": 4,
      "description": "Vectors, dot products, matrix operations",
      "milestone": "Able to multiply matrices and interpret geometric meaning",
      "resources": [
        {
          "title": "Essence of Linear Algebra",
          "resource_type": "video",
          "estimated_minutes": 60,
          "url": "https://www.youtube.com/watch?v=fNk_zzaMoSs",
          "retrieved_title": "Essence of linear algebra - 3Blue1Brown",
          "channel": "3Blue1Brown"
        }
      ]
    }
  ]
}

GET /health

{ "status": "ok" }

Run tests

pytest
pytest tests/test_planner.py -v
pytest tests/test_retrieval.py -v

18 tests across planner and retrieval. Uses unittest.mock and pytest-asyncio. No live API calls made during the test suite.

Project structure

api.py                  # FastAPI app, endpoints, lifespan
curriculum_engine/
├── models.py           # Pydantic models: LearningPlan, Phase, Resource
├── planner.py          # Claude-based curriculum generation (prompt-cached)
└── retrieval.py        # Async resource enrichment via YouTube & Serper
tests/
├── test_planner.py     # Curriculum generation tests
└── test_retrieval.py   # Resource enrichment and URL validation tests

About

Hybrid RAG pipeline: Claude Haiku generates a structured learning plan with search queries; live YouTube Data API + Serper.dev retrieve and validate every resource link. Prompt caching cuts input token costs by up to 90%.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors