Curriculum Engine

Hybrid RAG pipeline that generates structured learning plans with verified resource links. Claude produces the curriculum structure and search queries — never URLs. Live APIs resolve every query to a real, validated link.

Built with Python, FastAPI, and the Anthropic Claude API with prompt caching.

The problem it solves

LLMs hallucinate URLs. A naive approach asks Claude to recommend resources and returns whatever links it generates — most of which are broken or fabricated. Curriculum Engine separates concerns: Claude handles curriculum design (what to learn, in what order, for how long); real APIs handle resource discovery (what actually exists and works).

Architecture

POST /plan
      │
      ▼
┌─────────────────────────────────────────┐
│  planner.py                             │
│  Claude Haiku (prompt-cached)           │
│  → structured curriculum               │
│  → search queries per resource          │
│     (no URLs — hallucination prevented) │
└──────────────┬──────────────────────────┘
               │ search queries
               ▼
┌─────────────────────────────────────────┐
│  retrieval.py  (asyncio.gather)         │
├────────────────────┬────────────────────┤
│  YouTube Data API  │  Serper.dev API    │
│  (video resources) │  (articles & web)  │
└────────────────────┴──────┬─────────────┘
                            │ real URLs
                            ▼
                     URL validation
                     HEAD check → swap 404s
                     → fallback to next result
                            │
                            ▼
                     LearningPlan response

Key technical decisions

No hallucinated URLs Claude is instructed to output only search queries, never URLs. Each query is resolved against a live API (YouTube Data API v3 or Serper.dev) to retrieve a real, current link. This makes the output reliably usable rather than plausible-looking but broken.

Prompt caching on the system prompt The fixed system prompt is cached using Anthropic's cache_control: ephemeral. On repeated calls (warm cache), input token costs drop by up to 90%. In a multi-user scenario this matters significantly — the system prompt is large and reused on every request.

Parallel async resource retrieval All resource lookups for a plan run concurrently via asyncio.gather. A 5-phase plan with 4 resources each (20 API calls) resolves in parallel rather than sequentially — response time scales with the slowest single call, not the total number of calls.

URL validation with automatic fallback Every retrieved URL is HEAD-checked before being included in the response. A 404 or timeout triggers a fallback to the next search result. The service degrades gracefully when optional API keys are absent — resources are returned without enriched URLs rather than failing.

Quick start

pip install -r requirements.txt
cp .env.example .env   # fill in your API keys
uvicorn api:app --reload

API available at http://localhost:8000.

Docker

docker build -t curriculum-engine .
docker run -p 8000:8000 --env-file .env curriculum-engine

Environment variables

Variable	Required	Purpose
`ANTHROPIC_API_KEY`	Yes	Claude API access
`YOUTUBE_API_KEY`	No	Video resource lookup
`SERPER_API_KEY`	No	Article/web resource lookup

API

`POST /plan`

curl -X POST http://localhost:8000/plan \
  -H "Content-Type: application/json" \
  -d '{"subject": "Linear algebra", "hours": 20}'

Response:

{
  "subject": "Linear algebra",
  "total_hours": 20,
  "overview": "A structured path from vector fundamentals to eigendecomposition...",
  "phases": [
    {
      "phase": 1,
      "title": "Foundations",
      "hours": 4,
      "description": "Vectors, dot products, matrix operations",
      "milestone": "Able to multiply matrices and interpret geometric meaning",
      "resources": [
        {
          "title": "Essence of Linear Algebra",
          "resource_type": "video",
          "estimated_minutes": 60,
          "url": "https://www.youtube.com/watch?v=fNk_zzaMoSs",
          "retrieved_title": "Essence of linear algebra - 3Blue1Brown",
          "channel": "3Blue1Brown"
        }
      ]
    }
  ]
}

`GET /health`

{ "status": "ok" }

Run tests

pytest
pytest tests/test_planner.py -v
pytest tests/test_retrieval.py -v

18 tests across planner and retrieval. Uses unittest.mock and pytest-asyncio. No live API calls made during the test suite.

Project structure

api.py                  # FastAPI app, endpoints, lifespan
curriculum_engine/
├── models.py           # Pydantic models: LearningPlan, Phase, Resource
├── planner.py          # Claude-based curriculum generation (prompt-cached)
└── retrieval.py        # Async resource enrichment via YouTube & Serper
tests/
├── test_planner.py     # Curriculum generation tests
└── test_retrieval.py   # Resource enrichment and URL validation tests

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.github/workflows		.github/workflows
curriculum_engine		curriculum_engine
tests		tests
.env.example		.env.example
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
api.py		api.py
pytest.ini		pytest.ini
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Curriculum Engine

The problem it solves

Architecture

Key technical decisions

Quick start

Docker

Environment variables

API

`POST /plan`

`GET /health`

Run tests

Project structure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Curriculum Engine

The problem it solves

Architecture

Key technical decisions

Quick start

Docker

Environment variables

API

POST /plan

GET /health

Run tests

Project structure

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`POST /plan`

`GET /health`

Packages