nithin-nk/ai-engineer-roadmap

Hands-On AI Engineer Roadmap

A structured 24-week program for software professionals who already know Python/Git.

Philosophy: 15% theory, 85% hands-on. Each week's material fits into one week at 2-3 hours/day. Every step is explicit enough to follow without prior AI experience.

Phase 1: Talk to LLMs (Weeks 1-6)

Your first AI apps. By Week 6 you'll have a deployed chat app with streaming.

Week 1: FastAPI Crash Course

Topic: Building REST APIs with FastAPI (foundation for all AI apps)
Project: Personal Expense Tracker API
Key Skills: REST endpoints, Pydantic validation, JSON persistence
Theory: 1 hour | Build: 7+ hours

  • Build CRUD endpoints from scratch
  • Learn request validation with Pydantic
  • Implement file-based persistence
  • Use Swagger UI for testing
  • Handle errors and edge cases
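The file-based persistence behind the tracker can be sketched with the standard library alone. The file name and record fields here are illustrative, not the project's required schema:

```python
import json
from pathlib import Path

DB_PATH = Path("expenses.json")  # hypothetical storage file

def load_expenses(path: Path = DB_PATH) -> list[dict]:
    """Read all expenses; an absent file means an empty tracker."""
    if not path.exists():
        return []
    return json.loads(path.read_text())

def save_expenses(expenses: list[dict], path: Path = DB_PATH) -> None:
    """Persist the full list as pretty-printed JSON."""
    path.write_text(json.dumps(expenses, indent=2))

def add_expense(amount: float, category: str, path: Path = DB_PATH) -> dict:
    """Append one record and persist; IDs are just sequential ints."""
    expenses = load_expenses(path)
    record = {"id": len(expenses) + 1, "amount": amount, "category": category}
    expenses.append(record)
    save_expenses(expenses, path)
    return record
```

In the week's project these functions sit behind FastAPI endpoints; the endpoint handlers call them and return the results as JSON.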

Week 2: Your First LLM API Call

Topic: Talking to LLMs through their APIs
Project: Movie Recommendation Chatbot (CLI)
Key Skills: API authentication, conversation history, token tracking
Theory: 1 hour | Build: 7+ hours

  • Make your first OpenAI/Anthropic API call
  • Maintain conversation history
  • Track tokens and estimate costs
  • Save/load conversations
  • Experiment with temperature effects
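Conversation history and cost tracking reduce to bookkeeping over the messages list. The per-token price and the 4-characters-per-token heuristic below are placeholder assumptions; use your provider's tokenizer and price sheet for real numbers:

```python
PRICE_PER_1K_TOKENS = 0.002  # assumed flat rate, not a real billing figure

def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

class Conversation:
    """Keeps the running message list you send on every API call."""
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def token_count(self) -> int:
        return sum(estimate_tokens(m["content"]) for m in self.messages)

    def estimated_cost(self) -> float:
        return self.token_count() / 1000 * PRICE_PER_1K_TOKENS
```

The key insight of the week: the API is stateless, so "memory" is just resending this list (and paying for it) on every turn.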

Week 3: Prompt Engineering That Actually Works

Topic: Writing prompts that produce reliable, consistent outputs
Project: Email Writer (tone & audience adaptation)
Key Skills: Few-shot learning, chain-of-thought, prompt comparison
Theory: 1 hour | Build: 7+ hours

  • Write specific, actionable prompts
  • Use examples to improve quality
  • Chain-of-thought reasoning
  • Compare multiple prompt strategies
  • Build a rewrite tool with tone/audience control
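Few-shot prompting is just prior turns in the messages list: each example pair is inserted as a user/assistant exchange before the real query, so the model imitates the demonstrated pattern. A minimal builder, assuming the standard chat-message dict shape:

```python
def build_few_shot_messages(
    system: str,
    examples: list[tuple[str, str]],
    query: str,
) -> list[dict]:
    """Interleave (input, ideal output) pairs as prior turns, then the real query."""
    messages = [{"role": "system", "content": system}]
    for user_text, ideal_reply in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": ideal_reply})
    messages.append({"role": "user", "content": query})
    return messages
```

For the email writer, the example pairs would be (blunt draft, polished rewrite) in the target tone, and the query is the draft to rewrite.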

Week 4: Structured Outputs

Topic: Getting reliable JSON from LLMs
Project: Job Posting Parser
Key Skills: Pydantic validation, Instructor library, structured generation
Theory: 1 hour | Build: 7+ hours

  • Define Pydantic models for data validation
  • Use OpenAI JSON mode
  • Automatic retries with Instructor
  • Handle missing fields gracefully
  • Batch process documents to CSV

Week 5: Tool Calling

Topic: Function/tool calling (the LLM decides which function to call)
Project: Personal Assistant with Real Tools (stocks, weather, notes)
Key Skills: Tool-calling loop, parallel tool execution, logging
Theory: 1 hour | Build: 7+ hours

  • Define tools as JSON schemas
  • Implement the tool-calling loop
  • Handle multi-tool queries
  • Add error handling and logging
  • Understand how ChatGPT plugins work
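The tool-calling loop itself needs no SDK. Here `fake_model` stands in for the real LLM and the message/tool-call shapes are simplified from the actual API format, but the loop structure (call model, execute the requested tool, feed the result back, repeat until a plain answer) is the one you'll implement:

```python
import json

def get_weather(city: str) -> str:
    """Stub tool; a real version would call a weather API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_model(messages: list[dict]) -> dict:
    """Stand-in for the LLM: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather",
                              "arguments": json.dumps({"city": "Oslo"})}}
    return {"content": "It looks sunny in Oslo today."}

def run_tool_loop(messages: list[dict], model=fake_model, max_turns: int = 5) -> str:
    """Call model, execute any requested tool, append the result, repeat."""
    for _ in range(max_turns):
        reply = model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # model answered directly, loop done
        result = TOOLS[call["name"]](**json.loads(call["arguments"]))
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("tool loop exceeded max_turns")
```

Swapping `fake_model` for a real API call (and the dict shapes for the provider's) turns this into the week's project.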

Week 6: Streaming + Web UI

Topic: Streaming responses + building a chat UI with Streamlit
Project: Full AI Chat App (deployed to Streamlit Cloud)
Key Skills: Streaming APIs, Streamlit components, cloud deployment
Theory: 1 hour | Build: 7+ hours

  • Implement streaming responses (tokens arrive live)
  • Build chat interface with Streamlit
  • Add controls: model selector, temperature slider, system prompt editor
  • Track token usage and costs
  • Deploy to Streamlit Cloud (free)

Phase 2: RAG (Weeks 7-11)

The #1 skill companies hire for. Build systems that let LLMs answer from YOUR documents.

Week 7: Embeddings + Semantic Search

Topic: Convert text to vectors, search by meaning
Project: Semantic Search Engine for Notes
Key Skills: Embeddings, cosine similarity, t-SNE visualization
Theory: 1 hour | Build: 7+ hours

  • Understand embeddings conceptually and mathematically
  • Implement cosine similarity from scratch
  • Build semantic search engine
  • Visualize embeddings with t-SNE
  • Compare model speed vs quality tradeoffs
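Implementing cosine similarity from scratch is a few lines of math: the dot product of the two vectors divided by the product of their lengths. A sketch, with a toy top-k search over pre-computed vectors (real embeddings have hundreds of dimensions; the 2-D vectors here are just for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|): 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 3):
    """Rank document vectors by similarity to the query vector."""
    scored = sorted(docs.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return scored[:k]
```

The week's search engine is exactly this, with sentence-transformers producing the vectors instead of hand-written lists.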

Week 8: Chunking + Vector Database

Topic: RAG fundamentals (Retrieval-Augmented Generation)
Project: PDF Q&A System with Citations
Key Skills: Document chunking, ChromaDB, retrieval, grounding LLM answers
Theory: 1 hour | Build: 7+ hours

  • Extract and chunk PDFs intelligently
  • Store vectors in ChromaDB
  • Retrieve relevant chunks for queries
  • Feed context to LLM with citations
  • Build Streamlit UI and test failures
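A minimal fixed-size chunker with overlap shows why overlap matters: text that straddles a chunk boundary still appears whole in one of the two neighboring chunks. Real pipelines usually split on sentence or paragraph boundaries first; this character-based version is the simplest correct form:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Slide a fixed-size window over the text, stepping by chunk_size - overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk then gets embedded and stored in ChromaDB alongside metadata (source file, page number) so answers can cite where they came from.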

Week 9: RAG with LangChain

Topic: Build RAG the industry-standard way
Project: Multi-Document Q&A with Source Citations
Key Skills: LangChain document loaders, text splitters, retrieval chains
Theory: 1 hour | Build: 7+ hours

  • Load PDFs, markdown, CSV with LangChain loaders
  • Use RecursiveCharacterTextSplitter
  • Build a RetrievalQA chain
  • Return source documents with answers
  • Handle multiple document types in one pipeline

Week 10: Advanced RAG + Evaluation

Topic: Fix RAG failures — reranking, query rewriting, evaluation
Project: Evaluate and Improve Your RAG System
Key Skills: RAGAS metrics, reranking, query decomposition, hallucination prevention
Theory: 1 hour | Build: 7+ hours

  • Create a golden test set (20 question-answer pairs)
  • Run automated evaluation with RAGAS
  • Add reranking with cross-encoders
  • Implement query decomposition for multi-hop questions
  • Measure before/after improvement

Week 11: Production RAG App

Topic: Deploy a production-ready RAG system
Project: Full RAG App with FastAPI + Qdrant + Docker
Key Skills: Qdrant, FastAPI backend, incremental indexing, Docker Compose
Theory: 1 hour | Build: 7+ hours

  • Switch from ChromaDB to Qdrant (production vector DB)
  • Build FastAPI backend with /ask and /upload endpoints
  • Add incremental document processing (only re-embed changed files)
  • Dockerize everything with docker-compose
  • One command starts the full stack
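Incremental processing reduces to comparing content hashes against a manifest from the last indexing run: only new or changed files need re-embedding. A stdlib sketch (the manifest file name and layout are illustrative):

```python
import hashlib
import json
from pathlib import Path

def file_digest(path: Path) -> str:
    """Content hash; changes whenever the file's bytes change."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def files_needing_reindex(docs_dir: Path, manifest_path: Path) -> list[Path]:
    """Diff current hashes against the manifest from the previous run."""
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    changed = []
    for path in sorted(docs_dir.glob("*")):
        if path.is_file() and manifest.get(path.name) != file_digest(path):
            changed.append(path)
    return changed

def update_manifest(docs_dir: Path, manifest_path: Path) -> None:
    """Record current hashes after a successful indexing pass."""
    manifest = {p.name: file_digest(p) for p in docs_dir.glob("*") if p.is_file()}
    manifest_path.write_text(json.dumps(manifest))
```

The /upload endpoint runs this diff, re-embeds only the changed files into Qdrant, then updates the manifest.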

Phase 3: AI Agents (Weeks 12-16)

Make LLMs that don't just answer — they take action.

Week 12: Agent Loops from Scratch

Topic: Build an agent using just the API — no frameworks
Project: ReAct Agent with 3 Tools
Key Skills: ReAct pattern, tool execution loop, iteration limits
Theory: 1 hour | Build: 7+ hours

  • Implement the ReAct loop (Reason → Act → Observe)
  • Give agent 3 tools (search, calculate, read file)
  • Handle tool execution errors
  • Add max iteration limits (prevent infinite loops)
  • Log every step for debugging
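The ReAct loop can be exercised end to end with a scripted stand-in for the model. The Thought/Action/Observation text format and the toy calculator below are illustrative, but the parse-execute-append loop and the hard step cap are exactly what the week builds:

```python
import re

def scripted_model(prompt: str) -> str:
    """Stand-in for the LLM: emits one Action, then a Final Answer
    once an Observation appears in the prompt."""
    if "Observation:" not in prompt:
        return "Thought: I need to compute this.\nAction: calculate[2 + 3]"
    return "Thought: I have the result.\nFinal Answer: 5"

# Toy calculator; a real agent would also get search and read-file tools.
REACT_TOOLS = {"calculate": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def react_loop(question: str, model=scripted_model, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):  # hard cap prevents infinite Reason/Act cycles
        reply = model(prompt)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[1].strip()
        match = re.search(r"Action: (\w+)\[(.*?)\]", reply)
        observation = REACT_TOOLS[match.group(1)](match.group(2))
        prompt += f"{reply}\nObservation: {observation}\n"
    raise RuntimeError("agent hit the max-step limit")
```

Replace `scripted_model` with a real API call and add error handling around the tool execution, and you have the week's agent.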

Week 13: LangGraph Agents

Topic: Build stateful agents with LangGraph
Project: Web Research Agent
Key Skills: State graphs, conditional edges, tool integration, Tavily search
Theory: 1 hour | Build: 7+ hours

  • Define agent state with TypedDict
  • Build a state graph with conditional edges
  • Integrate Tavily search API
  • Add human-in-the-loop checkpoints
  • Handle failures gracefully

Week 14: Multi-Step Workflows

Topic: Chains and workflows — when you DON'T need agents
Project: Content Repurposing Pipeline
Key Skills: Prompt chaining, parallelization, routing, orchestration
Theory: 1 hour | Build: 7+ hours

  • Build a 3-step pipeline: extract facts → generate content → score quality
  • Run parallel LLM calls (tweet + LinkedIn + summary at once)
  • Add routing: classify input, send to specialized handler
  • Compare workflow vs agent: speed, cost, reliability
  • Learn when NOT to use agents
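Routing is a classifier in front of a dispatch table. Here a keyword check stands in for the LLM classification call; in the real pipeline that function would be a cheap model call returning one label, and the handlers would be specialized prompts:

```python
def classify(text: str) -> str:
    """Keyword stand-in for an LLM classifier call returning one label."""
    lowered = text.lower()
    if "refund" in lowered:
        return "billing"
    if "error" in lowered or "crash" in lowered:
        return "technical"
    return "general"

# Each handler stands in for a specialized prompt/pipeline.
HANDLERS = {
    "billing": lambda t: f"[billing handler] {t}",
    "technical": lambda t: f"[technical handler] {t}",
    "general": lambda t: f"[general handler] {t}",
}

def route(text: str) -> str:
    """Classify once, then send the input to the specialized handler."""
    return HANDLERS[classify(text)](text)
```

Unlike an agent, the control flow here is fixed and auditable, which is why workflows win on speed, cost, and reliability when the task shape is known in advance.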

Week 15: Multi-Agent Systems

Topic: Multiple agents collaborating on a task
Project: Blog Writing Crew (Researcher → Writer → Editor)
Key Skills: CrewAI, agent roles, task delegation, inter-agent communication
Theory: 1 hour | Build: 7+ hours

  • Define specialized agents with roles and backstories
  • Create tasks with dependencies
  • Watch agents collaborate in verbose mode
  • Track cost per agent
  • Compare output quality: single agent vs crew

Week 16: Evaluation + Testing

Topic: Systematically test your AI systems
Project: Test Suite for Your RAG and Agents
Key Skills: DeepEval, golden test sets, LLM-as-judge, CI integration
Theory: 1 hour | Build: 7+ hours

  • Create test cases for RAG (answerable, unanswerable, adversarial)
  • Create test cases for agents (correct tool selection, refusal, timeouts)
  • Build an evaluation harness that scores automatically
  • Set up GitHub Actions to run evals on every push
  • Track quality over time
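An evaluation harness is a loop over a golden set plus a scoring function. Keyword coverage below is a cheap stand-in for LLM-as-judge scoring; the harness shape stays the same when you swap in a better scorer:

```python
def keyword_score(answer: str, must_include: list[str]) -> float:
    """Fraction of required facts present; a cheap stand-in for LLM-as-judge."""
    hits = sum(1 for kw in must_include if kw.lower() in answer.lower())
    return hits / len(must_include)

def run_eval(system, golden_set: list[dict], threshold: float = 0.5) -> dict:
    """Score `system` (any callable question -> answer) against the golden set."""
    scores = [keyword_score(system(case["question"]), case["must_include"])
              for case in golden_set]
    passed = sum(1 for s in scores if s >= threshold)
    return {"pass_rate": passed / len(golden_set), "scores": scores}
```

In CI, the GitHub Actions job runs this harness and fails the build if `pass_rate` drops below a chosen bar, which is how quality gets tracked over time.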

Phase 4: Ship It (Weeks 17-20)

Make your AI apps survive real users, real traffic, and real failures.

Week 17: Docker + Deployment

Topic: Containerize and deploy your AI apps
Project: Dockerize Your RAG App
Key Skills: Dockerfiles, Docker Compose, multi-service apps, health checks
Theory: 1 hour | Build: 7+ hours

  • Write Dockerfiles for FastAPI and Streamlit
  • Create docker-compose.yml with backend + frontend + vector DB
  • Add .dockerignore and environment variable management
  • Add health check endpoints
  • docker compose up starts everything

Week 18: Auth + Security

Topic: API authentication, rate limiting, prompt injection defense
Project: Secure Your Deployed RAG App
Key Skills: JWT tokens, API keys, rate limiting, input validation
Theory: 1 hour | Build: 7+ hours

  • Add API key authentication to endpoints
  • Implement rate limiting (10 requests/minute per key)
  • Add prompt injection defenses
  • Test attacks against your app
  • Add request logging and CORS

Week 19: Monitoring + Observability

Topic: Trace every LLM call, log structured data
Project: Add Full Observability with Langfuse
Key Skills: Langfuse tracing, structlog, metrics, alerting
Theory: 1 hour | Build: 7+ hours

  • Add structured JSON logging with structlog
  • Instrument RAG pipeline with Langfuse traces
  • Track latency, tokens, cost per request
  • Build a simple monitoring dashboard
  • Add alerts for error rate spikes

Week 20: Cost Control + Caching

Topic: Make AI apps affordable at scale
Project: Add Redis Caching + Cost Tracking
Key Skills: Redis, response caching, model routing, spend limits
Theory: 1 hour | Build: 7+ hours

  • Add Redis to docker-compose
  • Implement exact-match response caching with TTL
  • Build a cost tracking endpoint
  • Route easy questions to cheap models, hard ones to expensive models
  • Measure real savings: 100 queries with/without cache

Phase 5: Level Up (Weeks 21-24)

Open-source models, fine-tuning, and your capstone.

Week 21: Open Source Models Locally

Topic: Run LLMs locally with zero API cost
Project: Run Llama/Mistral Locally + Swap Into Your RAG App
Key Skills: Ollama, model comparison, local inference
Theory: 1 hour | Build: 7+ hours

  • Install Ollama, pull and chat with Llama 3.1
  • Use from Python — swap into your RAG app (one line change)
  • Benchmark 4 models: accuracy, speed, quality
  • Create a comparison table in README
  • Understand when local vs API makes sense

Week 22: Model Serving with vLLM

Topic: Production-speed model serving
Project: High-Performance Model Server + Benchmarks
Key Skills: vLLM, OpenAI-compatible serving, concurrent benchmarking, quantization
Theory: 1 hour | Build: 7+ hours

  • Start vLLM as an OpenAI-compatible server
  • Benchmark: 50 concurrent requests, measure p50/p95/p99 latency
  • Compare Ollama vs vLLM throughput
  • Try quantized models (AWQ) — speed vs quality tradeoff
  • Add vLLM to your docker-compose stack

Week 23: Fine-Tuning

Topic: Teach a model your specific task
Project: Fine-Tune a Text-to-SQL Model with Unsloth
Key Skills: Dataset preparation, LoRA/QLoRA, training, evaluation, GGUF export
Theory: 1 hour | Build: 7+ hours

  • Prepare 200+ training examples (natural language → SQL)
  • Fine-tune Llama 3.2 3B with QLoRA on Google Colab
  • Evaluate: base model vs fine-tuned on held-out test set
  • Export to GGUF, load in Ollama
  • Your custom model runs locally for free

Week 24: Capstone Project

Topic: End-to-end AI product using everything you've learned
Project: Pick one and build it from scratch
Key Skills: System design, architecture, full-stack AI development
Theory: None. Build.

  • Option A — AI Code Documentation Generator: Point at a repo, auto-generate docs, searchable Q&A
  • Option B — AI Data Analysis Assistant: Upload CSV, ask questions in English, get charts + reports
  • Option C — Domain Knowledge Base: RAG + fine-tuned model + evaluation + monitoring for your domain

Requirements: Docker Compose, architecture diagram, eval results, monitoring, CI/CD, cost analysis.


Learning Outcomes

After 24 weeks, you'll be able to:

  • Build LLM apps with APIs, structured outputs, and tool calling
  • Build RAG systems that answer from your documents with citations
  • Build AI agents that take multi-step actions autonomously
  • Evaluate AI systems with automated test suites and quality metrics
  • Deploy everything with Docker, monitoring, auth, and caching
  • Run open-source models locally and serve them at production speed
  • Fine-tune models for your specific use case
  • Ship a complete AI product end-to-end

Tech Stack

Backend: FastAPI, uvicorn
LLM APIs: OpenAI, Anthropic (Claude)
Validation: Pydantic, Instructor
Web UI: Streamlit
Embeddings: sentence-transformers
Vector DBs: ChromaDB (dev), Qdrant (prod)
RAG Frameworks: LangChain, LlamaIndex
Agents: LangGraph, CrewAI
Evaluation: RAGAS, DeepEval
Local Models: Ollama, vLLM
Fine-Tuning: Unsloth, TRL
Deployment: Docker, Docker Compose
Monitoring: Langfuse, structlog
Caching: Redis
CI/CD: GitHub Actions

Prerequisites

  • Solid Python knowledge (classes, functions, async is helpful)
  • Git basics (clone, commit, push)
  • Terminal/CLI comfort
  • ~2-3 hours per day for 24 weeks

File Structure

AI Engineer Roadmap/
├── README.md
├── week-01/idea.md    FastAPI Crash Course
├── week-02/idea.md    First LLM API Call
├── week-03/idea.md    Prompt Engineering
├── week-04/idea.md    Structured Outputs
├── week-05/idea.md    Tool Calling
├── week-06/idea.md    Streaming + Web UI
├── week-07/idea.md    Embeddings + Semantic Search
├── week-08/idea.md    Chunking + Vector DB (RAG)
├── week-09/idea.md    RAG with LangChain
├── week-10/idea.md    Advanced RAG + Evaluation
├── week-11/idea.md    Production RAG App
├── week-12/idea.md    Agent Loops from Scratch
├── week-13/idea.md    LangGraph Agents
├── week-14/idea.md    Multi-Step Workflows
├── week-15/idea.md    Multi-Agent Systems (CrewAI)
├── week-16/idea.md    Evaluation + Testing
├── week-17/idea.md    Docker + Deployment
├── week-18/idea.md    Auth + Security
├── week-19/idea.md    Monitoring + Observability
├── week-20/idea.md    Cost Control + Caching
├── week-21/idea.md    Open Source Models (Ollama)
├── week-22/idea.md    Model Serving (vLLM)
├── week-23/idea.md    Fine-Tuning (Unsloth)
└── week-24/idea.md    Capstone Project

Each idea.md contains: theory links, library setup, step-by-step project, working code snippets, common mistakes, and GitHub push guide.

How to Use This

  1. Clone this repo
  2. Start with Week 1 — read the theory, follow the project steps
  3. Type the code (don't just copy-paste)
  4. Push each week's project to GitHub
  5. Move forward — each week builds on previous ones
  6. Start applying for jobs after Week 16

24 weeks. 24 projects. Build something every week. That's it.

About

Become an AI Engineer in 24 weeks - hands-on, project-based roadmap for software professionals. No theory hell. Build real AI apps from week 1.
