nithin-nk/ai-engineer-roadmap

Hands-On AI Engineer Roadmap

A structured 24-week program for software professionals who already know Python/Git.

Philosophy: 15% theory, 85% hands-on. Each week's material fits into one week at 2-3 hours/day. Every step is explicit enough to follow without prior AI experience.

Phase 1: Talk to LLMs (Weeks 1-6)

Your first AI apps. By Week 6 you'll have a deployed chat app with streaming.

Week 1: FastAPI Crash Course

Topic: Building REST APIs with FastAPI (foundation for all AI apps)
Project: Personal Expense Tracker API
Key Skills: REST endpoints, Pydantic validation, JSON persistence
Theory: 1 hour | Build: 7+ hours

  • Build CRUD endpoints from scratch
  • Learn request validation with Pydantic
  • Implement file-based persistence
  • Use Swagger UI for testing
  • Handle errors and edge cases
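The file-based persistence behind the tracker can be sketched with the standard library alone. The file name and record fields here are illustrative, not the project's required schema:

```python
import json
from pathlib import Path

DB_PATH = Path("expenses.json")  # hypothetical storage file

def load_expenses(path: Path = DB_PATH) -> list[dict]:
    """Read all expenses; an absent file means an empty tracker."""
    if not path.exists():
        return []
    return json.loads(path.read_text())

def save_expenses(expenses: list[dict], path: Path = DB_PATH) -> None:
    """Persist the full list as pretty-printed JSON."""
    path.write_text(json.dumps(expenses, indent=2))

def add_expense(amount: float, category: str, path: Path = DB_PATH) -> dict:
    """Append one record and persist; IDs are just sequential ints."""
    expenses = load_expenses(path)
    record = {"id": len(expenses) + 1, "amount": amount, "category": category}
    expenses.append(record)
    save_expenses(expenses, path)
    return record
```

In the week's project these functions sit behind FastAPI endpoints; the endpoint handlers call them and return the results as JSON.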

Week 2: Your First LLM API Call

Topic: Talking to LLMs through their APIs
Project: Movie Recommendation Chatbot (CLI)
Key Skills: API authentication, conversation history, token tracking
Theory: 1 hour | Build: 7+ hours

  • Make your first OpenAI/Anthropic API call
  • Maintain conversation history
  • Track tokens and estimate costs
  • Save/load conversations
  • Experiment with temperature effects
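Conversation history and cost tracking reduce to bookkeeping over the messages list. The per-token price and the 4-characters-per-token heuristic below are placeholder assumptions; use your provider's tokenizer and price sheet for real numbers:

```python
PRICE_PER_1K_TOKENS = 0.002  # assumed flat rate, not a real billing figure

def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

class Conversation:
    """Keeps the running message list you send on every API call."""
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def token_count(self) -> int:
        return sum(estimate_tokens(m["content"]) for m in self.messages)

    def estimated_cost(self) -> float:
        return self.token_count() / 1000 * PRICE_PER_1K_TOKENS
```

The key insight of the week: the API is stateless, so "memory" is just resending this list (and paying for it) on every turn.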

Week 3: Prompt Engineering That Actually Works

Topic: Writing prompts that produce reliable, consistent outputs
Project: Email Writer (tone & audience adaptation)
Key Skills: Few-shot learning, chain-of-thought, prompt comparison
Theory: 1 hour | Build: 7+ hours

  • Write specific, actionable prompts
  • Use examples to improve quality
  • Chain-of-thought reasoning
  • Compare multiple prompt strategies
  • Build a rewrite tool with tone/audience control
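Few-shot prompting is just prior turns in the messages list: each example pair is inserted as a user/assistant exchange before the real query, so the model imitates the demonstrated pattern. A minimal builder, assuming the standard chat-message dict shape:

```python
def build_few_shot_messages(
    system: str,
    examples: list[tuple[str, str]],
    query: str,
) -> list[dict]:
    """Interleave (input, ideal output) pairs as prior turns, then the real query."""
    messages = [{"role": "system", "content": system}]
    for user_text, ideal_reply in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": ideal_reply})
    messages.append({"role": "user", "content": query})
    return messages
```

For the email writer, the example pairs would be (blunt draft, polished rewrite) in the target tone, and the query is the draft to rewrite.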

Week 4: Structured Outputs

Topic: Getting reliable JSON from LLMs
Project: Job Posting Parser
Key Skills: Pydantic validation, Instructor library, structured generation
Theory: 1 hour | Build: 7+ hours

  • Define Pydantic models for data validation
  • Use OpenAI JSON mode
  • Automatic retries with Instructor
  • Handle missing fields gracefully
  • Batch process documents to CSV

Week 5: Tool Calling

Topic: Function/tool calling (the LLM decides which function to call)
Project: Personal Assistant with Real Tools (stocks, weather, notes)
Key Skills: Tool-calling loop, parallel tool execution, logging
Theory: 1 hour | Build: 7+ hours

  • Define tools as JSON schemas
  • Implement the tool-calling loop
  • Handle multi-tool queries
  • Add error handling and logging
  • Understand how ChatGPT plugins work
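The tool-calling loop itself needs no SDK. Here `fake_model` stands in for the real LLM and the message/tool-call shapes are simplified from the actual API format, but the loop structure (call model, execute the requested tool, feed the result back, repeat until a plain answer) is the one you'll implement:

```python
import json

def get_weather(city: str) -> str:
    """Stub tool; a real version would call a weather API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_model(messages: list[dict]) -> dict:
    """Stand-in for the LLM: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather",
                              "arguments": json.dumps({"city": "Oslo"})}}
    return {"content": "It looks sunny in Oslo today."}

def run_tool_loop(messages: list[dict], model=fake_model, max_turns: int = 5) -> str:
    """Call model, execute any requested tool, append the result, repeat."""
    for _ in range(max_turns):
        reply = model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # model answered directly, loop done
        result = TOOLS[call["name"]](**json.loads(call["arguments"]))
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("tool loop exceeded max_turns")
```

Swapping `fake_model` for a real API call (and the dict shapes for the provider's) turns this into the week's project.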

Week 6: Streaming + Web UI

Topic: Streaming responses + building a chat UI with Streamlit
Project: Full AI Chat App (deployed to Streamlit Cloud)
Key Skills: Streaming APIs, Streamlit components, cloud deployment
Theory: 1 hour | Build: 7+ hours

  • Implement streaming responses (tokens arrive live)
  • Build chat interface with Streamlit
  • Add controls: model selector, temperature slider, system prompt editor
  • Track token usage and costs
  • Deploy to Streamlit Cloud (free)

Phase 2: RAG (Weeks 7-11)

The #1 skill companies hire for. Build systems that let LLMs answer from YOUR documents.

Week 7: Embeddings + Semantic Search

Topic: Convert text to vectors, search by meaning
Project: Semantic Search Engine for Notes
Key Skills: Embeddings, cosine similarity, t-SNE visualization
Theory: 1 hour | Build: 7+ hours

  • Understand embeddings conceptually and mathematically
  • Implement cosine similarity from scratch
  • Build semantic search engine
  • Visualize embeddings with t-SNE
  • Compare model speed vs quality tradeoffs
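Implementing cosine similarity from scratch is a few lines of math: the dot product of the two vectors divided by the product of their lengths. A sketch, with a toy top-k search over pre-computed vectors (real embeddings have hundreds of dimensions; the 2-D vectors here are just for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|): 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 3):
    """Rank document vectors by similarity to the query vector."""
    scored = sorted(docs.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return scored[:k]
```

The week's search engine is exactly this, with sentence-transformers producing the vectors instead of hand-written lists.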

Week 8: Chunking + Vector Database

Topic: RAG fundamentals (Retrieval-Augmented Generation)
Project: PDF Q&A System with Citations
Key Skills: Document chunking, ChromaDB, retrieval, grounding LLM answers
Theory: 1 hour | Build: 7+ hours

  • Extract and chunk PDFs intelligently
  • Store vectors in ChromaDB
  • Retrieve relevant chunks for queries
  • Feed context to LLM with citations
  • Build Streamlit UI and test failures
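A minimal fixed-size chunker with overlap shows why overlap matters: text that straddles a chunk boundary still appears whole in one of the two neighboring chunks. Real pipelines usually split on sentence or paragraph boundaries first; this character-based version is the simplest correct form:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Slide a fixed-size window over the text, stepping by chunk_size - overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk then gets embedded and stored in ChromaDB alongside metadata (source file, page number) so answers can cite where they came from.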

Week 9: RAG with LangChain

Topic: Build RAG the industry-standard way
Project: Multi-Document Q&A with Source Citations
Key Skills: LangChain document loaders, text splitters, retrieval chains
Theory: 1 hour | Build: 7+ hours

  • Load PDFs, markdown, CSV with LangChain loaders
  • Use RecursiveCharacterTextSplitter
  • Build a RetrievalQA chain
  • Return source documents with answers
  • Handle multiple document types in one pipeline

Week 10: Advanced RAG + Evaluation

Topic: Fix RAG failures — reranking, query rewriting, evaluation
Project: Evaluate and Improve Your RAG System
Key Skills: RAGAS metrics, reranking, query decomposition, hallucination prevention
Theory: 1 hour | Build: 7+ hours

  • Create a golden test set (20 question-answer pairs)
  • Run automated evaluation with RAGAS
  • Add reranking with cross-encoders
  • Implement query decomposition for multi-hop questions
  • Measure before/after improvement

Week 11: Production RAG App

Topic: Deploy a production-ready RAG system
Project: Full RAG App with FastAPI + Qdrant + Docker
Key Skills: Qdrant, FastAPI backend, incremental indexing, Docker Compose
Theory: 1 hour | Build: 7+ hours

  • Switch from ChromaDB to Qdrant (production vector DB)
  • Build FastAPI backend with /ask and /upload endpoints
  • Add incremental document processing (only re-embed changed files)
  • Dockerize everything with docker-compose
  • One command starts the full stack
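Incremental processing reduces to comparing content hashes against a manifest from the last indexing run: only new or changed files need re-embedding. A stdlib sketch (the manifest file name and layout are illustrative):

```python
import hashlib
import json
from pathlib import Path

def file_digest(path: Path) -> str:
    """Content hash; changes whenever the file's bytes change."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def files_needing_reindex(docs_dir: Path, manifest_path: Path) -> list[Path]:
    """Diff current hashes against the manifest from the previous run."""
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    changed = []
    for path in sorted(docs_dir.glob("*")):
        if path.is_file() and manifest.get(path.name) != file_digest(path):
            changed.append(path)
    return changed

def update_manifest(docs_dir: Path, manifest_path: Path) -> None:
    """Record current hashes after a successful indexing pass."""
    manifest = {p.name: file_digest(p) for p in docs_dir.glob("*") if p.is_file()}
    manifest_path.write_text(json.dumps(manifest))
```

The /upload endpoint runs this diff, re-embeds only the changed files into Qdrant, then updates the manifest.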

Phase 3: AI Agents (Weeks 12-16)

Make LLMs that don't just answer — they take action.

Week 12: Agent Loops from Scratch

Topic: Build an agent using just the API — no frameworks
Project: ReAct Agent with 3 Tools
Key Skills: ReAct pattern, tool execution loop, iteration limits
Theory: 1 hour | Build: 7+ hours

  • Implement the ReAct loop (Reason → Act → Observe)
  • Give agent 3 tools (search, calculate, read file)
  • Handle tool execution errors
  • Add max iteration limits (prevent infinite loops)
  • Log every step for debugging
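The ReAct loop can be exercised end to end with a scripted stand-in for the model. The Thought/Action/Observation text format and the toy calculator below are illustrative, but the parse-execute-append loop and the hard step cap are exactly what the week builds:

```python
import re

def scripted_model(prompt: str) -> str:
    """Stand-in for the LLM: emits one Action, then a Final Answer
    once an Observation appears in the prompt."""
    if "Observation:" not in prompt:
        return "Thought: I need to compute this.\nAction: calculate[2 + 3]"
    return "Thought: I have the result.\nFinal Answer: 5"

# Toy calculator; a real agent would also get search and read-file tools.
REACT_TOOLS = {"calculate": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def react_loop(question: str, model=scripted_model, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):  # hard cap prevents infinite Reason/Act cycles
        reply = model(prompt)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[1].strip()
        match = re.search(r"Action: (\w+)\[(.*?)\]", reply)
        observation = REACT_TOOLS[match.group(1)](match.group(2))
        prompt += f"{reply}\nObservation: {observation}\n"
    raise RuntimeError("agent hit the max-step limit")
```

Replace `scripted_model` with a real API call and add error handling around the tool execution, and you have the week's agent.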

Week 13: LangGraph Agents

Topic: Build stateful agents with LangGraph
Project: Web Research Agent
Key Skills: State graphs, conditional edges, tool integration, Tavily search
Theory: 1 hour | Build: 7+ hours

  • Define agent state with TypedDict
  • Build a state graph with conditional edges
  • Integrate Tavily search API
  • Add human-in-the-loop checkpoints
  • Handle failures gracefully

Week 14: Multi-Step Workflows

Topic: Chains and workflows — when you DON'T need agents
Project: Content Repurposing Pipeline
Key Skills: Prompt chaining, parallelization, routing, orchestration
Theory: 1 hour | Build: 7+ hours

  • Build a 3-step pipeline: extract facts → generate content → score quality
  • Run parallel LLM calls (tweet + LinkedIn + summary at once)
  • Add routing: classify input, send to specialized handler
  • Compare workflow vs agent: speed, cost, reliability
  • Learn when NOT to use agents
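Routing is a classifier in front of a dispatch table. Here a keyword check stands in for the LLM classification call; in the real pipeline that function would be a cheap model call returning one label, and the handlers would be specialized prompts:

```python
def classify(text: str) -> str:
    """Keyword stand-in for an LLM classifier call returning one label."""
    lowered = text.lower()
    if "refund" in lowered:
        return "billing"
    if "error" in lowered or "crash" in lowered:
        return "technical"
    return "general"

# Each handler stands in for a specialized prompt/pipeline.
HANDLERS = {
    "billing": lambda t: f"[billing handler] {t}",
    "technical": lambda t: f"[technical handler] {t}",
    "general": lambda t: f"[general handler] {t}",
}

def route(text: str) -> str:
    """Classify once, then send the input to the specialized handler."""
    return HANDLERS[classify(text)](text)
```

Unlike an agent, the control flow here is fixed and auditable, which is why workflows win on speed, cost, and reliability when the task shape is known in advance.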

Week 15: Multi-Agent Systems

Topic: Multiple agents collaborating on a task
Project: Blog Writing Crew (Researcher → Writer → Editor)
Key Skills: CrewAI, agent roles, task delegation, inter-agent communication
Theory: 1 hour | Build: 7+ hours

  • Define specialized agents with roles and backstories
  • Create tasks with dependencies
  • Watch agents collaborate in verbose mode
  • Track cost per agent
  • Compare output quality: single agent vs crew

Week 16: Evaluation + Testing

Topic: Systematically test your AI systems
Project: Test Suite for Your RAG and Agents
Key Skills: DeepEval, golden test sets, LLM-as-judge, CI integration
Theory: 1 hour | Build: 7+ hours

  • Create test cases for RAG (answerable, unanswerable, adversarial)
  • Create test cases for agents (correct tool selection, refusal, timeouts)
  • Build an evaluation harness that scores automatically
  • Set up GitHub Actions to run evals on every push
  • Track quality over time
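An evaluation harness is a loop over a golden set plus a scoring function. Keyword coverage below is a cheap stand-in for LLM-as-judge scoring; the harness shape stays the same when you swap in a better scorer:

```python
def keyword_score(answer: str, must_include: list[str]) -> float:
    """Fraction of required facts present; a cheap stand-in for LLM-as-judge."""
    hits = sum(1 for kw in must_include if kw.lower() in answer.lower())
    return hits / len(must_include)

def run_eval(system, golden_set: list[dict], threshold: float = 0.5) -> dict:
    """Score `system` (any callable question -> answer) against the golden set."""
    scores = [keyword_score(system(case["question"]), case["must_include"])
              for case in golden_set]
    passed = sum(1 for s in scores if s >= threshold)
    return {"pass_rate": passed / len(golden_set), "scores": scores}
```

In CI, the GitHub Actions job runs this harness and fails the build if `pass_rate` drops below a chosen bar, which is how quality gets tracked over time.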

Phase 4: Ship It (Weeks 17-20)

Make your AI apps survive real users, real traffic, and real failures.

Week 17: Docker + Deployment

Topic: Containerize and deploy your AI apps
Project: Dockerize Your RAG App
Key Skills: Dockerfiles, Docker Compose, multi-service apps, health checks
Theory: 1 hour | Build: 7+ hours

  • Write Dockerfiles for FastAPI and Streamlit
  • Create docker-compose.yml with backend + frontend + vector DB
  • Add .dockerignore and environment variable management
  • Add health check endpoints
  • docker compose up starts everything

Week 18: Auth + Security

Topic: API authentication, rate limiting, prompt injection defense
Project: Secure Your Deployed RAG App
Key Skills: JWT tokens, API keys, rate limiting, input validation
Theory: 1 hour | Build: 7+ hours

  • Add API key authentication to endpoints
  • Implement rate limiting (10 requests/minute per key)
  • Add prompt injection defenses
  • Test attacks against your app
  • Add request logging and CORS

Week 19: Monitoring + Observability

Topic: Trace every LLM call, log structured data
Project: Add Full Observability with Langfuse
Key Skills: Langfuse tracing, structlog, metrics, alerting
Theory: 1 hour | Build: 7+ hours

  • Add structured JSON logging with structlog
  • Instrument RAG pipeline with Langfuse traces
  • Track latency, tokens, cost per request
  • Build a simple monitoring dashboard
  • Add alerts for error rate spikes

Week 20: Cost Control + Caching

Topic: Make AI apps affordable at scale
Project: Add Redis Caching + Cost Tracking
Key Skills: Redis, response caching, model routing, spend limits
Theory: 1 hour | Build: 7+ hours

  • Add Redis to docker-compose
  • Implement exact-match response caching with TTL
  • Build a cost tracking endpoint
  • Route easy questions to cheap models, hard ones to expensive models
  • Measure real savings: 100 queries with/without cache

Phase 5: Level Up (Weeks 21-24)

Open-source models, fine-tuning, and your capstone.

Week 21: Open Source Models Locally

Topic: Run LLMs locally with zero API cost
Project: Run Llama/Mistral Locally + Swap Into Your RAG App
Key Skills: Ollama, model comparison, local inference
Theory: 1 hour | Build: 7+ hours

  • Install Ollama, pull and chat with Llama 3.1
  • Use from Python — swap into your RAG app (one line change)
  • Benchmark 4 models: accuracy, speed, quality
  • Create a comparison table in README
  • Understand when local vs API makes sense

Week 22: Model Serving with vLLM

Topic: Production-speed model serving
Project: High-Performance Model Server + Benchmarks
Key Skills: vLLM, OpenAI-compatible serving, concurrent benchmarking, quantization
Theory: 1 hour | Build: 7+ hours

  • Start vLLM as an OpenAI-compatible server
  • Benchmark: 50 concurrent requests, measure p50/p95/p99 latency
  • Compare Ollama vs vLLM throughput
  • Try quantized models (AWQ) — speed vs quality tradeoff
  • Add vLLM to your docker-compose stack

Week 23: Fine-Tuning

Topic: Teach a model your specific task
Project: Fine-Tune a Text-to-SQL Model with Unsloth
Key Skills: Dataset preparation, LoRA/QLoRA, training, evaluation, GGUF export
Theory: 1 hour | Build: 7+ hours

  • Prepare 200+ training examples (natural language → SQL)
  • Fine-tune Llama 3.2 3B with QLoRA on Google Colab
  • Evaluate: base model vs fine-tuned on held-out test set
  • Export to GGUF, load in Ollama
  • Your custom model runs locally for free

Week 24: Capstone Project

Topic: End-to-end AI product using everything you've learned
Project: Pick one and build it from scratch
Key Skills: System design, architecture, full-stack AI development
Theory: None. Build.

  • Option A — AI Code Documentation Generator: Point at a repo, auto-generate docs, searchable Q&A
  • Option B — AI Data Analysis Assistant: Upload CSV, ask questions in English, get charts + reports
  • Option C — Domain Knowledge Base: RAG + fine-tuned model + evaluation + monitoring for your domain

Requirements: Docker Compose, architecture diagram, eval results, monitoring, CI/CD, cost analysis.


Learning Outcomes

After 24 weeks, you'll be able to:

  • Build LLM apps with APIs, structured outputs, and tool calling
  • Build RAG systems that answer from your documents with citations
  • Build AI agents that take multi-step actions autonomously
  • Evaluate AI systems with automated test suites and quality metrics
  • Deploy everything with Docker, monitoring, auth, and caching
  • Run open-source models locally and serve them at production speed
  • Fine-tune models for your specific use case
  • Ship a complete AI product end-to-end

Tech Stack

Backend: FastAPI, uvicorn
LLM APIs: OpenAI, Anthropic (Claude)
Validation: Pydantic, Instructor
Web UI: Streamlit
Embeddings: sentence-transformers
Vector DBs: ChromaDB (dev), Qdrant (prod)
RAG Frameworks: LangChain, LlamaIndex
Agents: LangGraph, CrewAI
Evaluation: RAGAS, DeepEval
Local Models: Ollama, vLLM
Fine-Tuning: Unsloth, TRL
Deployment: Docker, Docker Compose
Monitoring: Langfuse, structlog
Caching: Redis
CI/CD: GitHub Actions

Prerequisites

  • Solid Python knowledge (classes, functions, async is helpful)
  • Git basics (clone, commit, push)
  • Terminal/CLI comfort
  • ~2-3 hours per day for 24 weeks

File Structure

AI Engineer Roadmap/
├── README.md
├── week-01/idea.md    FastAPI Crash Course
├── week-02/idea.md    First LLM API Call
├── week-03/idea.md    Prompt Engineering
├── week-04/idea.md    Structured Outputs
├── week-05/idea.md    Tool Calling
├── week-06/idea.md    Streaming + Web UI
├── week-07/idea.md    Embeddings + Semantic Search
├── week-08/idea.md    Chunking + Vector DB (RAG)
├── week-09/idea.md    RAG with LangChain
├── week-10/idea.md    Advanced RAG + Evaluation
├── week-11/idea.md    Production RAG App
├── week-12/idea.md    Agent Loops from Scratch
├── week-13/idea.md    LangGraph Agents
├── week-14/idea.md    Multi-Step Workflows
├── week-15/idea.md    Multi-Agent Systems (CrewAI)
├── week-16/idea.md    Evaluation + Testing
├── week-17/idea.md    Docker + Deployment
├── week-18/idea.md    Auth + Security
├── week-19/idea.md    Monitoring + Observability
├── week-20/idea.md    Cost Control + Caching
├── week-21/idea.md    Open Source Models (Ollama)
├── week-22/idea.md    Model Serving (vLLM)
├── week-23/idea.md    Fine-Tuning (Unsloth)
└── week-24/idea.md    Capstone Project

Each idea.md contains: theory links, library setup, step-by-step project, working code snippets, common mistakes, and GitHub push guide.

How to Use This

  1. Clone this repo
  2. Start with Week 1 — read the theory, follow the project steps
  3. Type the code (don't just copy-paste)
  4. Push each week's project to GitHub
  5. Move forward — each week builds on previous ones
  6. Start applying for jobs after Week 16

24 weeks. 24 projects. Build something every week. That's it.

About

Become an AI Engineer in 24 weeks - hands-on, project-based roadmap for software professionals. No theory hell. Build real AI apps from week 1.
