A project-based roadmap for experienced backend developers transitioning into AI engineering. Built with Node.js and the official Google Gen AI SDK — no Python, no tutorials, no theory-first approach.
Each phase produces a real, working project. Concepts compound from phase to phase. By Phase 9 you have a portfolio that demonstrates the full stack of modern AI engineering — from basic LLM calls to MCP servers, persistent agent memory, and production-grade RAG evaluation.
Build first. Understand by doing. No frameworks until they earn their place.
- Every phase = one real project, 2–5 days to build
- Raw SDK calls before frameworks — you see the mechanics, not the abstraction
- Each project is independently runnable and portfolio-ready
- Mistakes and fixes documented — the learning is in the debugging
| # | Project | Core concept | Status |
|---|---|---|---|
| 01 | Smart changelog generator | Prompting, structured output, streaming | ✅ Complete |
| 02 | Code review bot | Prompt chaining, schema-first design | ✅ Complete |
| 03 | Docs Q&A API | Embeddings, vector search, grounding | ✅ Complete |
| 04 | GitHub issue triage agent | ReAct loop, function calling | ✅ Complete |
| 05 | Production AI hardening | Caching, evals, observability, cost | ✅ Complete |
| 06 | Persistent research assistant | Agent memory, contextual retrieval | ✅ Complete |
| 07 | RAG eval harness | LLM-as-judge, RAGAS metrics | ✅ Complete |
| 08 | Fine-tuning comparison | When to fine-tune, ROI, platform limits | ✅ Complete |
| 09 | Custom MCP server | Model Context Protocol, stdio transport | ✅ Complete |
| 10 | Agentic RAG | Agent-driven retrieval, native JSON mode | 🔜 Next |
| 11 | Multi-provider + LangChain | Vercel AI SDK, provider tradeoffs | 🔜 Planned |
| 12 | Local models — Ollama | Open-source, offline, $0 cost | 🔜 Planned |
| 13 | Multi-agent systems | Orchestrator + subagents, parallel execution | 🔜 Planned |
| 14 | Browser agents + long-context | Computer use, 1M token context tradeoffs | 🔜 Planned |
```
genai-roadmap/
├── README.md                  ← you are here
│
├── 01-changelog-gen/          ← Smart changelog generator
│   ├── README.md
│   ├── .env.example
│   ├── client.js
│   ├── utils.js
│   ├── commits.js
│   ├── prompts.js
│   ├── parser.js
│   ├── renderer.js
│   ├── git.js
│   └── index.js
│
├── 02-code-reviewer/          ← Code review bot
│   ├── README.md
│   ├── .env.example
│   ├── client.js
│   ├── utils.js
│   ├── schema.js
│   ├── validator.js
│   ├── prompts.js
│   ├── reviewer.js
│   ├── renderer.js
│   ├── index.js
│   └── samples/
│       ├── good.js
│       └── bad.js
│
├── 03-docs-qa/                ← Docs Q&A with RAG
│   ├── README.md
│   ├── .env.example
│   ├── client.js
│   ├── utils.js
│   ├── db.js
│   ├── chunker.js
│   ├── embedder.js
│   ├── pdf-loader.js
│   ├── ingest.js
│   ├── retriever.js
│   ├── generator.js
│   ├── query.js
│   ├── index.js
│   └── docs/
│       ├── gemini-quickstart.md
│       ├── gemini-embeddings.md
│       └── gemini-models.md
│
├── 04-issue-triage/           ← GitHub triage agent
│   ├── README.md
│   ├── .env.example
│   ├── client.js
│   ├── utils.js
│   ├── github.js
│   ├── tools.js
│   ├── executor.js
│   ├── agent.js
│   └── index.js
│
├── 05-production/             ← Production hardening
│   ├── README.md
│   ├── .env.example
│   ├── client.js
│   ├── utils.js
│   ├── rateLimiter.js
│   ├── logger.js
│   ├── promptRegistry.js
│   ├── fallback.js
│   ├── cache.js
│   ├── tokens.js
│   ├── pipeline.js
│   ├── db.js
│   ├── retriever.js
│   ├── index.js
│   ├── logs/
│   │   └── .gitkeep
│   └── evals/
│       ├── runner.js
│       └── cases.js
│
├── 06-research-assistant/     ← Persistent research assistant
│   ├── README.md
│   ├── .env.example
│   ├── client.js
│   ├── utils.js
│   ├── db.js
│   ├── memory/
│   │   ├── shortTerm.js
│   │   ├── longTerm.js
│   │   └── manager.js
│   ├── rag/
│   │   ├── chunker.js
│   │   ├── embedder.js
│   │   ├── ingest.js
│   │   └── retriever.js
│   ├── agent/
│   │   ├── prompts.js
│   │   └── assistant.js
│   ├── scripts/
│   │   └── create-docs.js
│   ├── docs/
│   └── index.js
│
├── 07-rag-evals/              ← RAG eval harness
│   ├── README.md
│   ├── .env.example
│   ├── client.js
│   ├── utils.js
│   ├── db.js
│   ├── retriever.js
│   ├── generator.js
│   ├── judge.js
│   ├── evalCases.js
│   └── runner.js
│
├── 08-finetuning/             ← Fine-tuning comparison
│   ├── README.md
│   ├── .env.example
│   ├── client.js
│   ├── utils.js
│   ├── data/
│   │   ├── generate-training-data.js
│   │   ├── training.jsonl
│   │   └── validation.jsonl
│   ├── tune.js
│   └── compare.js
│
└── 09-mcp-server/             ← Custom MCP server
    ├── README.md
    ├── .env.example
    ├── client.js
    ├── utils.js
    ├── db.js
    ├── rag/
    │   └── embedder.js
    ├── memory/
    │   └── longTerm.js
    ├── tools/
    │   ├── rag.js
    │   ├── memory.js
    │   └── github.js
    ├── server.js
    └── index.js
```
Each directory is independently runnable. Shared utilities (client.js, utils.js) are duplicated by design — no cross-phase imports, no monorepo tooling required.
These two files appear in every phase. Copy them when starting a new one:
client.js — GoogleGenAI initialisation:

```js
import { GoogleGenAI } from "@google/genai";
import dotenv from "dotenv";

dotenv.config();

export const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
```

utils.js — retry with exponential backoff + jitter:

```js
export async function withRetry(fn, retries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const isLast = attempt === retries - 1;
      if (isLast) throw err;
      const retryable = err?.status === 429 || err?.status === 503;
      if (!retryable) throw err;
      const delay = baseDelayMs * Math.pow(2, attempt) + Math.random() * 1000;
      console.warn(`Attempt ${attempt + 1} failed [${err.status}], retrying in ${Math.round(delay)}ms...`);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
```

📁 01-changelog-gen/ · Full README
Transforms raw git log output into a structured, categorised changelog using Gemini. Output is both JSON and Markdown.
What you learn: Prompt anatomy, dynamic value injection, defensive output parsing, streaming, retry with backoff.
The critical lesson: LLMs are text-in, text-out. Prompt quality directly determines output quality — what examples you show, what rules you number, what you explicitly forbid.
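A minimal sketch of that prompt anatomy (this is illustrative, not the repo's actual prompts.js): numbered rules, an explicit prohibition, and runtime values injected so the model can't fall back on its training data.

```javascript
// Hypothetical changelog prompt builder: numbered rules + injected runtime values.
// The injected date fixes the "wrong date in changelog" hallucination noted below.
function buildChangelogPrompt(commits, today) {
  return [
    "You are a release-notes generator.",
    "Rules:",
    "1. Produce exactly one changelog entry per commit. Never merge commits.",
    "2. Categorise each entry as Added, Fixed, or Changed.",
    "3. Do NOT invent a date. The release date is " + today + ".",
    "Commits:",
    ...commits.map((c) => `- ${c}`),
    "Respond with JSON only.",
  ].join("\n");
}

const prompt = buildChangelogPrompt(
  ["fix: null check in parser", "feat: add streaming output"],
  new Date().toISOString().slice(0, 10)
);
console.log(prompt);
```

The structure matters more than the wording: rules the model can count, a forbidden behaviour stated outright, and no value left for it to guess.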
```bash
cd 01-changelog-gen && npm install
node index.js
# Output: CHANGELOG.md
```

📁 02-code-reviewer/ · Full README
Runs a thorough code review on any source file — bugs, security vulnerabilities, performance issues, maintainability — with a concrete fix for each.
What you learn: Prompt chaining (2-step pipeline), schema-first design, output schema validation, input validation, few-shot prompting.
The critical lesson: Treat LLM output as untrusted external data. Silent schema drift breaks downstream code without throwing an error.
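A sketch of what "untrusted external data" means in practice, assuming a hypothetical review shape (score plus issue list), not the repo's actual schema.js: every field is checked explicitly so drift throws instead of propagating.

```javascript
// Validate a model response before anything downstream touches it.
// Hypothetical shape: { score: number, issues: [{ description, fix }] }
function parseReview(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Model did not return valid JSON");
  }
  if (typeof data.score !== "number" || data.score < 0 || data.score > 100) {
    throw new Error("score missing or out of range");
  }
  if (!Array.isArray(data.issues)) throw new Error("issues must be an array");
  for (const issue of data.issues) {
    if (typeof issue.description !== "string" || typeof issue.fix !== "string") {
      throw new Error("issue missing description/fix");
    }
  }
  return data;
}
```

Without the checks, a model that quietly renames `issues` to `problems` produces code that "works" on an empty array forever.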
```bash
cd 02-code-reviewer && npm install
node index.js samples/bad.js   # score: ~15/100, 7 issues
node index.js samples/good.js  # score: ~90/100, minimal issues
```

📁 03-docs-qa/ · Full README

Answers natural language questions grounded strictly in your documents — Gemini embeddings + pgvector + source citations on every answer.
What you learn: Embeddings, pgvector + HNSW index, paragraph-aware chunking, ingestion vs query pipeline separation, grounded generation, similarity thresholds.
The critical lesson: Chunking is the hardest part of RAG. The same embedding model must be used at ingest time and query time — mixing models produces wrong results silently.
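The retrieval core is just cosine similarity plus a threshold. pgvector computes this server-side; a plain-JS version makes the threshold logic concrete (the 0.7 cutoff here is illustrative, not the repo's tuned value):

```javascript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep only chunks above a similarity threshold; below it, answer
// "I don't have info" rather than generate from noise.
function retrieve(queryVec, chunks, threshold = 0.7) {
  return chunks
    .map((c) => ({ ...c, score: cosineSimilarity(queryVec, c.embedding) }))
    .filter((c) => c.score >= threshold)
    .sort((x, y) => y.score - x.score);
}
```

This is also why mixing embedding models fails silently: vectors from two different models still produce a number here, it just isn't a meaningful similarity.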
```bash
cd 03-docs-qa && npm install
node ingest.js
node index.js "How do I use streaming?"
node index.js "What is the capital of France?"  # → "I don't have info..."
```

Requires: PostgreSQL + pgvector (Docker).
📁 04-issue-triage/ · Full README
Triages GitHub issues autonomously — reads issues, finds duplicates, applies labels, posts comments, closes duplicates without human input.
What you learn: ReAct loop, function calling, tool executor pattern, conversation history as memory, iteration cap guardrail, temperature 0 for determinism.
The critical lesson: The model never calls GitHub. It says "I want to call search_issues." You call GitHub. You tell the model what came back. This mechanical separation is the foundation of every agent framework ever built.
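That mechanical separation fits in a few lines. The sketch below mocks the model call; the tool name and `callModel` shape are hypothetical, but the loop structure (model names a tool, you execute it, you report back, repeat under an iteration cap) is the pattern the repo implements:

```javascript
// The executor: a plain map from tool names to functions. The model never
// touches these directly; it only asks for them by name.
const tools = {
  search_issues: (args) => `found duplicate of "${args.query}": #12`,
};

// The ReAct loop. callModel is the LLM (mocked in the usage below).
function runAgent(callModel, goal, maxIterations = 5) {
  const history = [{ role: "user", text: goal }];
  for (let i = 0; i < maxIterations; i++) { // iteration cap guardrail
    const step = callModel(history);
    if (step.type === "final") return step.text;
    // Model requested a tool: WE execute it, then tell the model the result.
    const result = tools[step.tool](step.args);
    history.push({ role: "tool", name: step.tool, text: result });
  }
  throw new Error("Iteration cap reached");
}

// Mocked model: first turn asks for a tool, second turn answers.
let turn = 0;
const mockModel = () =>
  turn++ === 0
    ? { type: "tool", tool: "search_issues", args: { query: "login bug" } }
    : { type: "final", text: "Closed as duplicate of #12" };

console.log(runAgent(mockModel, "triage issue 4"));
```

Everything an agent framework adds (parallel tool calls, streaming, typed schemas) is layering on this loop.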
```bash
cd 04-issue-triage && npm install
node index.js 4  # duplicate detection
node index.js 7  # standard labelling + comment
```

Requires: GitHub fine-grained PAT with Issues read/write.
📁 05-production/ · Full README
Hardens the Phase 3 RAG pipeline for production — semantic caching, automated evals, structured logging, fallback chains, cost tracking, prompt versioning.
What you learn: Semantic caching (Redis + embeddings), eval framework with CI exit codes, JSONL structured logging, Flash → Pro fallback chain, prompt versioning registry, ai.models.countTokens().
The critical lesson: Logging is the first thing to build, not the last. The free tier's 20 RPD cap is a hard wall for pipeline work — enable billing early.
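The semantic cache idea in miniature: a query hits the cache if its embedding is close enough to a previously answered one, not only on exact string match. This in-memory sketch uses a toy letter-frequency embedder so the matching logic is visible; the Phase 05 version uses Redis and real Gemini embeddings, and the 0.95 threshold here is illustrative.

```javascript
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// In-memory stand-in for the Redis-backed cache.
class SemanticCache {
  constructor(embed, threshold = 0.95) {
    this.embed = embed;
    this.threshold = threshold;
    this.entries = [];
  }
  get(query) {
    const v = this.embed(query);
    for (const e of this.entries) {
      if (cosine(v, e.vector) >= this.threshold) return e.answer; // semantic hit
    }
    return null; // miss: caller pays for a real LLM call, then set()s it
  }
  set(query, answer) {
    this.entries.push({ vector: this.embed(query), answer });
  }
}

// Toy embedder: lowercase letter counts. Real code embeds with the same
// model used everywhere else in the pipeline.
const toyEmbed = (text) => {
  const v = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) v[i]++;
  }
  return v;
};

const cache = new SemanticCache(toyEmbed);
cache.set("How do I use streaming?", "Use the streaming API.");
console.log(cache.get("how do I use streaming?")); // same letters, cache hit
```

The linear scan is the toy part; Redis with a vector index does the same comparison at scale.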
```bash
cd 05-production && npm install
node index.js "How do I use streaming?"
node index.js eval
```

Requires: Redis (Docker), PostgreSQL + pgvector, Gemini billing enabled.
📁 06-research-assistant/ · Full README
A CLI research assistant that remembers your preferences, conclusions, and sources across sessions using short-term (in-memory) and long-term (pgvector) memory, with contextual retrieval for 49% fewer retrieval failures.
What you learn: Short-term vs long-term memory architecture, importance-weighted memory recall, recent memory fallback, contextual retrieval (Anthropic's technique).
The critical lesson: Long-term agent memory is RAG applied to the agent's own history. Same embeddings, same pgvector, same cosine similarity — different content.
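A toy version of importance-weighted recall with the recent-memory fallback (field names and the 0.6 threshold are hypothetical; the real store is pgvector): memories are ranked by similarity × importance, and if nothing clears the threshold, as happens with meta-questions like "what did we discuss?", the most recent few are loaded instead.

```javascript
// similarityOf(memory) stands in for the pgvector cosine query.
function recallMemories(memories, similarityOf, { threshold = 0.6, topK = 3 } = {}) {
  const scored = memories
    .map((m) => ({ ...m, score: similarityOf(m) * m.importance })) // weight by importance
    .filter((m) => m.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
  if (scored.length > 0) return scored;
  // Fallback: nothing semantically close, so return the most recent memories.
  return [...memories].sort((a, b) => b.createdAt - a.createdAt).slice(0, topK);
}
```

The fallback is what turns "no memories from past sessions" into a usable answer: recency is a reasonable prior when similarity has nothing to say.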
```bash
cd 06-research-assistant && npm install
node scripts/create-docs.js && node rag/ingest.js
node index.js yourname  # Session 1
node index.js yourname  # Session 2 — picks up memories
```

Requires: PostgreSQL + pgvector with memories table.
📁 07-rag-evals/ · Full README

Automated quality evaluation for the 06 RAG pipeline — four RAGAS-aligned metrics scored by an LLM judge using native JSON mode.
What you learn: LLM-as-judge pattern, faithfulness / relevance / precision / recall metrics, native JSON mode, adversarial grounding tests, parallel metric scoring with Promise.all().
The critical lesson: Low faithfulness = generator hallucinating. Low precision = retriever returning noise. Low recall = docs don't cover the topic. Each metric points to a different fix.
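The parallel scoring step looks roughly like this (`scoreMetric` is mocked here; the real harness calls gemini-2.5-flash-lite in JSON mode for each metric):

```javascript
// Score all four metrics for one eval case concurrently.
async function evaluateCase(testCase, scoreMetric) {
  const metrics = ["faithfulness", "relevance", "precision", "recall"];
  // Promise.all: one slow judge call doesn't serialise the other three.
  const scores = await Promise.all(metrics.map((m) => scoreMetric(m, testCase)));
  const result = Object.fromEntries(metrics.map((m, i) => [m, scores[i]]));
  result.overall = scores.reduce((a, b) => a + b, 0) / scores.length;
  return result;
}

// Usage with a mocked judge.
(async () => {
  const mockJudge = async (metric) => (metric === "recall" ? 0.5 : 0.9);
  console.log(await evaluateCase({ question: "q", answer: "a" }, mockJudge));
})();
```

Keeping the per-metric scores separate, rather than only an aggregate, is what makes the diagnosis above possible.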
```bash
cd 07-rag-evals && npm install
node runner.js  # exits 0 if score ≥ 0.70, exits 1 if below — CI-ready
```

📁 08-finetuning/ · Full README
Head-to-head comparison of zero-shot vs few-shot approaches on changelog generation. Includes training data prep, ROI calculation, and a documented platform limitation.
What you learn: Fine-tuning decision tree, JSONL training data format, token cost delta between approaches, fine-tuning ROI at scale.
Platform note: Gemini Developer API dropped fine-tuning support mid-2025. Zero-shot vs few-shot comparison runs fully; tuning job requires Vertex AI.
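The ROI arithmetic is simple enough to sketch. All numbers below are illustrative placeholders, not measured values from this repo: few-shot pays for its example tokens on every call, while fine-tuning bakes them into the weights.

```javascript
// Monthly prompt-token cost for one approach.
function monthlyCostUSD({ callsPerMonth, promptTokensPerCall, pricePerMTokens }) {
  return (callsPerMonth * promptTokensPerCall * pricePerMTokens) / 1e6;
}

// Hypothetical comparison at 100k calls/month, $0.10 per million tokens.
const fewShot = monthlyCostUSD({
  callsPerMonth: 100_000,
  promptTokensPerCall: 1200, // base prompt + few-shot examples
  pricePerMTokens: 0.1,
});
const tuned = monthlyCostUSD({
  callsPerMonth: 100_000,
  promptTokensPerCall: 300, // examples baked into the weights
  pricePerMTokens: 0.1,
});
console.log({ fewShot, tuned, monthlySavings: fewShot - tuned });
```

At low volume the savings rarely cover the training cost and maintenance burden, which is the point of the decision tree above.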
```bash
cd 08-finetuning && npm install
node data/generate-training-data.js
node compare.js
```

📁 09-mcp-server/ · Full README
Exposes the 06 RAG pipeline, agent memory, and 04 GitHub tools as a standardised MCP server — connectable to Claude Desktop or any MCP client without writing agent code.
What you learn: MCP tools vs resources vs prompts, Zod validation, stdio transport, tool descriptions as prompts, MCP Inspector, Claude Desktop integration.
The critical lesson: Write the MCP server once. Claude Desktop, Cursor, and your own agents all discover and use the same tools automatically.
```bash
cd 09-mcp-server && npm install
npx @modelcontextprotocol/inspector node index.js
# Then add to Claude Desktop config and restart
```

Requires: 06 database, GitHub PAT, Claude Desktop.
📁 10-agentic-rag/ · Coming soon
The agent decides when and how to retrieve — not just at query time. Native JSON schema enforcement, query rewriting, multi-hop retrieval, self-correcting retrieval loops, hybrid search.
📁 11-multi-provider/ · Coming soon
Same code reviewer from 02, rebuilt with three providers (OpenAI, Claude, Gemini) via Vercel AI SDK and LangChain. Measure quality and cost tradeoffs. First intentional use of frameworks.
📁 12-local-models/ · Coming soon
Offline-capable RAG using Ollama + Llama/Mistral. Same 03 pipeline, zero API cost, runs entirely on your machine.
📁 13-multi-agent/ · Coming soon
Orchestrator spawns specialist subagents in parallel. Real coordination, handoffs, shared memory, partial failure handling.
📁 14-browser-agents/ · Coming soon
Browser agent using Playwright. Explores the 1M token context vs RAG tradeoff — when does full-context beat retrieval?
| Tool | Role | Notes |
|---|---|---|
| `@google/genai` | Gemini SDK | Official SDK — replaces deprecated `@google/generative-ai` |
| `gemini-2.5-flash` | Generation | Fast, 1M context, best default |
| `gemini-2.5-flash-lite` | Judge / eval model | $0.10/M tokens — cheapest stable option |
| `gemini-embedding-001` | Embeddings | Replaces deprecated `text-embedding-004` (Jan 2026), 1536 dims |
| PostgreSQL + pgvector | Vector store | HNSW index, cosine similarity |
| Redis | Semantic cache | Phase 05 |
| `@octokit/rest` | GitHub API | Phases 04 + 09 |
| `@modelcontextprotocol/sdk` | MCP server | Phase 09 |
| `zod` | Schema validation | Phase 09 tool parameters |
| No LangChain (Phases 01–09) | — | Raw SDK first — frameworks introduced in Phase 11 |
Every entry below is something that actually broke during this build:
| Issue | Phase | Root cause | Fix |
|---|---|---|---|
| `@google/generative-ai` import fails | 01–04 | Old SDK deprecated | Migrated to `@google/genai` |
| JSON truncated mid-response | 01 | `maxOutputTokens: 2048` too low | Raised to 8192 |
| Wrong date in changelog | 01 | Model hallucinated from training data | Injected `new Date().toISOString()` |
| All bug fixes merged into one entry | 01 | Prompt too vague | Added rule: one entry per commit |
| `text-embedding-004` deprecated | 03 | Model retired Jan 2026 | Migrated to `gemini-embedding-001` |
| HNSW index fails on 3072 dims | 03 | pgvector caps HNSW at 2000 dims | Used `outputDimensionality: 1536` |
| Agent skips `search_issues` on obvious duplicate | 04 | Model reads issue body and reasons correctly | Expected — let it reason |
| Free tier 20 RPD wall | 05 | Google cut free tier 92% Dec 2025 | Enable billing (Tier 1) |
| "retry in 11s" misleading on daily quota error | 05 | 429 = daily cap, not per-minute | Detect `GenerateRequestsPerDayPerProjectPerModel` — wait for midnight or enable billing |
| Agent says "no memories from past sessions" | 06 | Model not emitting `[REMEMBER:]` signals | Made instruction CRITICAL + MUST in system prompt |
| Meta-questions return no memories | 06 | Semantic similarity too low for "what did we discuss?" | Added recent memory fallback — always loads last 3 |
| DBeaver can't display vector column rows | 06 | DBeaver doesn't render `vector` type | Query without embedding column |
| `ai.tunings.create` is not a function | 08 | Tuning not in Gemini Developer API JS SDK | Use REST API or Vertex AI |
| Fine-tuning REST API 400 error | 08 | Gemini Developer API dropped tuning mid-2025 | Concept documented; requires Vertex AI for execution |
```bash
git clone https://github.com/your-username/genai-roadmap
cd genai-roadmap/01-changelog-gen
npm install
cp .env.example .env  # add GEMINI_API_KEY
node index.js
```

Get a free Gemini API key at aistudio.google.com. Enable billing before Phase 05 — the free tier (20 RPD) is exhausted in minutes by pipeline work.
API keys needed across the full roadmap:
| Phase | Provider | Where |
|---|---|---|
| 01–09 | Gemini (billing enabled) | aistudio.google.com |
| 04, 09 | GitHub PAT (fine-grained) | GitHub → Settings → Developer settings |
| 11 | OpenAI | platform.openai.com — $5 min topup |
| 11 | Anthropic | console.anthropic.com — $5 free credits |
| 12 | None (Ollama local) | ollama.ai — free |
Built by a fullstack developer with 8 years of experience in Node.js, Express, Angular, and healthcare systems — transitioning into AI engineering by building, not watching tutorials.
The goal: go from "I've heard of RAG" to "I've built and debugged a production-shaped RAG system with evals, memory, and an MCP server" in under 6 weeks. This repo is the evidence it worked.