"Medical students spend years in hospitals treating real patients before they're allowed to practice independently. Why do we let CS graduates walk into a production codebase on Day 1 with nothing but algorithms and syntax?"
CodeResidency is a simulation platform where computer science students face the real-world challenges of software engineering — not as homework, but as live, high-stakes-feeling simulations with no real-world consequences.
The medical analogy: Before a student doctor can see patients alone, they complete supervised clinical rotations — rotating through ER, surgery, psychiatry, and pediatrics to build reflexes that no textbook can teach. CodeResidency gives CS students the same experience:
- Rotating through The Difficult Client to learn requirements engineering
- Rotating through The Code Crucible to implement under real-world constraints
- Rotating through The Bug Hunt to build debugging reflexes
- Rotating through The War Room to handle production emergencies
- Rotating through The Imposter to practice git forensics on a sabotaged team
Each simulation is powered by AI that plays the role of the client, the CTO, the teammates — and the grader.
Institution: MuLearn SCET
Each phase can be played independently. An optional Full Residency Run chains all 5 into a cohesive 2-hour simulation where context flows between phases.
Skill: Requirements elicitation, client communication, empathy
The Situation: You've been hired as a freelance developer. Your client is vague, non-technical, and occasionally difficult. They know what they want — but can't articulate it in technical terms. Your job: figure out exactly what they need before writing a single line of code.
How it works:
- An AI client is generated with a hidden "Truth File" — 4–6 critical constraints the client holds (budget, deadline, technical requirements, accessibility needs)
- A Director AI (hidden) evaluates the quality of every question you ask. Rude or jargon-heavy questions get minimal information; empathetic, clear questions unlock more.
- An Actor AI (visible) plays the client in character — stays in persona, reveals only what the Director permits
- When 80%+ of constraints are discovered, a coding challenge unlocks
- Session ends with an AI-generated report card: letter grade, communication breakdown, and specific feedback
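The unlock rule in the list above can be sketched as a simple threshold check. This is only an illustration: the function name `challenge_unlocked` and the constraint ids are hypothetical, and the real Director logic may differ.

```python
def challenge_unlocked(discovered: set[str],
                       all_constraints: set[str],
                       threshold_percent: int = 80) -> bool:
    """Return True once the discovered share of constraints reaches the threshold.

    Integer arithmetic is used so that exactly 80% (for example 4 of 5)
    is not affected by floating-point rounding.
    """
    if not all_constraints:
        return False
    # Ignore any ids that are not part of the truth file.
    found = len(discovered & all_constraints)
    return found * 100 >= threshold_percent * len(all_constraints)


# Hypothetical constraint ids for a five-constraint truth file.
constraints = {"budget", "deadline", "offline", "accessibility", "hosting"}
print(challenge_unlocked({"budget", "deadline", "offline"}, constraints))
print(challenge_unlocked({"budget", "deadline", "offline", "hosting"}, constraints))
```

With five constraints, three discoveries (60%) keep the challenge locked and four (80%) unlock it.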
Example scenarios:
- "Angry Bakery Owner" — Needs a website but refuses to pay for hosting
- "Overwhelmed Non-Profit Director" — Wants an app with a $0 budget
- "Perfectionist Startup Founder" — Keeps changing requirements mid-conversation
What gets scored:
| Component | Weight |
|---|---|
| Constraints discovered | 40% |
| Question quality (empathy, clarity, no jargon) | 20% |
| Code correctness | 25% |
| Code quality | 15% |
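As a rough sketch, the weights in this table could be combined as a weighted sum. The weights come from the table; the component keys, the 0 to 1 input range, and the function name `phase1_score` are assumptions made for illustration.

```python
# Weights taken from the scoring table above (they sum to 100).
PHASE1_WEIGHTS = {
    "constraints_discovered": 40,
    "question_quality": 20,
    "code_correctness": 25,
    "code_quality": 15,
}


def phase1_score(components: dict[str, float]) -> float:
    """Combine per-component scores in [0, 1] into a 0-100 total.

    Missing components count as 0; out-of-range values are clamped.
    """
    total = 0.0
    for name, weight in PHASE1_WEIGHTS.items():
        value = min(max(components.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return round(total, 2)


print(phase1_score({
    "constraints_discovered": 0.5,   # half of the constraints found
    "question_quality": 1.0,
    "code_correctness": 1.0,
    "code_quality": 0.0,
}))
```

In this example the total is 20 + 20 + 25 + 0 = 65.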
Skill: Requirement-driven implementation, output validation, code quality
The Situation: The requirements gathering is done. Time to build it. But this isn't a practice problem — the task comes directly from what the client told you. If you gathered the wrong requirements in Phase 1, you might build the wrong thing.
How it works:
- A ChallengeGeneratorAgent reads your discovered requirements and generates a specific coding task: function signature, expected inputs/outputs, and hidden test cases
- You implement the solution in Monaco editor (Python/JavaScript/Java)
- "Run" executes your code in an isolated Docker sandbox and shows the output
- "Submit" runs all hidden test cases and scores against expected output
- An OutputValidatorAgent reviews both correctness and code quality
Example: After discovering the bakery owner needs offline-capable receipt generation with no paid libraries → challenge: "Write a function that generates a plain-text receipt from a list of items and prices."
What gets scored:
| Component | Weight |
|---|---|
| Test case pass rate | 50% |
| Code quality (naming, structure, readability) | 30% |
| Efficiency (time taken, solution brevity) | 20% |
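The test-case pass rate that feeds this table might be computed along the following lines. This is a simplified, in-process sketch: the platform described here runs submissions in a Docker sandbox, and the names `pass_rate` and `phase2_score` are hypothetical.

```python
from typing import Callable, Sequence


def pass_rate(solution: Callable, cases: Sequence[tuple[tuple, object]]) -> float:
    """Fraction of (args, expected) cases for which solution(*args) == expected."""
    if not cases:
        return 0.0
    passed = 0
    for args, expected in cases:
        try:
            if solution(*args) == expected:
                passed += 1
        except Exception:
            # A crash on a hidden test case simply counts as a failed case.
            pass
    return passed / len(cases)


def phase2_score(rate: float, quality: float, efficiency: float) -> float:
    """Apply the 50/30/20 weights from the table to inputs in [0, 1]."""
    return round(50 * rate + 30 * quality + 20 * efficiency, 2)


def add(a, b):
    return a + b


hidden_cases = [((1, 2), 3), ((0, 0), 0), ((2, 2), 5), (("a", 1), "a1")]
rate = pass_rate(add, hidden_cases)   # two pass, one is wrong, one raises
print(rate, phase2_score(rate, quality=1.0, efficiency=0.5))
```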
Skill: Debugging, error reading, code correction under constraint
The Situation: Someone handed you a codebase. It's broken in exactly 4 ways — one syntax error, one logic error, one runtime error, one type error. The terminal is your only guide. Fix them all.
How it works:
- A BugInjectorAgent takes a correct solution and injects exactly 4 intentional bugs, one of each type — stored server-side in a secret bug manifest you never see
- The buggy code is pre-loaded into your Monaco editor
- Click "Run" → Docker runs it → you read the error messages
- Fix the errors, then "Submit" — a BugEvaluatorAgent checks which bugs were resolved
- Score is progressive: each bug fixed earns 25 pts regardless of order
- Up to 3 submissions. Need a nudge? Request a hint (reveals the line number, costs 10 pts)
The four bug types:
| Type | Example |
|---|---|
| SyntaxError | Missing : after def, unclosed bracket |
| LogicError | totalAmount == 0 instead of = 0 |
| RuntimeError | Division by zero, index out of range |
| TypeError | Concatenating str + int without conversion |
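To make the four categories concrete, the sketch below runs one tiny Python snippet per category and reports how it fails. The `classify` helper and the snippets are illustrative only. Note that "RuntimeError" is used here as the table's category name: Python's `ZeroDivisionError` and `IndexError` are not subclasses of the built-in `RuntimeError`. A logic error raises nothing at all, which is what makes it hard to spot from the terminal.

```python
def classify(snippet: str) -> str:
    """Compile and run a snippet, returning the bug category it exhibits."""
    try:
        code = compile(snippet, "<snippet>", "exec")
    except SyntaxError:
        return "SyntaxError"          # rejected before anything runs
    try:
        exec(code, {})
    except TypeError:
        return "TypeError"
    except (ZeroDivisionError, IndexError):
        return "RuntimeError"         # category name from the table
    return "no exception"             # logic errors fail silently


examples = {
    "syntax": "def total(items)\n    return sum(items)",    # missing ':'
    "logic": "total_amount = 5\ntotal_amount == 0",         # compares, never assigns
    "runtime": "items = []\nitems[0]",                      # index out of range
    "type": "'Total: ' + 42",                               # str + int
}

for name, snippet in examples.items():
    print(name, "->", classify(snippet))
```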
What gets scored:
| Component | Weight |
|---|---|
| Per bug fixed | 25 pts each (×4) |
| Hint penalty | -10 pts per hint used |
| Efficiency bonus | +10 pts if all 4 fixed within 3 submissions |
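One way to read this table as arithmetic is sketched below. The clamp to the 0 to 100 range is an assumption (the scoring summary lists a maximum of 100 for this phase), and `bug_hunt_score` is a hypothetical name.

```python
def bug_hunt_score(bugs_fixed: int, hints_used: int, submissions: int) -> int:
    """25 pts per fixed bug, minus 10 per hint, plus 10 if all 4 are fixed in time."""
    if not 0 <= bugs_fixed <= 4:
        raise ValueError("bugs_fixed must be between 0 and 4")
    score = 25 * bugs_fixed - 10 * hints_used
    if bugs_fixed == 4 and submissions <= 3:
        score += 10
    # Assumption: the final score is clamped to the 0-100 range.
    return max(0, min(100, score))


print(bug_hunt_score(bugs_fixed=4, hints_used=2, submissions=3))  # 100 - 20 + 10
print(bug_hunt_score(bugs_fixed=2, hints_used=1, submissions=3))  # 50 - 10
```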
Skill: Production incident response, root cause analysis, composure under pressure
The Situation: It's 2:47 AM. Your phone buzzes. Slack is on fire. All users are getting 404s. The CTO is messaging you. You have 15 minutes.
How it works:
- An emergency scenario is loaded: a pre-seeded broken environment with a specific fault
- A WebSocket countdown timer runs server-side — the server enforces the deadline regardless of client state
- An EmergencyDirectorAgent sends escalating pressure messages via a live CTO chat feed:
  - 80% time left: "Getting some reports of weirdness, can you take a look?"
  - 50% time left: "Users are hitting errors! Error rate is at 40%!"
  - 10% time left: "WE ARE LOSING $500/MINUTE. FIX IT OR I'M ROLLING BACK THE RELEASE."
- You can run terminal commands, edit config files, and submit a root cause explanation
- The AI scores your diagnosis and fix quality after submission
Emergency types:
| Incident | Symptom or fault |
|---|---|
| 404s for all customers | Misconfigured route prefix |
| Database connection failure | Wrong connection string after deploy |
| Wrong deployment | Backwards-incompatible API change |
| Memory leak | Server crashing every 10 minutes |
| Security breach alert | Unauthorized admin access in logs |
What gets scored:
| Component | Weight |
|---|---|
| Correct root cause identified | 30% |
| Fix quality (proper fix vs. brute-force restart) | 20% |
| Time efficiency | 35% |
| Systematic approach | 15% |
Skill: Code review, git forensics, team debugging, identifying bad actors
The Situation: Your team of 4 just finished a feature. It worked in staging. Now it's broken in production. Someone introduced a bug — and they're blaming each other. You need to find the commit that broke everything and call out the imposter.
How it works:
- Three AI teammates (Alice, Bob, Carol) each have a distinct coding style and commit history
- An ImposterAgent secretly authors one "sabotage commit" — a change that looks legitimate but introduces a subtle bug
- You get access to a visual, interactive git commit graph — click any commit to see its full diff (syntax-highlighted, added/removed lines)
- Filter commits by author, trace the bug back to its source
- Submit your report: which teammate is the imposter + which commit caused the bug + how you'd resolve it
The visual git experience:
● a3f8b2c Alice Chen "feat: add payment processing"
● b91c44a Bob Martinez "fix: auth middleware null check"
● d44e1f9 Carol Wu "fix: update user schema" ← click to see diff
● 8ab901f You "feat: implement dashboard"
Click any node to see the diff. The bad commit's diff looks plausible. That's the point.
Resolution methods (scored differently):
| Method | Points |
|---|---|
| `git revert <commit>` | 30 pts (clean, safe, industry standard) |
| Manual fix | 5 pts (works but history is muddy) |
| `git reset --soft` | 15 pts (functional but risky) |
| `git push --force` | -50 pts (destroys shared history, never do this) |
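A minimal sketch of how a submitted resolution could be mapped to these point values follows. The point values come from the table; the category keys and the `classify_resolution` helper are hypothetical, and a real grader would likely look at more than the command text.

```python
import shlex

# Point values from the resolution table above.
RESOLUTION_POINTS = {
    "revert": 30,
    "reset_soft": 15,
    "manual": 5,
    "force_push": -50,
}


def classify_resolution(command: str) -> str:
    """Map a submitted command line to one of the resolution categories."""
    tokens = shlex.split(command)
    if tokens[:2] == ["git", "push"] and ("--force" in tokens or "-f" in tokens):
        return "force_push"
    if tokens[:2] == ["git", "revert"]:
        return "revert"
    if tokens[:2] == ["git", "reset"] and "--soft" in tokens:
        return "reset_soft"
    return "manual"  # anything else is treated as a hand-written fix


for cmd in ("git revert d44e1f9",
            "git reset --soft HEAD~1",
            "git push --force origin main"):
    kind = classify_resolution(cmd)
    print(cmd, "->", kind, RESOLUTION_POINTS[kind])
```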
What gets scored:
| Component | Weight |
|---|---|
| Correct imposter identified | 35% |
| Correct commit identified | 20% |
| Resolution method | 30% |
| Time efficiency | 15% |

Backend:
| Component | Technology |
|---|---|
| Framework | FastAPI 0.111+ (Python 3.11) |
| AI / LLM | Google Gemini 2.5 Flash Lite via LangChain |
| Agent Orchestration | LangChain + LangGraph |
| Database + Auth | Appwrite 1.8+ |
| Code Execution | Docker SDK — ephemeral sandboxed containers |
| Caching | Redis |
| Real-time | FastAPI native WebSockets |
| Config | Pydantic-settings |

Frontend:
| Component | Technology |
|---|---|
| Framework | Next.js 14 (App Router) |
| Language | TypeScript |
| Auth | Appwrite Client SDK (Google + GitHub OAuth) |
| Code Editor | Monaco Editor (@monaco-editor/react) |
| State | Zustand |
| HTTP Client | Axios with JWT interceptor |
| Design System | Google Material-inspired (CSS variables) |
| Fonts | Outfit, Roboto, JetBrains Mono |

Infrastructure:
| Component | Technology |
|---|---|
| Orchestration | Docker Compose |
| Ports | Frontend: 3000, Backend: 8000 |
| Auth Provider | Appwrite |
Every AI-driven simulation phase uses the same 3-agent architecture:
Student Input
│
▼
DIRECTOR (Hidden)
├── Sees the truth file (hidden ground truth)
├── Scores the student's action quality
├── Decides how much information to reveal
└── Writes an instruction for the Actor
│
│ actor_instruction
▼
ACTOR (Visible)
├── Plays the role (client / CTO / teammate)
├── Responds in character
└── Reveals only what the Director permits
│
│ visible response
▼
Student sees response + score feedback
[Session ends]
▼
EVALUATOR (Runs once)
├── Reads full history + truth file
└── Generates: letter grade, strengths, areas for improvement, detailed breakdown
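The control flow in the diagram can be sketched in plain Python with rule-based stand-ins for the three agents. Everything here is illustrative: the real agents are LLM-backed, the Director's per-turn quality scoring is omitted, and the names `Session`, `director`, `actor`, `take_turn`, and `evaluator` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    truth_file: dict[str, str]                       # hidden: topic -> fact
    history: list[tuple[str, str]] = field(default_factory=list)
    revealed: set[str] = field(default_factory=set)


def director(session: Session, student_input: str) -> str:
    """Hidden step: consult the truth file and write an instruction for the Actor."""
    text = student_input.lower()
    for topic, fact in session.truth_file.items():
        if topic in text:
            session.revealed.add(topic)
            return f"Reveal: {fact}"
    return "Stay vague; reveal nothing new."


def actor(instruction: str) -> str:
    """Visible step: answer in character, saying only what the instruction permits."""
    prefix = "Reveal: "
    if instruction.startswith(prefix):
        return "Fine, I'll tell you. " + instruction[len(prefix):]
    return "I just need it to work. You're the expert."


def take_turn(session: Session, student_input: str) -> str:
    reply = actor(director(session, student_input))
    session.history.append((student_input, reply))
    return reply


def evaluator(session: Session) -> dict:
    """Runs once at session end over the full history and the truth file."""
    return {
        "turns": len(session.history),
        "discovered": sorted(session.revealed),
        "missed": sorted(set(session.truth_file) - session.revealed),
    }


session = Session({"budget": "I can spend two hundred dollars.",
                   "offline": "It has to work without WiFi."})
print(take_turn(session, "What is your budget?"))
print(take_turn(session, "Which colours do you like?"))
print(evaluator(session))
```

The point of the split is that the Actor only ever sees the Director's instruction, never the truth file itself.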
Every simulation has a hidden ground truth stored server-side — never sent to the client. This is the source of all scoring: complete client requirements, bug locations, imposter identity. Students must earn information through the simulation itself.
All student code runs in ephemeral Docker containers with: no network access, read-only filesystem, non-root user, 256MB memory limit, 50 PID limit, 30-second execution timeout, no Linux capabilities.
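These limits map fairly directly onto keyword arguments accepted by the Docker SDK for Python's `containers.run`. The sketch below only builds the argument dictionary so it can be read and checked without a Docker daemon. The helper name, the `python:3.11-slim` image, and the `nobody` user are assumptions, not the project's actual configuration.

```python
EXECUTION_TIMEOUT_SECONDS = 30


def sandbox_run_kwargs(image: str, command: list[str]) -> dict:
    """Keyword arguments for docker-py's containers.run matching the limits above."""
    return {
        "image": image,
        "command": command,
        "network_disabled": True,   # no network access
        "read_only": True,          # read-only root filesystem
        "user": "nobody",           # non-root user (assumed account name)
        "mem_limit": "256m",        # 256MB memory limit
        "pids_limit": 50,           # at most 50 processes
        "cap_drop": ["ALL"],        # no Linux capabilities
        "detach": True,             # return a container handle immediately
    }


kwargs = sandbox_run_kwargs("python:3.11-slim", ["python", "-c", "print('hello')"])
print(kwargs)

# With a Docker daemon available, usage might look like this (not executed here):
#   import docker
#   container = docker.from_env().containers.run(**kwargs)
#   container.wait(timeout=EXECUTION_TIMEOUT_SECONDS)  # raises if the run overruns
#   output = container.logs()
#   container.remove(force=True)                       # keep containers ephemeral
```

The 30-second limit is not a `run` argument, so this sketch assumes it is enforced by waiting on the detached container with a timeout.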
- Docker Desktop
- Node.js 20+
- Python 3.11+
- An Appwrite instance (self-hosted or cloud)
- A Google AI Studio API key (for Gemini)
git clone https://github.com/your-org/coderesidency.git
cd coderesidency
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env.local

Fill in your credentials:
# backend/.env
GEMINI_API_KEY=your_google_ai_studio_key
APPWRITE_ENDPOINT=https://your-appwrite-instance/v1
APPWRITE_PROJECT_ID=your_project_id
APPWRITE_API_KEY=your_server_api_key
REDIS_URL=redis://redis:6379

# frontend/.env.local
NEXT_PUBLIC_APPWRITE_ENDPOINT=https://your-appwrite-instance/v1
NEXT_PUBLIC_APPWRITE_PROJECT_ID=your_project_id
NEXT_PUBLIC_API_URL=http://localhost:8000

cd backend
pip install -r requirements.txt
python scripts/setup_db.py

make dev
# or: docker-compose up --build

- Frontend: http://localhost:3000
- API Docs: http://localhost:8000/docs
Create a scenario in Appwrite's scenarios collection with a truth_file JSON:
{
  "persona": {
    "name": "Marcus",
    "role": "Restaurant Owner",
    "personality": "Impatient, non-technical, budget-obsessed",
    "speaking_style": "Short sentences. Gets frustrated with jargon.",
    "greeting": "Finally! I've been waiting. I need an app. Make it quick."
  },
  "constraints": [
    {
      "id": "budget",
      "description": "Maximum budget is $200 total including hosting",
      "reveal_triggers": ["budget", "cost", "price", "afford", "money"],
      "partial_hint": "I'm not made of money.",
      "full_reveal": "I can spend two hundred dollars. That's it. Final."
    },
    {
      "id": "must_work_offline",
      "description": "App must work without internet (unreliable WiFi)",
      "reveal_triggers": ["internet", "wifi", "online", "offline", "connection"],
      "partial_hint": "Our WiFi is... not the best.",
      "full_reveal": "The WiFi cuts out every Friday night. Whatever you build has to work without it."
    }
  ]
}

See ARCHITECTURE.md for the complete truth file schema.
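To illustrate how the `reveal_triggers` field might be used, the sketch below matches a student message against each constraint's trigger words. Whole-word, case-insensitive matching and the function name `matching_constraints` are assumptions; the platform's Director is LLM-driven and may treat triggers only as guidance.

```python
import re

# Trimmed copy of the truth file above, keeping only the fields used here.
truth_file = {
    "constraints": [
        {"id": "budget",
         "reveal_triggers": ["budget", "cost", "price", "afford", "money"]},
        {"id": "must_work_offline",
         "reveal_triggers": ["internet", "wifi", "online", "offline", "connection"]},
    ]
}


def matching_constraints(truth: dict, message: str) -> list[str]:
    """Return ids of constraints whose trigger words appear in the message."""
    words = set(re.findall(r"[a-z0-9']+", message.lower()))
    return [
        constraint["id"]
        for constraint in truth["constraints"]
        if words & {trigger.lower() for trigger in constraint["reveal_triggers"]}
    ]


print(matching_constraints(truth_file, "Could you tell me about your budget?"))
print(matching_constraints(truth_file, "Is the WiFi reliable, and what can you afford?"))
print(matching_constraints(truth_file, "Which colours do you like?"))
```

Whether a match leads to the `partial_hint` or the `full_reveal` would then depend on the Director's judgement of question quality, which this sketch leaves out.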
Full interactive docs: http://localhost:8000/docs
POST /api/scenarios/{scenario_id}/start
Authorization: Bearer <jwt>

POST /api/chat/send
Authorization: Bearer <jwt>
{ "session_id": "abc123", "message": "Could you tell me about your budget?" }

POST /api/execute
Authorization: Bearer <jwt>
{ "session_id": "abc123", "language": "python", "code": "print('hello')" }

POST /api/evaluation/{session_id}/generate
Authorization: Bearer <jwt>

| Phase | Module | Max Score | Formula Components |
|---|---|---|---|
| 1 | The Difficult Client | 100 | Discovery 40% + Communication 20% + Code 40% |
| 2 | The Code Crucible | 100 | Test Cases 50% + Quality 30% + Efficiency 20% |
| 3 | The Bug Hunt | 100 | 25 pts/bug − 10 pts/hint + efficiency bonus |
| 4 | The War Room | 100 | Base 100 − time penalty + diagnosis bonus |
| 5 | The Imposter | 100 | Identification 55% + Resolution 30% + Time 15% |
Grade scale: A+ (95+) A (90+) B+ (85+) B (80+) C+ (75+) C (70+) D (60+) F (<60)
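The grade scale can be expressed as a descending threshold lookup. The thresholds are the ones listed above; the function name `letter_grade` is hypothetical.

```python
# (minimum score, grade), ordered from highest to lowest threshold.
GRADE_SCALE = [
    (95, "A+"), (90, "A"), (85, "B+"), (80, "B"),
    (75, "C+"), (70, "C"), (60, "D"),
]


def letter_grade(score: float) -> str:
    """Map a 0-100 score to a letter grade using the scale above."""
    for minimum, grade in GRADE_SCALE:
        if score >= minimum:
            return grade
    return "F"  # anything below 60


for score in (97, 90, 72, 59.9):
    print(score, "->", letter_grade(score))
```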
codeResidency/
├── README.md ← This file
├── ARCHITECTURE.md ← Full technical design document (all 5 phases)
├── docker-compose.yml
├── Makefile
│
├── backend/app/
│ ├── agents/
│ │ ├── dual_agent.py ← Phase 1 (complete)
│ │ ├── coding_challenge_agent.py ← Phase 2 (planned)
│ │ ├── bug_injector_agent.py ← Phase 3 (planned)
│ │ ├── emergency_agent.py ← Phase 4 (planned)
│ │ └── imposter_agent.py ← Phase 5 (planned)
│ ├── routers/ ← API endpoints per phase
│ ├── scoring/engine.py ← All scoring formulas
│ └── docker_manager/sandbox.py ← Secure code execution
│
└── frontend/app/
├── modules/
│ ├── difficult-client/ ← Phase 1 (complete)
│ ├── code-crucible/ ← Phase 2 (planned)
│ ├── bug-hunt/ ← Phase 3 (planned)
│ ├── war-room/ ← Phase 4 (stub)
│ └── imposter/ ← Phase 5 (stub)
├── dashboard/
├── leaderboard/
└── evaluation/
| Phase | Module | Status |
|---|---|---|
| 1 | The Difficult Client | ✅ Complete |
| 2 | The Code Crucible | 🏗 Next |
| 3 | The Bug Hunt | 🏗 Planned |
| 4 | The War Room | 🏗 Planned (stub exists) |
| 5 | The Imposter | 🏗 Planned (stub exists) |
| — | Full Residency Run | 🔮 Future |
| — | Instructor Dashboard | 🔮 Future |
| — | Custom Scenario Builder | 🔮 Future |
# Branch naming
feature/phase2-code-crucible
fix/war-room-timer-websocket
docs/update-architecture
# Start dev environment
make dev
# Backend-only (faster iteration)
cd backend && uvicorn app.main:app --reload --port 8000

See ARCHITECTURE.md for the full implementation specification for each phase.
MuLearn SCET — CodeResidency is developed as a capstone project at Sarvajanik College of Engineering and Technology, in collaboration with the MuLearn community.
CodeResidency — because the gap between university and Day 1 should be training, not trauma.