Simulate, Evaluate, and Evolve AI Agents with Autonomous Feedback Loops.
- Overview
- Key Features
- System Architecture
- Technology Stack
- Quick Start
- Configuration
- API Reference
- Roadmap
- Troubleshooting
- Contributing
- License
- Contact
Odeon is a cutting-edge playground for AI Agent Engineering. It solves the "black box" problem of prompt tuning by automating the evaluation loop. Instead of manually tweaking prompts and hoping for better results, Odeon:
- Simulates realistic user interactions (e.g., a stubborn debt defaulter).
- Evaluates the agent's performance against strict numerical KPIs (Empathy, Negotiation, Repetition).
- Optimizes the system prompt automatically using a meta-agent if targets are missed.
The result is a self-improving agent that converges on the optimal persona for your specific business goals.
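The loop described above can be sketched as follows. This is a minimal illustration, not Odeon's actual internals: the `simulate`, `evaluate`, and `optimize_prompt` callables are hypothetical stand-ins for the LLM-backed components.

```python
# Hypothetical sketch of the simulate -> evaluate -> optimize loop.
# The real engine calls LLMs; here, injected callables stand in.

def run_loop(base_prompt, thresholds, max_cycles=5,
             simulate=None, evaluate=None, optimize_prompt=None):
    """Iterate until every KPI meets its threshold or cycles run out."""
    prompt = base_prompt
    for cycle in range(max_cycles):
        transcript = simulate(prompt)             # agent vs. simulated user
        scores = evaluate(transcript)             # e.g. {"empathy": 7.5, ...}
        if all(scores[k] >= v for k, v in thresholds.items()):
            return prompt, scores, cycle          # all KPIs passed
        prompt = optimize_prompt(prompt, scores)  # meta-agent rewrites prompt
    return prompt, scores, max_cycles
```

The key design point is convergence: the prompt only stops changing once every metric clears its threshold, so the agent's persona is shaped directly by the pass/fail criteria you define.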
**⚡ Autonomous Optimization Loop**
- Generates diverse user personas (e.g., "The Lawyer", "The Crying Student").
- Runs high-fidelity simulations using Groq for near-instant inference.
- Rewrites prompts automatically based on granular feedback.
**📡 Real-Time Simulation Stream**
- Bi-directional WebSocket integration.
- Watch agent interactions unfold character-by-character.
- Live state tracking of current optimization cycles.
**🔍 Neural Visual Diffing**
- Git-style Red/Green diff viewer for Prompt Evolution.
- See exactly which words changed to improve empathy or compliance.
**🎨 Neo-Brutalist / Glassmorphism UI**
- A high-end, distraction-free interface built with Tailwind CSS 4.
- Dark mode focused "Deep Space" aesthetic.
**📊 Strict Metric Thresholds**
- Define pass/fail criteria (1-10) for Repetition, Negotiation, and Empathy.
- Agents must meet all criteria to "pass" a scenario.
**🗄️ SQLite History & Replay**
- Every run is archived. You can replay, analyze, and fork past simulations.
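The strict-threshold rule amounts to an all-pass check over the 1-10 scores. A sketch (the metric names mirror the README; the function itself is illustrative, not Odeon's API):

```python
def passes(scores: dict[str, float], thresholds: dict[str, float]) -> bool:
    """An agent passes only if every metric meets or beats its threshold.

    A metric missing from `scores` counts as 0.0, i.e. an automatic fail.
    """
    return all(scores.get(metric, 0.0) >= target
               for metric, target in thresholds.items())
```

Note that a single weak metric fails the whole scenario: high empathy cannot compensate for poor negotiation.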
Odeon uses a decoupled, event-driven architecture to handle high-concurrency simulations.
```mermaid
graph TD
    User[User / Browser] -->|HTTP / WS| FE["Frontend (React + Vite)"]
    FE -->|WebSocket| API["Backend API (FastAPI)"]
    subgraph "Backend Enclave"
        API -->|Dispatch| Sim[Simulator Engine]
        Sim -->|Generate| Gen[Persona Generator]
        Sim -->|Chat| Agent[Agent LLM]
        Sim -->|Chat| UserSim[User Simulator LLM]
        Sim -->|Data| Eval[Evaluator]
        Eval -->|Feedback| Opt[Prompt Optimizer]
        Opt -->|New Prompt| Agent
        Agent <--> Groq[Groq Llama 3 API]
        UserSim <--> Groq
        Eval <--> Groq
        Opt <--> Groq
    end
    API -->|Read/Write| DB[(SQLite History DB)]
```
| Component | Tech | Description |
|---|---|---|
| Backend | Python 3.10+ | Core Application Logic |
| API Framework | FastAPI | Async, High-performance REST & WS |
| AI Inference | Groq Cloud | Llama 3.1-8b / 70b (Ultra-fast) |
| Orchestration | LangChain | Chain Management & Parsing |
| Database | SQLite | Lightweight embedded persistence |
| Frontend | React 19 | UI Library with Concurrent Mode |
| Build Tool | Vite | Instant HMR & bundling |
| Styling | Tailwind CSS 4 | Utility-first CSS engine |
| Type Safety | TypeScript | End-to-end typing |
- Python 3.10+
- Node.js 18+ & npm
- Groq API Key (get one free at console.groq.com)
```bash
git clone https://github.com/vasu-devs/odeon.git
cd odeon
```

Backend setup:

```bash
cd backend
python -m venv venv

# Activate the venv
source venv/bin/activate   # macOS/Linux
# venv\Scripts\activate    # Windows

pip install -r requirements.txt
```

Frontend setup:

```bash
cd ../frontend
npm install
```

Create a `.env` file in the `backend/` directory:
```env
# Required: The engine power
GROQ_API_KEY=gsk_your_key_here

# Optional: For experimental multi-model support
GEMINI_API_KEY=your_gemini_key
```

Terminal 1 (Backend):

```bash
cd backend
# Make sure the venv is active
python server.py
```

Terminal 2 (Frontend):

```bash
cd frontend
npm run dev
```

Visit http://localhost:5173 to launch Odeon.
Request (Start Simulation):

```json
{
  "api_key": "gsk_...",
  "model_name": "llama3-8b-8192",
  "base_prompt": "You are a specialized agent...",
  "thresholds": { "negotiation": 8.0, "empathy": 7.5 }
}
```

Response (Events):

- `log`: Raw system output.
- `result`: Final conversation metrics.
- `optimization`: Diff of the prompt change.
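Streamed events can be routed to handlers with a small dispatcher. Only the event names (`log`, `result`, `optimization`) come from this README; the JSON envelope with `type` and `data` fields is an assumption for illustration:

```python
import json

def route_event(raw: str, handlers: dict) -> str:
    """Dispatch one streamed WebSocket message to a handler by event type.

    `raw` is assumed to be a JSON envelope like {"type": "log", "data": ...};
    unknown event types are ignored rather than raising.
    """
    event = json.loads(raw)
    handler = handlers.get(event.get("type"))
    return handler(event.get("data")) if handler else "ignored"
```

Ignoring unknown types keeps a client forward-compatible if the backend later adds new event kinds.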
- Multi-Agent Swarms: Simulating group dynamics.
- Vector Memory: Giving the agent long-term memory across runs.
- Cloud Deploy: One-click deploy to Vercel/Railway.
- Custom Models: Support for Anthropic/OpenAI via LiteLLM.
- Export Results: PDF/CSV export for compliance reporting.
Q: I get a 401 Unauthorized error from Groq.
A: Check your `.env` file. Ensure `GROQ_API_KEY` is set correctly and has no trailing spaces.
Q: The frontend shows "Disconnected".
A: Ensure the backend is running on port 8000. Check the terminal for any Python traceback errors.
Q: The optimization loop isn't updating the prompt.
A: Check that your thresholds aren't set too low. If the agent already passes them, there is nothing to optimize; raise the target scores.
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
- Vasudev Siddh - Initial Work - vasu-devs
Built with ❤️ by vasu-devs