Enterprise-grade multi-agent orchestration engine — DAG-supervised parallel agents with streaming LLM output, intelligent model routing, resilience patterns, cost tracking, RBAC, audit logging, VS Code integration with Commander mode and Codernic intelligence, and a zero-API-key demo mode.
Status: ✅ Production-Ready | Tests: 925 total (868+ passing) | Enterprise Features: 14 completed (E1–E14) | VS Code Extension: v0.6.57
| Audience | Why they use it |
|---|---|
| Individual developers | Run the full AI-assisted development loop — from idea to wired sprint plan — with Commander mode in VS Code, Codernic's codebase intelligence, and zero API costs during exploration |
| Feature squads (2–8 people) | Coordinate parallel workstreams with hard sync points, automated handoffs between agents, supervisor-gated quality checks, and visual DAG execution in VS Code |
| Platform / enterprise teams | Roll out AI-assisted workflows to multiple squads with RBAC, multi-tenant isolation, audit trails, cost controls, VS Code integration, and CI — all enforced at the engine level |
| AI tooling builders | Use the DAG engine, MCP bridge, plugin system, and TypeScript Builder API as infrastructure for custom AI products |
"I want AI to help me build real software — not just generate snippets."
Most AI coding tools stop at the file level. AI Agencee operates at the project level:
- A structured 5-phase discovery process turns a vague requirement into a precise, wired sprint plan — with every agent knowing its scope, dependencies, and acceptance criteria before a line of code is written
- A DAG execution engine runs specialised agents in parallel, detects conflicts via alignment barriers, retries on failure, hands off between agents, and escalates to a human when it can't recover automatically
- Supervisor checkpoints enforce quality at every step — not just at the end — so regressions surface during planning, not in production
- Zero API-key demo mode means the entire system can be evaluated, tested in CI, and learned without spending anything
It ships two execution paths that compose seamlessly:
| Path | Entry point | When to use |
|---|---|---|
| Plan System | `ai-kit plan` | Discovery → synthesis → decomposition → wiring → DAG hand-off for a new project or feature |
| DAG Engine | `ai-kit agent:dag <dag.json>` | Run any defined agent graph directly: code review, security audit, migration, documentation, CI gate |
| Need | Generic AI chat | Code-gen copilots | AI Agencee |
|---|---|---|---|
| Structured multi-step plan from a vague idea | ❌ Hallucinated | ❌ | ✅ 5-phase BA-led discovery → wired sprint plan |
| Parallel agent coordination with sync points | ❌ | ❌ | ✅ DAG barriers, soft-align, read-contract |
| Automatic retry + escalation on failure | ❌ | ❌ | ✅ retryBudget, HANDOFF, ESCALATE verdicts |
| Human-in-the-loop approval gates | ❌ | ❌ | ✅ needs-human-review checkpoint |
| Enterprise: RBAC, audit, multi-tenant, PII, OIDC | ❌ | ❌ | ✅ E1–E13 enforced at runtime |
| Zero-cost evaluation + CI integration | ❌ | ❌ | ✅ Mock provider, $0.00, no keys |
| Extensible: custom agents, checks, providers | ❌ | ❌ | ✅ Plugin system + TypeScript Builder API |
Copy-paste recipes for the most common tasks. No reading required.
| I want to… | Command |
|---|---|
| Run a Pirsig quality audit | pnpm code:index first, then ai-kit code audit — or auto-fires after every DAG run via MCP |
| Inspect workspace coordination | galileus_workspace_state via MCP (@ai-kit galileus_workspace_state) |
| Install the engine in my project | npm install @ai-agencee/engine |
| Install the CLI globally | npm install -g @ai-agencee/cli |
| See the engine run with no setup | clone the repo → pnpm demo (↓ Explore Without Code) |
| See failures, retries, escalations | clone the repo → pnpm demo:06 |
| Run a DAG from my own project | ai-kit agent:dag ./my-dag.json --provider mock |
| Plan a new app from scratch | ai-kit plan |
| Add a feature to an existing codebase | ai-kit plan → type feature when asked for story type |
| Security audit my project | ai-kit agent:dag ./security-review.dag.json --provider mock |
| Create a custom agent in 5 min | ↓ guide below · Q4 full recipe → |
| Set up a CI quality gate | Q18 in Quickies → |
| Enterprise adoption checklist | Q13 in Quickies → |
| Data migration plan + cutover gate | Q19 in Quickies → |
📖 Full recipe list (19 quickies): docs/quickies.md
AI Agencee is a TypeScript monorepo that turns JSON-defined agent graphs into production-ready AI workflows with enterprise-grade security, compliance, and observability.
Full Documentation: Start with 📚 Features Index for all capabilities.
Typical workflows for engineering teams that need deterministic, auditable, multi-agent automation:
Multi-lane parallel review — security, readability, architecture, performance — all running simultaneously. Supervisor checkpoints enforce quality deterministically. Cost tracked per lane for compliance and budgeting.
The 5-phase Plan System takes a vague idea through BA-led discovery → synthesis → decomposition → dependency wiring → DAG execution. Every task has an owner, acceptance criteria, and effort estimate before a line is written.
Security-review agents with enforced Opus-tier model routing, PII scrubbing, immutable audit logging, RBAC, multi-tenant isolation, and GDPR CLI (data:export / data:delete). Ready to drop into a compliance workflow.
Run DAGs on pull requests or releases. needs-human-review checkpoints block the pipeline until an operator approves. Slack, Teams, and Jira integrations fire automatically on escalation or budget exceeded.
Parallel lanes scanning thousands of files. grep, json-field, count-files checks for deterministic validation; llm-review lanes for contextual synthesis. Results written to structured JSON for downstream consumption.
Multi-agent workflows with configurable retry budgets, circuit breakers, OIDC JWT auth, per-principal rate limiting, and webhook triggers for GitHub, CI, or internal systems.
- DAG Orchestration — Declarative JSON-based DAG with parallel lanes, barriers, and supervisor checkpoints
- Streaming Output — Real-time token-by-token feedback from LLM providers
- Resilience Patterns — Exponential backoff retry, circuit breakers, graceful fallbacks
- Model Routing & Cost — Intelligent provider selection, budget enforcement, cost tracking
- Tool-Use Integration — Agents calling functions within LLM turns with supervisor approval
- Authentication & RBAC — Role-based access control with OIDC JWT support
- Audit Logging — Immutable hash-chained audit trails for compliance
- Multi-Tenant Isolation — Per-tenant data isolation and run sandboxing
- PII Scrubbing — Automatic detection and redaction of sensitive data
- Rate Limiting — Token budget and concurrent run limits per principal
- Codernic (E14) — Codebase-aware coding agent (449 files/1.03s) with symbol extraction and dependency graphs, aimed at writing code that compiles on the first try
- Pirsig Quality Engine (E14) — Self-calibrating quality audit: StyleProfile extraction, ConsistencyAuditor, KPI scoring (0–100), drift detection — auto-triggers via Galileus after every DAG run
- Galileus Coordination — Multi-session intent queue: concurrent agent sessions declare file claims before acting; Galileus detects conflicts, queues waiters, and cascades unblocks automatically
- Event Bus — Typed real-time event subscriptions for lane status, tokens, costs
- DAG Visualizer — Mermaid and DOT output for architecture visualization
- Cost Analytics — Per-run and per-principal cost breakdowns
- VS Code Extension — Commander mode for workflows, Codernic (ASK/PLAN/AGENT), visual editors, code intelligence (@ai-kit and @codernic chat participants)
- Codernic Intelligence — Codebase-aware assistant with FTS5 indexing (449 files/1.03s), hybrid context strategy, three-mode operation
- TypeScript Builder API — Fluent, type-safe DSL for DAG construction
- CLI Commands — Full command reference with examples
- MCP Integration — VS Code and Claude Desktop support
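The hash-chained audit trail works like a miniature blockchain: each entry's hash covers the previous entry's hash, so any retroactive edit invalidates every entry after it. The engine's actual record schema is richer; this is a minimal sketch of the mechanism, with illustrative field names:

```typescript
import { createHash } from 'node:crypto';

interface AuditEntry {
  timestamp: string;
  action: string;
  prevHash: string; // hash of the previous entry — links the chain
  hash: string;     // sha256 over this entry's content + prevHash
}

// Append an entry whose hash covers the previous entry's hash.
function appendEntry(chain: AuditEntry[], action: string, timestamp: string): AuditEntry[] {
  const prevHash = chain.length ? chain[chain.length - 1].hash : 'GENESIS';
  const hash = createHash('sha256')
    .update(`${timestamp}|${action}|${prevHash}`)
    .digest('hex');
  return [...chain, { timestamp, action, prevHash, hash }];
}

// Recompute every hash; any tampered entry breaks the rest of the chain.
function verifyChain(chain: AuditEntry[]): boolean {
  let prevHash = 'GENESIS';
  for (const entry of chain) {
    const expected = createHash('sha256')
      .update(`${entry.timestamp}|${entry.action}|${prevHash}`)
      .digest('hex');
    if (entry.prevHash !== prevHash || entry.hash !== expected) return false;
    prevHash = entry.hash;
  }
  return true;
}
```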
| Package | Description | Docs |
|---|---|---|
| `packages/agent-executor` | Core engine: DAG orchestrator, supervised agents, model router, resilience, RBAC, audit logging | Agent Executor Docs |
| `packages/cli` | `ai-kit` CLI — init, sync, check, agent:dag, plan, visualize, data | CLI Reference |
| `_private/ai-agencee-ext` | VS Code Extension — Commander mode, Codernic, visual editors, code intelligence | VS Code Extension |
| `packages/core` | Shared filesystem utilities, template scaffolding, event types | Features Index |
| `packages/mcp` | VS Code MCP bridge, OIDC auth middleware, SSE server, GitHub Copilot routing | MCP Integration |
| `packages/galileus` | Multi-session coordination: SQLite intent queue, conflict detection, cascade resolver | Galileus README |
Add the engine to any Node.js / TypeScript project:
```bash
npm install @ai-agencee/engine
# or
yarn add @ai-agencee/engine
# or
pnpm add @ai-agencee/engine
```

Install the CLI globally (or as a dev dependency):

```bash
npm install -g @ai-agencee/cli
# or as a dev dep:
npm install -D @ai-agencee/cli
```

Run DAGs programmatically from TypeScript:

```typescript
import { DagOrchestrator } from '@ai-agencee/engine';

const orchestrator = new DagOrchestrator(process.cwd(), {
  forceProvider: 'mock', // swap for 'anthropic' | 'openai' once you have keys
  verbose: true,
});

const result = await orchestrator.run('./my-dag.json');
console.log(result.status); // 'complete' | 'partial' | 'failed'
```

Or drive everything from the CLI:

```bash
# Mock provider — no API key required
ai-kit agent:dag ./my-dag.json --provider mock

# With Anthropic
ANTHROPIC_API_KEY=sk-... ai-kit agent:dag ./my-dag.json

# With OpenAI
OPENAI_API_KEY=sk-... ai-kit agent:dag ./my-dag.json --provider openai

# 5-phase interactive planning session
ai-kit plan

# Visualise a DAG as a Mermaid diagram
ai-kit dag:visualize ./my-dag.json
```

📖 See: DAG Orchestration · CLI Reference · Quickies — copy-paste recipes
Want to see what the engine does before writing anything? Clone the repo and run the zero-key demos:
```bash
git clone https://github.com/binaryjack/ai-agencee.git
cd ai-agencee
pnpm install && pnpm build

# Original 3-lane demo — NO API keys required
pnpm demo

# Interactive menu — pick from 6 advanced scenarios
pnpm demo:menu
```

| Demo command | What it shows |
|---|---|
| `pnpm demo:01` | App Boilerplate — RETRY × 2, hard-barrier |
| `pnpm demo:02` | Enterprise Skeleton — HANDOFF, needs-human-review |
| `pnpm demo:03` | Website Build — ESCALATE terminal 🚨 |
| `pnpm demo:04` | Feature in Context — soft-align, read-contract |
| `pnpm demo:05` | MVP Sprint — flaky lane, mixed results |
| `pnpm demo:06` | Resilience Showcase — every error type at once |
| `pnpm demo:plan` | 5-Phase Plan Demo — seed Phase 0 → run from SYNTHESIZE |
All demos use the built-in MockProvider — zero API keys, zero cost.
📖 See: Advanced Demo Scenarios
Intelligent routing automatically selects the optimal model tier based on task complexity and budget constraints.
Configuration: agents/model-router.json
| Task type | Family | Anthropic model | OpenAI model | Cost /1M tokens |
|---|---|---|---|---|
| `file-analysis` | haiku | claude-haiku-4-5 | gpt-4o-mini | $0.80 |
| `code-generation` | sonnet | claude-sonnet-4-5 | gpt-4o | $3.00 |
| `code-review` | sonnet | claude-sonnet-4-5 | gpt-4o | $3.00 |
| `architecture-decision` | opus | claude-opus-4-5 | gpt-4o | $15.00 |
| `security-review` | opus | claude-opus-4-5 | gpt-4o | $15.00 |
Key Features:
- ✅ Per-run budget enforcement
- ✅ Fallback to cheaper models when budget-constrained
- ✅ Real-time cost tracking per check and lane
- ✅ Cost attribution per principal (user/service)
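The budget-constrained fallback can be sketched in a few lines. The tier costs mirror the routing table above; the downgrade logic itself is an illustrative assumption, not the engine's actual implementation:

```typescript
// Cost per 1M tokens, mirroring the routing table above.
const TIERS = [
  { family: 'haiku', costPer1M: 0.8 },
  { family: 'sonnet', costPer1M: 3.0 },
  { family: 'opus', costPer1M: 15.0 },
] as const;

type Family = (typeof TIERS)[number]['family'];

const TASK_FAMILY: Record<string, Family> = {
  'file-analysis': 'haiku',
  'code-generation': 'sonnet',
  'code-review': 'sonnet',
  'architecture-decision': 'opus',
  'security-review': 'opus',
};

// Pick the task's preferred tier, then downgrade to the most capable
// tier whose estimated cost still fits the remaining budget.
function routeModel(taskType: string, estTokens: number, remainingBudgetUsd: number): Family {
  const preferred = TASK_FAMILY[taskType] ?? 'sonnet';
  const preferredIdx = TIERS.findIndex((t) => t.family === preferred);
  for (let i = preferredIdx; i >= 0; i--) {
    const estCost = (estTokens / 1_000_000) * TIERS[i].costPer1M;
    if (estCost <= remainingBudgetUsd) return TIERS[i].family;
  }
  return 'haiku'; // cheapest tier as last resort
}
```

For example, a `security-review` with a tight remaining budget would drop from opus to sonnet rather than fail outright.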
📖 See: Model Routing & Cost Tracking
Agents compose any mix of these typed checks:
| Type | Description | Use Case |
|---|---|---|
| `file-exists` | Assert a file path is present | Pre-flight validation |
| `dir-exists` | Assert a directory exists | Pre-flight validation |
| `count-files` / `count-dirs` | Count files matching a glob | Coverage analysis |
| `grep` | Regex search inside text files | Pattern matching |
| `json-field` / `json-has-key` | JSON schema / value assertions | Data validation |
| `run-command` | Execute a shell command, inspect stdout/exit code | System integration |
| `llm-generate` | LLM generation with streaming output | Content creation |
| `llm-review` | LLM review / critique with streaming output | Analysis & feedback |
📖 See: Check Handlers & Validators, Tool-Use Integration
An agent is a JSON file — a named role with an ordered list of checks. A DAG wires one or more agents into lanes.
```json
{
  "name": "File Summariser",
  "description": "Confirms a file exists then produces a structured summary.",
  "checks": [
    {
      "type": "file-exists",
      "path": "input.txt",
      "pass": "✅ Input file confirmed",
      "fail": "❌ input.txt not found",
      "failSeverity": "error"
    },
    {
      "type": "llm-review",
      "path": "input.txt",
      "taskType": "validation",
      "prompt": "Summarise the content below in 5 concise bullet points.\n\nContent:\n{content}",
      "outputKey": "summary",
      "pass": "✅ Summary produced",
      "fail": "⚠️ Summary incomplete",
      "recommendations": ["Expand bullet points if content exceeds 500 words"]
    }
  ]
}
```

A single-lane DAG that runs it:

```json
{
  "name": "My First DAG",
  "description": "Single-lane file summariser.",
  "lanes": [
    {
      "id": "summarise",
      "agentFile": "my-agent.json",
      "dependsOn": []
    }
  ]
}
```

Run it:

```bash
# Zero-cost mock run
ai-kit agent:dag agents/my-dag.json --provider mock

# Real LLM
ANTHROPIC_API_KEY=sk-... ai-kit agent:dag agents/my-dag.json
```

Add deterministic quality enforcement. If the check fails, the engine retries with injected instructions before escalating.
```json
{
  "laneId": "summarise",
  "retryBudget": 1,
  "checkpoints": [
    {
      "checkpointId": "after-summary",
      "mode": "self",
      "expect": { "minFindings": 1 },
      "onFail": "RETRY",
      "retryInstructions": "The summary is empty — retry and produce at least 3 bullet points."
    }
  ]
}
```

Then reference it in the lane:

```json
{
  "id": "summarise",
  "agentFile": "my-agent.json",
  "supervisorFile": "my-supervisor.json",
  "dependsOn": []
}
```

That's it. Model routing, cost tracking, resilience, streaming, audit logging, RBAC, and multi-tenant isolation are all applied automatically by the engine.
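Conceptually, the supervisor's decision at a checkpoint reduces to a small function of the expectation, the `onFail` policy, and the retry budget. This sketch is illustrative — the real engine evaluates richer expectations than `minFindings` — but it captures how the pieces interact:

```typescript
type Verdict = 'PASS' | 'RETRY' | 'HANDOFF' | 'ESCALATE';

interface CheckpointConfig {
  expect: { minFindings: number };
  onFail: 'RETRY' | 'HANDOFF' | 'ESCALATE';
}

// Pass when expectations are met; apply the onFail policy while retry
// budget remains; escalate once the budget is exhausted.
function evaluateCheckpoint(
  cfg: CheckpointConfig,
  findings: number,
  retriesUsed: number,
  retryBudget: number
): Verdict {
  if (findings >= cfg.expect.minFindings) return 'PASS';
  if (cfg.onFail === 'RETRY' && retriesUsed >= retryBudget) return 'ESCALATE';
  return cfg.onFail;
}
```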
📖 See: Full recipe with parallel lanes and barriers → · DAG Orchestration
All LLM provider calls are protected by intelligent retry and circuit breaker patterns:
- Exponential backoff with jitter to prevent thundering herd
- Configurable retry conditions — 429/500/503 transient errors by default
- Preset: 4 attempts, 1s → 32s max delay
- Respects Retry-After headers from providers
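The delay schedule above can be sketched in a few lines. The 1s base, 32s cap, and Retry-After precedence follow the preset; the full-jitter strategy is an assumption about the exact jitter variant used:

```typescript
// Exponential backoff with full jitter: delay doubles per attempt, capped,
// then a random fraction is taken to spread retries (prevents thundering herd).
function backoffDelayMs(
  attempt: number,          // 0-based retry attempt
  baseMs = 1000,            // matches the 1s preset above
  maxMs = 32_000,           // matches the 32s cap above
  retryAfterMs?: number     // provider's Retry-After wins when present
): number {
  if (retryAfterMs !== undefined) return retryAfterMs;
  const exp = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.random() * exp; // full jitter in [0, exp)
}
```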
- CLOSED → OPEN → HALF_OPEN state machine per provider
- 5-failure threshold to trigger opening
- 60s cooldown before attempting recovery
- Per-provider stats for observability
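A minimal version of that per-provider state machine, using the 5-failure threshold and 60s cooldown from the preset above (class and method names here are illustrative, not the engine's API):

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

// Open after N consecutive failures, move to HALF_OPEN after the cooldown,
// close again on a successful probe; a failed probe re-opens immediately.
class CircuitBreaker {
  private state: BreakerState = 'CLOSED';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private threshold = 5,       // matches the 5-failure threshold above
    private cooldownMs = 60_000  // matches the 60s cooldown above
  ) {}

  currentState(now: number): BreakerState {
    if (this.state === 'OPEN' && now - this.openedAt >= this.cooldownMs) {
      this.state = 'HALF_OPEN';
    }
    return this.state;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  recordFailure(now: number): void {
    this.failures++;
    if (this.state === 'HALF_OPEN' || this.failures >= this.threshold) {
      this.state = 'OPEN';
      this.openedAt = now;
    }
  }
}
```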
📖 See: Resilience Patterns
Every `llm-generate` and `llm-review` check streams tokens directly to `process.stdout` as they arrive.
Supported Providers:
- ✅ Anthropic (SSE)
- ✅ OpenAI (SSE + `stream_options`)
- ✅ VS Code Copilot (fallback to complete)
- ✅ Mock (word-level simulation)
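The Mock provider's word-level simulation can be approximated with an async generator — a stand-alone sketch, not the engine's actual MockProvider:

```typescript
// Yield tokens one word at a time so downstream code exercises the same
// streaming path it would use with a real SSE provider.
async function* streamMockTokens(text: string, delayMs = 0): AsyncGenerator<string> {
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (delayMs > 0) await new Promise((r) => setTimeout(r, delayMs));
    yield word + ' ';
  }
}

// Consume the stream the way a check handler might: forward each token
// to stdout as it arrives and accumulate the full response.
async function collect(stream: AsyncGenerator<string>): Promise<string> {
  let out = '';
  for await (const token of stream) {
    process.stdout.write(token);
    out += token;
  }
  return out;
}
```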
📖 See: Streaming Output & Real-Time Feedback
All implemented and enforced at runtime:
| ID | Feature | Status | Details |
|---|---|---|---|
| E1 | PII Scrubbing | ✅ Active | Automatic detection and redaction via regex patterns |
| E2 | Security Audit | ✅ Active | CI/CD scanning via GitHub Actions on every push |
| E3 | Multi-Tenant | ✅ Active | Path-isolated run roots per tenant ID |
| E4 | GDPR Data CLI | ✅ Active | data:export, data:delete, data:list-tenants |
| E5 | OIDC JWT Auth | ✅ Active | RS256/ES256 Bearer token validation on SSE events |
| E6 | Rate Limiting | ✅ Active | Token budget + concurrent run limits per principal |
| E7 | DAG Visualizer | ✅ Active | Mermaid + DOT output for architecture visualization |
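Regex-based scrubbing (E1) boils down to a pattern table and a replace pass. The patterns below are simplified illustrations; the engine ships a broader set:

```typescript
// Illustrative pattern table — match each pattern, replace with a tag.
// Order matters: SSNs are redacted before the looser phone pattern runs.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, '[EMAIL]'],
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]'],
  [/\+?\d[\d\s-]{7,}\d/g, '[PHONE]'],
];

function scrubPii(text: string): string {
  return PII_PATTERNS.reduce((acc, [pattern, tag]) => acc.replace(pattern, tag), text);
}
```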
📖 See: Enterprise Readiness, Authentication & RBAC, Audit Logging
A single `pnpm run:plan` session takes you from a vague idea to running agent tasks.
Each phase is distinct, inspectable, and resumable.
What: The BA agent interviews you with ~12 structured questions across 4 blocks:
problem definition · primary users · stories (feature/fix/migration/spike) · stack constraints.
You do: Answer in plain English. The BA probes and clarifies.
Output: .agents/plan-state/discovery.json — a complete DiscoveryResult capturing every answer.
🧠 BA › What problem are you solving?
👤 You › Users can't track their subscription status in real time.
🧠 BA › Who is the primary user — consumer or internal team?
👤 You › Consumer, B2C SaaS, ~50k MAU.
🧠 BA › I'll capture: real-time subscription status for 50k MAU consumer SaaS…
What quality grade? (mvp / enterprise / poc-stub)
Skip this phase with a pre-seeded discovery: `pnpm demo:plan:01` through `pnpm demo:plan:05`.
What: The BA reads the discovery result and produces a plan skeleton — Steps with rough Tasks, ownership, and acceptance criteria. You review and approve.
Output: .agents/plan-state/plan.json at phase synthesize — Steps defined, Tasks stubbed.
🧠 BA › Draft plan for "Real-time Subscription Status":
Step 1: Webhook ingestion (Backend) — receive Stripe events
Step 2: Status store (Database) — idempotent event log
Step 3: SSE endpoint (Backend) — stream status to clients
Step 4: UI widget (Frontend) — live status badge
Step 5: Test suite (Testing + E2E) — contract + acceptance tests
Approve? [y / edit / add story]
What: Each specialist agent (Architecture, Backend, Frontend, Testing, E2E) expands their Steps into detailed Tasks in parallel. Each task gets: description, acceptance criteria, estimated effort, and output artefacts.
Output: .agents/plan-state/plan.json fully populated — every task defined.
🏗️ Architecture › Decomposing Step 1…
⚙️ Backend › Decomposing Step 2, 3… ← parallel
🎨 Frontend › Decomposing Step 4… ← parallel
🧪 Testing › Decomposing Step 5… ← parallel
What: The engine computes the dependency graph across all tasks, detects conflicts between agent plans, injects alignment gates at conflict points, and produces the execution order.
Output: .agents/plan-state/plan.json at phase wire — dependencies set,
AlignmentGate objects injected, the Arbiter resolves any cross-agent conflicts.
⚖️ Arbiter › Conflict: Backend Step 3 (SSE schema) ↔ Frontend Step 4 (event type)
Resolution: agree on { type: 'subscription.status', payload: StatusEvent }
→ alignment gate injected after Step 3
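Computing the execution order from task dependencies is essentially Kahn's topological sort: repeatedly emit every task whose dependencies are all satisfied, and tasks emitted in the same round form a parallel group. A sketch, independent of the engine's actual wiring code:

```typescript
interface Task {
  id: string;
  dependsOn: string[];
}

// Group tasks into parallel execution batches; throws on dependency cycles.
function executionGroups(tasks: Task[]): string[][] {
  const done = new Set<string>();
  const remaining = [...tasks];
  const groups: string[][] = [];
  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) throw new Error('Cycle detected in task graph');
    groups.push(ready.map((t) => t.id));
    for (const t of ready) {
      done.add(t.id);
      remaining.splice(remaining.indexOf(t), 1);
    }
  }
  return groups;
}
```

Applied to the example above, the webhook and store tasks land in group 1, the SSE endpoint in group 2, and the UI widget and test suite in group 3.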
What: PlanOrchestrator feeds the wired plan into the DagOrchestrator lane by
lane, respecting the computed dependency order. Supervisors enforce acceptance criteria
at every checkpoint. Results land in .agents/results/.
Output: Execution artefacts per task, findings log, full DagResult JSON.
⚡ System › Executing wired plan — 5 steps, 18 tasks
▶ Group 1: webhook-ingestion + status-store ← parallel
✅ webhook-ingestion — 3 checkpoints, 0 retries
✅ status-store — 2 checkpoints, 0 retries
▶ Group 2: sse-endpoint
✅ sse-endpoint — 2 checkpoints, 1 retry
▶ Group 3: ui-widget + test-suite ← parallel
...
```bash
# Start the full interactive session:
pnpm run:plan

# Jump to Phase 1 with a pre-seeded discovery (no Q&A):
pnpm demo:plan        # interactive seed picker
pnpm demo:plan:01     # App Boilerplate seed
pnpm demo:plan:02     # Enterprise Skeleton seed
pnpm demo:plan:04     # Feature-in-context seed (billing on existing platform)
pnpm demo:plan:05     # MVP Sprint seed (2-week solo)
```

📖 See: demo-scenarios.md — 5-Phase Plan Demo
Comprehensive feature guides are available in docs/features/:
Core Features
- DAG Orchestration & Execution
- Agent Types & Roles
- Model Routing & Cost Tracking
- Check Handlers & Validators
Advanced Execution
- Streaming Output
- Tool-Use Integration
- Resilience Patterns (Retry & Circuit Breaker)
- Event Bus & Real-Time Events
Enterprise & Security
- Authentication & RBAC
- Audit Logging & Compliance
- Multi-Tenant Isolation
- PII Scrubbing & Injection Defense
Developer Tools
📚 Full Index: All Features
```bash
# Clone and set up the monorepo
git clone https://github.com/binaryjack/ai-agencee.git
cd ai-agencee
pnpm install          # install all workspace deps
pnpm build            # compile all packages (tsc)
pnpm test             # run all Jest suites (519 tests across 36 files)

# Advanced demo scenarios (no API keys)
pnpm demo             # original 3-lane mock demo
pnpm demo:menu        # interactive scenario picker
pnpm demo:all         # run all 6 scenarios in sequence
pnpm demo:01          # App Boilerplate (RETRY × 2, hard-barrier)
pnpm demo:02          # Enterprise (HANDOFF, needs-human-review)
pnpm demo:03          # Website Build (ESCALATE terminal)
pnpm demo:04          # Feature-in-ctx (soft-align, read-contract)
pnpm demo:05          # MVP Sprint (flaky lane)
pnpm demo:06          # Resilience (all error types)

# 5-Phase Plan system
pnpm demo:plan        # seed Phase 0 → launch plan from SYNTHESIZE
pnpm demo:plan:01     # App Boilerplate seed
pnpm demo:plan:04     # Feature-in-context seed (billing on existing platform)
pnpm run:plan         # start fully interactive planning session

# DAG execution
pnpm run:dag agents/dag.json      # execute a DAG
pnpm visualize agents/dag.json    # output Mermaid/DOT diagram
```

| ID | Feature | Status | Details |
|---|---|---|---|
| E1 | PII Scrubbing | ✅ | Automatic detection and redaction via regex patterns |
| E2 | Security Audit | ✅ | CI/CD scanning via GitHub Actions (pnpm audit --audit-level=high) |
| E3 | Multi-Tenant Isolation | ✅ | Path-isolated run roots per tenant, GDPR-compliant |
| E4 | GDPR Data CLI | ✅ | data:export, data:delete, data:list-tenants commands |
| E5 | OIDC JWT Auth | ✅ | RS256/ES256 Bearer token validation on SSE /events endpoint |
| E6 | Rate Limiting | ✅ | Token budget + concurrent run limits per principal |
| E7 | DAG Visualizer | ✅ | Mermaid + DOT output for architecture visualization |
| E8 | Prompt Injection Detection | ✅ | 10 detection families; configurable warn/block modes |
| E9 | Python MCP Bridge | ✅ | JSON-RPC 2.0 subprocess bridge; PythonMcpProvider LLM adapter |
| E10 | AWS Bedrock Provider | ✅ | SigV4-signed Converse API; supports Claude/Llama/Titan on Bedrock |
| E11 | Jira/Linear Sync | ✅ | Post issues on DAG lane failure via REST/GraphQL; fromEnv() |
| E12 | Slack/Teams Notifications | ✅ | Incoming webhooks on DAG/lane end + budget exceeded; parallel delivery |
| E13 | Run Advisor (Auto-Tune) | ✅ | Analyzes run history → suggests model downgrades, budget optimization, stability improvements |
| E14 | Codernic + Pirsig Quality Engine | ✅ | CodebaseIndexer (449 files/1.03s, FTS5 SQLite), ASK/PLAN/AGENT/ANALYSE modes, atomic multi-file patches, StyleProfile extraction, ConsistencyAuditor, KPI scoring (0–100), drift detection |
📖 See: Enterprise Features Index for implementation details
| Feature | Roadmap ID | Status | Details |
|---|---|---|---|
| Prompt Distillation | G-37 | ✅ | Few-shot example collection for self-improving prompts |
| Code Execution Sandbox | G-38 | ✅ | Isolated Node/Python/Bash code execution with timeout + output capture |
| Vector Memory | G-13 | ✅ | In-memory semantic search with cosine similarity |
| SQLite Vector Memory | G-24/G-25 | ✅ | Persistent embeddings with better-sqlite3 backend |
| Webhook Triggers | G-16 | ✅ | GitHub webhook integration for DAG execution |
| DAG Builder Fluent API | G-22 | ✅ | Type-safe TypeScript DSL for programmatic DAG construction |
| LLM-as-Judge Eval | G-50 | ✅ | Structured evaluation harness for output quality assessment |
| OpenTelemetry | G-08 | ✅ | Distributed tracing and metrics collection |
| Plugin System | Core | ✅ | Custom check types and provider extensions |
| Human Review Gate | Core | ✅ | Manual approval checkpoints in DAG execution |
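For reference, the cosine-similarity scoring behind the vector memory features (G-13, G-24/G-25) is the standard formula — dot product over the product of vector norms:

```typescript
// Cosine similarity over embedding vectors: 1 for identical directions,
// 0 for orthogonal vectors, -1 for opposite directions.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```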
MIT — see LICENSE.
- 📚 Full Documentation: docs/features/INDEX.md
- ⚡ Quickies — copy-paste recipes (general + enterprise): docs/quickies.md
- 🎬 Advanced Demo Scenarios: docs/demo-scenarios.md
- 📋 Enterprise Features: docs/features/INDEX.md
- 🏗️ Architecture: agents/