AI Agencee

Enterprise-grade multi-agent orchestration engine — DAG-supervised parallel agents with streaming LLM output, intelligent model routing, resilience patterns, cost tracking, RBAC, audit logging, VS Code integration with Commander mode and Codernic intelligence, and a zero-API-key demo mode.

Status: ✅ Production-Ready | Tests: 925 total (868+ passing) | Enterprise Features: 14 completed (E1–E14) | VS Code Extension: v0.6.57


Who is this for?

| Audience | Why they use it |
| --- | --- |
| Individual developers | Run the full AI-assisted development loop — from idea to wired sprint plan — with Commander mode in VS Code, Codernic's codebase intelligence, and zero API costs during exploration |
| Feature squads (2–8 people) | Coordinate parallel workstreams with hard sync points, automated handoffs between agents, supervisor-gated quality checks, and visual DAG execution in VS Code |
| Platform / enterprise teams | Roll out AI-assisted workflows to multiple squads with RBAC, multi-tenant isolation, audit trails, cost controls, VS Code integration, and CI — all enforced at the engine level |
| AI tooling builders | Use the DAG engine, MCP bridge, plugin system, and TypeScript Builder API as infrastructure for custom AI products |

What problem does it solve?

"I want AI to help me build real software — not just generate snippets."

Most AI coding tools stop at the file level. AI Agencee operates at the project level:

  • A structured 5-phase discovery process turns a vague requirement into a precise, wired sprint plan — with every agent knowing their scope, dependencies, and acceptance criteria before writing a line
  • A DAG execution engine runs specialised agents in parallel, detects conflicts via alignment barriers, retries on failure, hands off between agents, and escalates to a human when it can't recover automatically
  • Supervisor checkpoints enforce quality at every step — not just at the end — so regressions surface during planning, not in production
  • Zero API-key demo mode means the entire system can be evaluated, tested in CI, and learned without spending anything

It ships two execution paths that compose seamlessly:

| Path | Entry point | When to use |
| --- | --- | --- |
| Plan System | `ai-kit plan` | Discovery → synthesis → decomposition → wiring → DAG hand-off for a new project or feature |
| DAG Engine | `ai-kit agent:dag <dag.json>` | Run any defined agent graph directly: code review, security audit, migration, documentation, CI gate |

Why AI Agencee and not another tool?

| Need | Generic AI chat | Code-gen copilots | AI Agencee |
| --- | --- | --- | --- |
| Structured multi-step plan from a vague idea | ❌ Hallucinated | ⚠️ Single-file suggestions | ✅ 5-phase BA-led discovery → wired sprint plan |
| Parallel agent coordination with sync points | ❌ | ❌ | ✅ DAG barriers, soft-align, read-contract |
| Automatic retry + escalation on failure | ❌ | ❌ | ✅ retryBudget, HANDOFF, ESCALATE verdicts |
| Human-in-the-loop approval gates | ❌ | ❌ | ✅ needs-human-review checkpoint |
| Enterprise: RBAC, audit, multi-tenant, PII, OIDC | ❌ | ❌ | ✅ E1–E13 enforced at runtime |
| Zero-cost evaluation + CI integration | ❌ | ❌ | ✅ Mock provider, $0.00, no keys |
| Extensible: custom agents, checks, providers | ⚠️ | ⚠️ | ✅ Plugin system + TypeScript Builder API |

⚡ Quickies — Get a result in under 5 minutes

Copy-paste recipes for the most common tasks. No reading required.

| I want to… | Command |
| --- | --- |
| Run a Pirsig quality audit | `pnpm code:index` first, then `ai-kit code audit` — or auto-fires after every DAG run via MCP |
| Inspect workspace coordination | `galileus_workspace_state` via MCP (`@ai-kit galileus_workspace_state`) |
| Install the engine in my project | `npm install @ai-agencee/engine` |
| Install the CLI globally | `npm install -g @ai-agencee/cli` |
| See the engine run with no setup | clone the repo → `pnpm demo` (↓ Explore Without Code) |
| See failures, retries, escalations | clone the repo → `pnpm demo:06` |
| Run a DAG from my own project | `ai-kit agent:dag ./my-dag.json --provider mock` |
| Plan a new app from scratch | `ai-kit plan` |
| Add a feature to an existing codebase | `ai-kit plan` → type `feature` when asked for story type |
| Security audit my project | `ai-kit agent:dag ./security-review.dag.json --provider mock` |
| Create a custom agent in 5 min | ↓ guide below · Q4 full recipe → |
| Set up a CI quality gate | Q18 in Quickies → |
| Enterprise adoption checklist | Q13 in Quickies → |
| Data migration plan + cutover gate | Q19 in Quickies → |

📖 Full recipe list (19 quickies): docs/quickies.md


What it is (technical)

AI Agencee is a TypeScript monorepo that turns JSON-defined agent graphs into production-ready AI workflows with enterprise-grade security, compliance, and observability.

Full Documentation: Start with 📚 Features Index for all capabilities.


Use Cases

Typical workflows for engineering teams that need deterministic, auditable, multi-agent automation:

🔍 Code Review & Architecture Analysis

Multi-lane parallel review — security, readability, architecture, performance — all running simultaneously. Supervisor checkpoints enforce quality deterministically. Cost tracked per lane for compliance and budgeting.

📋 Product Discovery & Sprint Planning

The 5-phase Plan System takes a vague idea through BA-led discovery → synthesis → decomposition → dependency wiring → DAG execution. Every task has an owner, acceptance criteria, and effort estimate before a line is written.

🛡️ Security & Compliance Automation

Security-review agents with enforced Opus-tier model routing, PII scrubbing, immutable audit logging, RBAC, multi-tenant isolation, and GDPR CLI (data:export / data:delete). Ready to drop into a compliance workflow.

🚦 CI/CD Quality Gates

Run DAGs on pull requests or releases. needs-human-review checkpoints block the pipeline until an operator approves. Slack, Teams, and Jira integrations fire automatically on escalation or budget exceeded.

🗂️ Large-Scale File & Repository Analysis

Parallel lanes scanning thousands of files. grep, json-field, count-files checks for deterministic validation; llm-review lanes for contextual synthesis. Results written to structured JSON for downstream consumption.

🏢 Enterprise Orchestration

Multi-agent workflows with configurable retry budgets, circuit breakers, OIDC JWT auth, per-principal rate limiting, and webhook triggers for GitHub, CI, or internal systems.


Core Capabilities

🎯 Orchestration & Execution

🔐 Enterprise & Security

  • Authentication & RBAC — Role-based access control with OIDC JWT support
  • Audit Logging — Immutable hash-chained audit trails for compliance
  • Multi-Tenant Isolation — Per-tenant data isolation and run sandboxing
  • PII Scrubbing — Automatic detection and redaction of sensitive data
  • Rate Limiting — Token budget and concurrent run limits per principal
  • Codernic (E14) — Codebase-aware coding agent (449 files/1.03s), symbol extraction, dependency graphs — writes code that compiles on the first try
  • Pirsig Quality Engine (E14) — Self-calibrating quality audit: StyleProfile extraction, ConsistencyAuditor, KPI scoring (0–100), drift detection — auto-triggers via Galileus after every DAG run
  • Galileus Coordination — Multi-session intent queue: concurrent agent sessions declare file claims before acting; Galileus detects conflicts, queues waiters, and cascades unblocks automatically
  • Event Bus — Typed real-time event subscriptions for lane status, tokens, costs
  • DAG Visualizer — Mermaid and DOT output for architecture visualization
  • Cost Analytics — Per-run and per-principal cost breakdowns
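
The typed event bus above can be sketched as a minimal emitter. Note the event names and payload shapes below are illustrative assumptions, not the engine's actual event types:

```typescript
// Minimal typed event bus sketch. Event names and payload shapes are
// hypothetical -- the real engine's event map may differ.
type EngineEvents = {
  'lane:status': { laneId: string; status: 'running' | 'complete' | 'failed' };
  'lane:token': { laneId: string; token: string };
  'run:cost': { runId: string; usd: number };
};

class TypedEventBus {
  private handlers = new Map<string, Array<(payload: unknown) => void>>();

  // Subscribe to one event; the payload type is inferred from the event name.
  on<K extends keyof EngineEvents>(event: K, fn: (payload: EngineEvents[K]) => void): void {
    const list = this.handlers.get(event) ?? [];
    list.push(fn as (payload: unknown) => void);
    this.handlers.set(event, list);
  }

  // Fire all handlers registered for the event.
  emit<K extends keyof EngineEvents>(event: K, payload: EngineEvents[K]): void {
    for (const fn of this.handlers.get(event) ?? []) fn(payload);
  }
}

const bus = new TypedEventBus();
const costs: number[] = [];
bus.on('run:cost', (e) => costs.push(e.usd));
bus.emit('run:cost', { runId: 'r1', usd: 0.42 });
```

The value of the typed map is that a subscriber to `'run:cost'` gets `{ runId, usd }` checked at compile time rather than an untyped payload.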

👨‍💻 Developer Experience

  • VS Code Extension — Commander mode for workflows, Codernic (ASK/PLAN/AGENT), visual editors, code intelligence (@ai-kit and @codernic chat participants)
  • Codernic Intelligence — Codebase-aware assistant with FTS5 indexing (449 files/1.03s), hybrid context strategy, three-mode operation
  • TypeScript Builder API — Fluent, type-safe DSL for DAG construction
  • CLI Commands — Full command reference with examples
  • MCP Integration — VS Code and Claude Desktop support
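
As an illustration of the fluent Builder style, here is a self-contained sketch that produces the same JSON shape used in the "Create an Agent in 5 Minutes" section below. The method names (`lane`, `build`) and the `DagBuilder` class are assumptions about the DSL's style, not the real Builder API:

```typescript
// Illustrative fluent DAG builder -- names and signatures are hypothetical.
interface LaneDef { id: string; agentFile: string; dependsOn: string[] }
interface DagDef { name: string; lanes: LaneDef[] }

class DagBuilder {
  private lanes: LaneDef[] = [];
  constructor(private name: string) {}

  // Add a lane; dependsOn defaults to no upstream lanes.
  lane(id: string, agentFile: string, dependsOn: string[] = []): this {
    this.lanes.push({ id, agentFile, dependsOn });
    return this;
  }

  // Emit the plain JSON DAG definition the engine consumes.
  build(): DagDef {
    return { name: this.name, lanes: this.lanes };
  }
}

const dag = new DagBuilder('Review Pipeline')
  .lane('security', 'security-agent.json')
  .lane('readability', 'readability-agent.json')
  .lane('summary', 'summary-agent.json', ['security', 'readability'])
  .build();
```

The point of the fluent form is that lane wiring lives in one type-checked expression instead of hand-edited JSON.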

Packages

| Package | Description | Docs |
| --- | --- | --- |
| `packages/agent-executor` | Core engine: DAG orchestrator, supervised agents, model router, resilience, RBAC, audit logging | Agent Executor Docs |
| `packages/cli` | `ai-kit` CLI — init, sync, check, agent:dag, plan, visualize, data | CLI Reference |
| `_private/ai-agencee-ext` | VS Code Extension — Commander mode, Codernic, visual editors, code intelligence | VS Code Extension |
| `packages/core` | Shared filesystem utilities, template scaffolding, event types | Features Index |
| `packages/mcp` | VS Code MCP bridge, OIDC auth middleware, SSE server, GitHub Copilot routing | MCP Integration |
| `packages/galileus` | Multi-session coordination: SQLite intent queue, conflict detection, cascade resolver | Galileus README |

Install

Add the engine to any Node.js / TypeScript project:

npm install @ai-agencee/engine
# or
yarn add @ai-agencee/engine
# or
pnpm add @ai-agencee/engine

Install the CLI globally (or as a dev dependency):

npm install -g @ai-agencee/cli
# or as a dev dep:
npm install -D @ai-agencee/cli

Usage

Programmatic — run a DAG from TypeScript

import { DagOrchestrator } from '@ai-agencee/engine';

const orchestrator = new DagOrchestrator(process.cwd(), {
  forceProvider: 'mock',   // swap for 'anthropic' | 'openai' once you have keys
  verbose: true,
});

const result = await orchestrator.run('./my-dag.json');
console.log(result.status); // 'complete' | 'partial' | 'failed'

CLI — run a DAG directly

# Mock provider — no API key required
ai-kit agent:dag ./my-dag.json --provider mock

# With Anthropic
ANTHROPIC_API_KEY=sk-... ai-kit agent:dag ./my-dag.json

# With OpenAI
OPENAI_API_KEY=sk-... ai-kit agent:dag ./my-dag.json --provider openai

# 5-phase interactive planning session
ai-kit plan

# Visualise a DAG as a Mermaid diagram
ai-kit dag:visualize ./my-dag.json

📖 See: DAG Orchestration · CLI Reference · Quickies — copy-paste recipes


Explore Without Code

Want to see what the engine does before writing anything? Clone the repo and run the zero-key demos:

git clone https://github.com/binaryjack/ai-agencee.git
cd ai-agencee
pnpm install && pnpm build

# Original 3-lane demo — NO API keys required
pnpm demo

# Interactive menu — pick from 6 advanced scenarios
pnpm demo:menu

| Demo command | What it shows |
| --- | --- |
| `pnpm demo:01` | App Boilerplate — RETRY × 2, hard-barrier |
| `pnpm demo:02` | Enterprise Skeleton — HANDOFF, needs-human-review |
| `pnpm demo:03` | Website Build — ESCALATE terminal 🚨 |
| `pnpm demo:04` | Feature in Context — soft-align, read-contract |
| `pnpm demo:05` | MVP Sprint — flaky lane, mixed results |
| `pnpm demo:06` | Resilience Showcase — every error type at once |
| `pnpm demo:plan` | 5-Phase Plan Demo — seed Phase 0 → run from SYNTHESIZE |

All demos use the built-in MockProvider: zero API keys, zero cost.

📖 See: Advanced Demo Scenarios


Model Routing & Cost Control

Intelligent routing automatically selects the optimal model tier based on task complexity and budget constraints.

Configuration: agents/model-router.json

| Task type | Family | Anthropic model | OpenAI model | Cost /1M tokens |
| --- | --- | --- | --- | --- |
| file-analysis | haiku | claude-haiku-4-5 | gpt-4o-mini | $0.80 |
| code-generation | sonnet | claude-sonnet-4-5 | gpt-4o | $3.00 |
| code-review | sonnet | claude-sonnet-4-5 | gpt-4o | $3.00 |
| architecture-decision | opus | claude-opus-4-5 | gpt-4o | $15.00 |
| security-review | opus | claude-opus-4-5 | gpt-4o | $15.00 |

Key Features:

  • ✅ Per-run budget enforcement
  • ✅ Fallback to cheaper models when budget-constrained
  • ✅ Real-time cost tracking per check and lane
  • ✅ Cost attribution per principal (user/service)

📖 See: Model Routing & Cost Tracking
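
The budget-constrained fallback can be sketched as follows. The tier costs mirror the routing table above; the exact fallback rule (walk down to the next cheaper family until the estimated call cost fits) is an assumption about how the router behaves:

```typescript
// Tier table mirroring the routing table above (cost per 1M tokens, USD).
const TIERS = [
  { family: 'opus', costPer1M: 15.0 },
  { family: 'sonnet', costPer1M: 3.0 },
  { family: 'haiku', costPer1M: 0.8 },
] as const;

// Pick the preferred family, then fall back to cheaper tiers until the
// estimated cost of the call fits the remaining budget.
function pickFamily(preferred: string, estTokens: number, remainingUsd: number): string {
  const start = TIERS.findIndex((t) => t.family === preferred);
  for (let i = start; i < TIERS.length; i++) {
    const cost = (estTokens / 1_000_000) * TIERS[i].costPer1M;
    if (cost <= remainingUsd) return TIERS[i].family;
  }
  return TIERS[TIERS.length - 1].family; // cheapest tier as last resort
}
```

For example, an opus-tier task estimated at 1M tokens fits a $20 remaining budget, but with only $5 left it would be routed to sonnet.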


Check Handler Types

Agents compose any mix of these typed checks:

| Type | Description | Use Case |
| --- | --- | --- |
| `file-exists` | Assert a file path is present | Pre-flight validation |
| `dir-exists` | Assert a directory exists | Pre-flight validation |
| `count-files` / `count-dirs` | Count files matching a glob | Coverage analysis |
| `grep` | Regex search inside text files | Pattern matching |
| `json-field` / `json-has-key` | JSON schema / value assertions | Data validation |
| `run-command` | Execute shell command, inspect stdout/exit code | System integration |
| `llm-generate` | LLM generation with streaming output | Content creation |
| `llm-review` | LLM review / critique with streaming output | Analysis & feedback |

📖 See: Check Handlers & Validators, Tool-Use Integration
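
As an illustration of mixing deterministic checks, a CI-gate style agent might look like the sketch below. The `pass`/`fail`/`failSeverity` fields follow the agent example later in this README; the `pattern` and `command` field names are assumptions, since this README does not show the exact schema for `grep` and `run-command` checks:

```json
{
  "name": "CI Gate",
  "description": "Deterministic pre-merge checks — no LLM calls.",
  "checks": [
    {
      "type": "grep",
      "path": "src",
      "pattern": "TODO|FIXME",
      "pass": "✅ No stray TODO markers",
      "fail": "⚠️ Unresolved TODO/FIXME found",
      "failSeverity": "warning"
    },
    {
      "type": "run-command",
      "command": "npm test",
      "pass": "✅ Test suite green",
      "fail": "❌ Tests failing",
      "failSeverity": "error"
    }
  ]
}
```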


Create an Agent in 5 Minutes

An agent is a JSON file — a named role with an ordered list of checks. A DAG wires one or more agents into lanes.

1. Define the agent — agents/my-agent.json

{
  "name": "File Summariser",
  "description": "Confirms a file exists then produces a structured summary.",
  "checks": [
    {
      "type": "file-exists",
      "path": "input.txt",
      "pass": "✅ Input file confirmed",
      "fail": "❌ input.txt not found",
      "failSeverity": "error"
    },
    {
      "type": "llm-review",
      "path": "input.txt",
      "taskType": "validation",
      "prompt": "Summarise the content below in 5 concise bullet points.\n\nContent:\n{content}",
      "outputKey": "summary",
      "pass": "✅ Summary produced",
      "fail": "⚠️ Summary incomplete",
      "recommendations": ["Expand bullet points if content exceeds 500 words"]
    }
  ]
}

2. Wire it into a DAG — agents/my-dag.json

{
  "name": "My First DAG",
  "description": "Single-lane file summariser.",
  "lanes": [
    {
      "id": "summarise",
      "agentFile": "my-agent.json",
      "dependsOn": []
    }
  ]
}

3. Run it

# Zero-cost mock run
ai-kit agent:dag agents/my-dag.json --provider mock

# Real LLM
ANTHROPIC_API_KEY=sk-... ai-kit agent:dag agents/my-dag.json

4. Add a supervisor checkpoint (optional) — agents/my-supervisor.json

Add deterministic quality enforcement. If the check fails, the engine retries with injected instructions before escalating.

{
  "laneId": "summarise",
  "retryBudget": 1,
  "checkpoints": [
    {
      "checkpointId": "after-summary",
      "mode": "self",
      "expect": { "minFindings": 1 },
      "onFail": "RETRY",
      "retryInstructions": "The summary is empty — retry and produce at least 3 bullet points."
    }
  ]
}

Then reference it in the lane:

{
  "id": "summarise",
  "agentFile": "my-agent.json",
  "supervisorFile": "my-supervisor.json",
  "dependsOn": []
}

That's it. Model routing, cost tracking, resilience, streaming, audit logging, RBAC, and multi-tenant isolation are all applied automatically by the engine.

📖 See: Full recipe with parallel lanes and barriers → · DAG Orchestration


Resilience & Reliability

All LLM provider calls are protected by intelligent retry and circuit breaker patterns:

Retry Policy

  • Exponential backoff with jitter to prevent thundering herd
  • Configurable retry conditions — 429/500/503 transient errors by default
  • Preset: 4 attempts, 1s → 32s max delay
  • Respects Retry-After headers from providers
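
The delay schedule above can be sketched as exponential backoff with full jitter (delay drawn uniformly from zero up to a capped ceiling). The 1s base and 32s cap match the preset described above; the specific jitter strategy is an assumption:

```typescript
// Ceiling for a given attempt: base * 2^attempt, capped at maxMs.
function backoffCeilingMs(attempt: number, baseMs = 1000, maxMs = 32_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Full jitter: draw the actual delay uniformly from [0, ceiling] so that
// many retrying clients don't hammer the provider in lockstep.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 32_000): number {
  return Math.random() * backoffCeilingMs(attempt, baseMs, maxMs);
}
```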

Circuit Breaker

  • CLOSED → OPEN → HALF_OPEN state machine per provider
  • 5-failure threshold to trigger opening
  • 60s cooldown before attempting recovery
  • Per-provider stats for observability
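
The state machine above can be sketched as follows, with the clock injected for testability. The 5-failure threshold and 60s cooldown match the values above; the exact transition rules (e.g. HALF_OPEN closing on the next success) are assumptions:

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class CircuitBreaker {
  private state: BreakerState = 'CLOSED';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private threshold = 5,
    private cooldownMs = 60_000,
    private now: () => number = Date.now, // injectable clock
  ) {}

  // OPEN lapses into HALF_OPEN once the cooldown has elapsed.
  current(): BreakerState {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'HALF_OPEN';
    }
    return this.state;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed probe in HALF_OPEN, or hitting the threshold, re-opens it.
    if (this.state === 'HALF_OPEN' || this.failures >= this.threshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
    }
  }
}
```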

📖 See: Resilience Patterns


Real-Time Streaming Output

Every llm-generate and llm-review check streams tokens directly to process.stdout as they arrive.

Supported Providers:

  • ✅ Anthropic (SSE)
  • ✅ OpenAI (SSE + stream_options)
  • ✅ VS Code Copilot (fallback to complete)
  • ✅ Mock (word-level simulation)
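
The mock's word-level simulation can be sketched with a plain generator; the generator shape is an assumption about how the MockProvider is structured internally:

```typescript
// Split a canned completion into word-level "tokens" and yield them one at
// a time, the way the mock provider's simulation is described above.
function* streamWords(completion: string): Generator<string> {
  for (const word of completion.split(/\s+/).filter(Boolean)) {
    yield word + ' ';
  }
}

// A consumer forwards each token as it "arrives".
let output = '';
for (const token of streamWords('all checks passed with zero findings')) {
  output += token; // in the real engine: process.stdout.write(token)
}
```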

📖 See: Streaming Output & Real-Time Feedback


Enterprise Features (E1–E7)

All implemented and enforced at runtime:

| ID | Feature | Status | Details |
| --- | --- | --- | --- |
| E1 | PII Scrubbing | ✅ Active | Automatic detection and redaction via regex patterns |
| E2 | Security Audit | ✅ Active | CI/CD scanning via GitHub Actions on every push |
| E3 | Multi-Tenant | ✅ Active | Path-isolated run roots per tenant ID |
| E4 | GDPR Data CLI | ✅ Active | `data:export`, `data:delete`, `data:list-tenants` |
| E5 | OIDC JWT Auth | ✅ Active | RS256/ES256 Bearer token validation on SSE events |
| E6 | Rate Limiting | ✅ Active | Token budget + concurrent run limits per principal |
| E7 | DAG Visualizer | ✅ Active | Mermaid + DOT output for architecture visualization |

📖 See: Enterprise Readiness, Authentication & RBAC, Audit Logging
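
Regex-based PII scrubbing (E1) works roughly like the sketch below. The email and SSN patterns here are illustrative assumptions — the engine's actual pattern set is not shown in this README:

```typescript
// Each entry pairs a detection regex with its redaction mask.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, '[REDACTED_EMAIL]'], // assumed email pattern
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[REDACTED_SSN]'],     // assumed US SSN pattern
];

// Apply every pattern in order, replacing matches with their mask.
function scrubPii(text: string): string {
  return PII_PATTERNS.reduce((acc, [re, mask]) => acc.replace(re, mask), text);
}
```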


Plan System — 5-Phase Discovery to Execution

A single pnpm run:plan session takes you from a vague idea to running agent tasks. Each phase is distinct, inspectable, and resumable.


Phase 0 — DISCOVER

What: The BA agent interviews you with ~12 structured questions across 4 blocks:
problem definition · primary users · stories (feature/fix/migration/spike) · stack constraints.

You do: Answer in plain English. The BA probes and clarifies.
Output: .agents/plan-state/discovery.json — a complete DiscoveryResult capturing every answer.

🧠 BA › What problem are you solving?
👤 You › Users can't track their subscription status in real time.
🧠 BA › Who is the primary user — consumer or internal team?
👤 You › Consumer, B2C SaaS, ~50k MAU.
🧠 BA › I'll capture: real-time subscription status for 50k MAU consumer SaaS…
         What quality grade? (mvp / enterprise / poc-stub)

Skip this phase with a pre-seeded discovery: pnpm demo:plan:01 through pnpm demo:plan:05


Phase 1 — SYNTHESIZE

What: The BA reads the discovery result and produces a plan skeleton — Steps with rough Tasks, ownership, and acceptance criteria. You review and approve.

Output: .agents/plan-state/plan.json at phase synthesize — Steps defined, Tasks stubbed.

🧠 BA › Draft plan for "Real-time Subscription Status":
         Step 1: Webhook ingestion (Backend)   — receive Stripe events
         Step 2: Status store (Database)        — idempotent event log
         Step 3: SSE endpoint (Backend)         — stream status to clients
         Step 4: UI widget (Frontend)           — live status badge
         Step 5: Test suite (Testing + E2E)     — contract + acceptance tests

         Approve? [y / edit / add story]

Phase 2 — DECOMPOSE

What: Each specialist agent (Architecture, Backend, Frontend, Testing, E2E) expands their Steps into detailed Tasks in parallel. Each task gets: description, acceptance criteria, estimated effort, and output artefacts.

Output: .agents/plan-state/plan.json fully populated — every task defined.

🏗️  Architecture  › Decomposing Step 1…
⚙️  Backend       › Decomposing Step 2, 3…     ← parallel
🎨  Frontend      › Decomposing Step 4…        ← parallel
🧪  Testing       › Decomposing Step 5…        ← parallel

Phase 3 — WIRE

What: The engine computes the dependency graph across all tasks, detects conflicts between agent plans, injects alignment gates at conflict points, and produces the execution order.

Output: .agents/plan-state/plan.json at phase wire — dependencies set, AlignmentGate objects injected, the Arbiter resolves any cross-agent conflicts.

⚖️  Arbiter › Conflict: Backend Step 3 (SSE schema) ↔ Frontend Step 4 (event type)
             Resolution: agree on { type: 'subscription.status', payload: StatusEvent }
             → alignment gate injected after Step 3

Phase 4 — EXECUTE

What: PlanOrchestrator feeds the wired plan into the DagOrchestrator lane by lane, respecting the computed dependency order. Supervisors enforce acceptance criteria at every checkpoint. Results land in .agents/results/.

Output: Execution artefacts per task, findings log, full DagResult JSON.

⚡  System  › Executing wired plan — 5 steps, 18 tasks
▶  Group 1: webhook-ingestion + status-store   ← parallel
✅  webhook-ingestion  — 3 checkpoints, 0 retries
✅  status-store       — 2 checkpoints, 0 retries
▶  Group 2: sse-endpoint
✅  sse-endpoint        — 2 checkpoints, 1 retry
▶  Group 3: ui-widget + test-suite             ← parallel
...

# Start the full interactive session:
pnpm run:plan

# Jump to Phase 1 with a pre-seeded discovery (no Q&A):
pnpm demo:plan          # interactive seed picker
pnpm demo:plan:01       # App Boilerplate seed
pnpm demo:plan:02       # Enterprise Skeleton seed
pnpm demo:plan:04       # Feature-in-context seed (billing on existing platform)
pnpm demo:plan:05       # MVP Sprint seed (2-week solo)

📖 See: demo-scenarios.md — 5-Phase Plan Demo


Documentation

Comprehensive feature guides are available in docs/features/:

Core Features

Advanced Execution

Enterprise & Security

Developer Tools

📚 Full Index: All Features


Contributing

# Clone and set up the monorepo
git clone https://github.com/binaryjack/ai-agencee.git
cd ai-agencee
pnpm install          # install all workspace deps
pnpm build            # compile all packages (tsc)
pnpm test             # run all Jest suites (519 tests across 36 files)

# Advanced demo scenarios (no API keys)
pnpm demo             # original 3-lane mock demo
pnpm demo:menu        # interactive scenario picker
pnpm demo:all         # run all 6 scenarios in sequence
pnpm demo:01          # App Boilerplate  (RETRY × 2, hard-barrier)
pnpm demo:02          # Enterprise       (HANDOFF, needs-human-review)
pnpm demo:03          # Website Build    (ESCALATE terminal)
pnpm demo:04          # Feature-in-ctx   (soft-align, read-contract)
pnpm demo:05          # MVP Sprint       (flaky lane)
pnpm demo:06          # Resilience       (all error types)

# 5-Phase Plan system
pnpm demo:plan        # seed Phase 0 → launch plan from SYNTHESIZE
pnpm demo:plan:01     # App Boilerplate seed
pnpm demo:plan:04     # Feature-in-context seed (billing on existing platform)
pnpm run:plan         # start fully interactive planning session

# DAG execution
pnpm run:dag agents/dag.json      # execute a DAG
pnpm visualize agents/dag.json    # output Mermaid/DOT diagram

Roadmap & Status

Enterprise Features (E1–E14) ✅ All Implemented & Tested

| ID | Feature | Status | Details |
| --- | --- | --- | --- |
| E1 | PII Scrubbing | ✅ | Automatic detection and redaction via regex patterns |
| E2 | Security Audit | ✅ | CI/CD scanning via GitHub Actions (`pnpm audit --audit-level=high`) |
| E3 | Multi-Tenant Isolation | ✅ | Path-isolated run roots per tenant, GDPR-compliant |
| E4 | GDPR Data CLI | ✅ | `data:export`, `data:delete`, `data:list-tenants` commands |
| E5 | OIDC JWT Auth | ✅ | RS256/ES256 Bearer token validation on SSE `/events` endpoint |
| E6 | Rate Limiting | ✅ | Token budget + concurrent run limits per principal |
| E7 | DAG Visualizer | ✅ | Mermaid + DOT output for architecture visualization |
| E8 | Prompt Injection Detection | ✅ | 10 detection families; configurable warn/block modes |
| E9 | Python MCP Bridge | ✅ | JSON-RPC 2.0 subprocess bridge; PythonMcpProvider LLM adapter |
| E10 | AWS Bedrock Provider | ✅ | SigV4-signed Converse API; supports Claude/Llama/Titan on Bedrock |
| E11 | Jira/Linear Sync | ✅ | Post issues on DAG lane failure via REST/GraphQL; `fromEnv()` |
| E12 | Slack/Teams Notifications | ✅ | Incoming webhooks on DAG/lane end + budget exceeded; parallel delivery |
| E13 | Run Advisor (Auto-Tune) | ✅ | Analyzes run history → suggests model downgrades, budget optimization, stability improvements |
| E14 | Codernic + Pirsig Quality Engine | ✅ | CodebaseIndexer (449 files/1.03s, FTS5 SQLite), ASK/PLAN/AGENT/ANALYSE modes, atomic multi-file patches, StyleProfile extraction, ConsistencyAuditor, KPI scoring (0–100), drift detection |

📖 See: Enterprise Features Index for implementation details

Advanced Features Implemented

| Feature | Roadmap ID | Status | Details |
| --- | --- | --- | --- |
| Prompt Distillation | G-37 | ✅ | Few-shot example collection for self-improving prompts |
| Code Execution Sandbox | G-38 | ✅ | Isolated Node/Python/Bash code execution with timeout + output capture |
| Vector Memory | G-13 | ✅ | In-memory semantic search with cosine similarity |
| SQLite Vector Memory | G-24/G-25 | ✅ | Persistent embeddings with better-sqlite3 backend |
| Webhook Triggers | G-16 | ✅ | GitHub webhook integration for DAG execution |
| DAG Builder Fluent API | G-22 | ✅ | Type-safe TypeScript DSL for programmatic DAG construction |
| LLM-as-Judge Eval | G-50 | ✅ | Structured evaluation harness for output quality assessment |
| OpenTelemetry | G-08 | ✅ | Distributed tracing and metrics collection |
| Plugin System | Core | ✅ | Custom check types and provider extensions |
| Human Review Gate | Core | ✅ | Manual approval checkpoints in DAG execution |
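
The Vector Memory feature (G-13) rests on cosine similarity, which is simple enough to show inline — similarity = dot(a, b) / (|a|·|b|). The `topK` helper below is an illustrative sketch of in-memory ranking, not the feature's actual API:

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored embeddings against a query vector, most similar first.
function topK(query: number[], store: Array<{ id: string; vec: number[] }>, k: number) {
  return [...store]
    .sort((x, y) => cosineSimilarity(query, y.vec) - cosineSimilarity(query, x.vec))
    .slice(0, k);
}
```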

License

MIT — see LICENSE.

Support & Resources
