Visible staged triage for support teams under pressure. Messy case in, structured response pack out.
Live demo: https://async-copilot.vercel.app
A hiring-focused AI Product Engineer case study: a scoped support-triage workflow with real LLM inference, deterministic fallback, human approval gates, Slack dispatch evidence, and portable reviewer exports.
Async Copilot is not a generic chatbot demo. It shows how AI can sit inside a real operational workflow where a human operator still owns the final decision.
The product proves a narrow but complete loop:
- Workflow product thinking: intake -> visible triage -> response pack -> approval -> export.
- Applied AI integration: Groq/Llama inference when configured, synthetic fallback when not.
- Trust boundaries: no autonomous outbound action; Slack dispatch is approval-gated and logged.
- Durability: persisted run stages, event timeline, background pickup, retries, and idempotent action attempts.
- Reviewer evidence: markdown/text/JSON exports include provenance, approval history, action log, timing/fallback summaries, and golden assertions.
- Delivery discipline: unit tests, E2E smoke coverage, CI, audit tooling, and documented scope boundaries.
If you only spend 60-90 seconds on this repo, this is the flow to understand:
- An operator starts with a pasted support case or a seeded scenario.
- The system creates a case, creates a run, and advances a visible 6-stage triage workflow.
- If `GROQ_API_KEY` is present, the run streams live stage output via SSE.
- If AI is unavailable, the same workflow falls back to deterministic synthetic output.
- The run ends with a response pack containing confidence, recommendation, citations, and staged actions.
- A human must approve the pack before any outbound integration boundary is crossed.
- Approval can trigger a Slack webhook in `dry_run` or live mode, while all other staged actions remain queued.
- The full pack can be exported as markdown for handoff or review, including compact trust evidence.
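
The approval boundary in the last two steps can be sketched as follows. This is a minimal TypeScript illustration under stated assumptions, not the repo's implementation: `approveAndDispatch`, the `ActionAttempt` shape, the `slack_summary` key suffix, and the injected `send` callback are hypothetical, and an in-memory `Map` stands in for the durable `run_action_attempts` table. It shows two properties the README claims: nothing leaves the system before a finished pack is approved, and retrying an approval reuses the recorded attempt instead of posting to Slack twice.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the repo's API.
type DispatchMode = "dry_run" | "live";
type AttemptStatus = "skipped_dry_run" | "delivered" | "failed";

interface ActionAttempt {
  idempotencyKey: string;
  mode: DispatchMode;
  status: AttemptStatus;
}

interface ApprovalResult {
  approved: boolean;
  attempt?: ActionAttempt;
}

// In-memory stand-in for the durable run_action_attempts table.
const attempts = new Map<string, ActionAttempt>();

function approveAndDispatch(
  runId: string,
  packReady: boolean,
  mode: DispatchMode,
  send: (text: string) => boolean, // true when the webhook accepted the POST
): ApprovalResult {
  // Gate: no outbound action until a finished response pack is approved.
  if (!packReady) return { approved: false };

  // One key per (run, action), so a retried approval cannot double-send.
  const key = `${runId}:slack_summary`;
  const previous = attempts.get(key);
  if (previous && previous.status !== "failed") {
    return { approved: true, attempt: previous };
  }

  let status: AttemptStatus;
  if (mode === "dry_run") {
    status = "skipped_dry_run"; // recorded, but no HTTP call is made
  } else {
    status = send(`Response pack approved for run ${runId}`) ? "delivered" : "failed";
  }

  const attempt: ActionAttempt = { idempotencyKey: key, mode, status };
  attempts.set(key, attempt);
  return { approved: true, attempt };
}

// Usage: the second live approval of run-1 reuses the recorded attempt.
let webhookCalls = 0;
const fakeSend = (_text: string): boolean => {
  webhookCalls += 1;
  return true;
};
approveAndDispatch("run-1", true, "live", fakeSend);
approveAndDispatch("run-1", true, "live", fakeSend);
console.log(webhookCalls); // 1
```

Only a `failed` attempt is retried here. A production version would also need the idempotency key enforced by a database uniqueness constraint, since two concurrent approvals could both miss an in-memory check.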
This demo is designed around a support-operations reviewer, not a generic AI playground.
- Input: an urgent or ambiguous support ticket
- Middle: visible classification, internal lookup, policy check, and draft generation
- Output: a response pack an operator can approve, export, or escalate
- Boundary: the system can prove one real external action after approval without pretending to be autonomous

```mermaid
sequenceDiagram
    participant Operator
    participant UI as Next.js UI
    participant API as API Routes
    participant DB as Supabase
    participant Groq as Groq
    participant Slack as Slack Webhook
    Operator->>UI: Load scenario or paste case
    UI->>API: POST /api/cases
    API->>DB: Insert case
    UI->>API: POST /api/runs
    API->>DB: Insert run + stages
    alt GROQ_API_KEY configured
        UI->>API: GET /api/runs/{id}/stream
        API->>Groq: Generate stage output
        Groq-->>API: Stream tokens
        API->>DB: Persist stage output + cursor
        API-->>UI: SSE stage updates
    else Fallback mode
        UI->>API: POST /api/runs/{id}/advance (poll loop)
        API->>DB: Persist synthetic output + cursor
        API-->>UI: Updated run state
    end
    API->>DB: Persist response pack
    Operator->>UI: Approve response pack
    UI->>API: POST /api/runs/{id}/approve
    API->>DB: Persist approval
    API->>Slack: Dispatch webhook (dry-run or live)
    API->>DB: Persist dispatch status
    API-->>UI: Updated staged actions + dispatch state
    Operator->>API: GET /api/runs/{id}/export?format=markdown
```
What is proven in code, not just described:
- Golden-path E2E: intake -> run -> escalation/completion -> approve -> export
- Failure-path unit tests: export before pack ready, stream without AI, approval dispatch state
- Build hygiene: lint, unit tests, production build, typecheck, and high-severity production dependency audit in CI
- Honest fallback: the app remains usable without a model key
This repository is structured as a flagship portfolio project for an AI Product Engineer / Applied AI Engineer profile.
- Product judgment: scoped a narrow support-ops workflow instead of a broad chatbot.
- Full-stack execution: shipped UI, API routes, database schema, auth/workspaces, and deployment wiring.
- AI workflow engineering: integrated streaming LLM output while keeping deterministic fallback behavior.
- Trust and evaluation: added provenance, approval history, action attempts, golden assertions, and portable exports.
- Professional delivery: kept validation gates, GitHub Actions, audit docs, demo assets, and scope boundaries aligned.
The trust/evidence layer stays intentionally narrow, but it now proves reviewer-facing signals without overstating autonomy:
- Approval history: every response-pack approval is persisted with actor/time and remains portable in reviewer exports.
- Slack approval boundary: Slack is still the only real outbound action boundary, and it remains approval-gated in staged actions plus action-attempt history.
- Prompt/version provenance: newly completed stages persist prompt/version runtime provenance for reviewer inspection.
- Response-pack lineage: exports summarize when the pack was created and how much of the run executed via AI versus synthetic fallback.
- Compact timing/fallback evidence: reviewer exports include concise timing, fallback, and parse-warning summaries instead of only raw stage blobs.
- Golden regression checks: for the seeded golden path, exports include assertions that the stage template, duration template, confidence, timing, approval gate, and urgency metadata stay aligned with expected values.
- Honest wording: reviewer-facing evidence is phrased to match what the code actually proves, including metadata alignment versus stronger runtime claims.
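
The lineage and timing summaries above could be computed along these lines. The sketch assumes a hypothetical `StageRecord` with a `source` flag and start/finish timestamps; the real `run_stages` columns and export wording may differ. It reduces per-stage records to the counts a reviewer needs (AI vs synthetic stages, total and slowest duration, parse warnings) and renders them as one markdown-friendly line.

```typescript
// Hypothetical stage record; field names are illustrative, not the actual
// run_stages columns.
interface StageRecord {
  key: string;
  source: "ai" | "synthetic";
  startedAt: number; // epoch milliseconds
  completedAt: number; // epoch milliseconds
  parseWarning?: string;
}

interface LineageSummary {
  aiStages: number;
  syntheticStages: number;
  totalMs: number;
  slowestStage: string | null;
  parseWarnings: number;
}

function summarizeLineage(stages: StageRecord[]): LineageSummary {
  let totalMs = 0;
  let slowestMs = -1;
  let slowestStage: string | null = null;
  for (const stage of stages) {
    const ms = stage.completedAt - stage.startedAt;
    totalMs += ms;
    if (ms > slowestMs) {
      slowestMs = ms;
      slowestStage = stage.key;
    }
  }
  return {
    aiStages: stages.filter((s) => s.source === "ai").length,
    syntheticStages: stages.filter((s) => s.source === "synthetic").length,
    totalMs,
    slowestStage,
    parseWarnings: stages.filter((s) => s.parseWarning !== undefined).length,
  };
}

// One compact line for the markdown export instead of raw stage blobs.
function formatLineage(summary: LineageSummary): string {
  const total = summary.aiStages + summary.syntheticStages;
  return (
    `Lineage: ${summary.aiStages}/${total} stages via AI, ` +
    `${summary.syntheticStages} via synthetic fallback; ` +
    `${(summary.totalMs / 1000).toFixed(1)}s total ` +
    `(slowest: ${summary.slowestStage ?? "n/a"}); ` +
    `${summary.parseWarnings} parse warning(s)`
  );
}
```

For example, given stages `classify` (AI, 1.2 s), `lookup` (synthetic, 0.3 s, with a parse warning), and `draft` (AI, 2.5 s), `formatLineage(summarizeLineage(stages))` returns `Lineage: 2/3 stages via AI, 1 via synthetic fallback; 4.0s total (slowest: draft); 1 parse warning(s)`.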
The fastest visual way to evaluate this project is the bundled demo preview plus the short operator walkthrough.
- Demo preview: `public/demo/demo.gif`
- Demo asset package: `docs/demo/2026-04-25-async-copilot-demo-asset-package.md`
- Recording script: `docs/demo/2026-04-20-async-copilot-reviewer-walkthrough.md`
- Target length: 60-90 seconds
- Core sequence: intake -> run -> terminal pack -> approval -> Slack status -> export
The GIF is the primary quick-look asset. The walkthrough remains the source of truth for re-recording a narrated 60-90 second demo.
- Open the landing page: https://async-copilot.vercel.app
- Click Open App → sign in with a magic link
- If this is your first session, create a workspace in onboarding
- Open the Payments Dispute — Duplicate Charge scenario card (golden path)
- Hit Start Triage → watch the 6-stage timeline progress in real time
- When it reaches terminal state (~5 sec), inspect the Response Pack, approve it, review the Slack dispatch status, or export the markdown pack
- Navigate to Runs or Samples in the header to browse
Try Paste instead: type or paste your own case body in the textarea → it still creates a real case + run with a generic fallback response pack.
| Route | Purpose |
|---|---|
| `/` | Marketing landing: 7 sections (header, hero + workspace mockup, how-it-works, system trust, response pack showcase, closing CTA, footer) |
| `/login` | Magic-link auth entry point |
| `/app` | Authenticated bootstrap redirect into onboarding or default workspace |
| `/app/onboarding` | First-workspace bootstrap flow |
| `/app/w/[workspaceSlug]` | New Case intake form + live sample picker inside a workspace |
| `/app/w/[workspaceSlug]/runs/[runId]` | Live Triage Run signature screen: Case Context / Visible Triage / Response Pack / Event Timeline |
| `/app/w/[workspaceSlug]/runs` | Workspace runs list with search, state chip filters, progress + confidence columns |
| `/app/w/[workspaceSlug]/samples` | Scenario library: Golden Path + Alternatives, with body preview |
| `/api/health` | Machine-readable env + schema + row-count snapshot |
| `/api/samples` · `/api/cases` · `/api/runs` · `/api/runs/[id]` | REST endpoints |
| `/api/runs/[id]/advance` · `/approve` · `/export` | Run lifecycle mutations + export |
| `/api/runs/[id]/stream` | SSE streaming: real-time LLM tokens (Llama 3.3 70B) |
| `/api/cron/process-runs` · `/cleanup-stale` · `/daily-stats` | Background pickup and maintenance cron jobs |
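
The `/api/runs/[id]/stream` route speaks Server-Sent Events. The sketch below shows the standard SSE wire format (an `event:` line, a `data:` line, and a blank line per frame) with a matching minimal parser. The `stage` and `done` event names and the payload fields are assumptions for illustration, not the route's actual contract.

```typescript
// Hypothetical helpers; event names and payloads are illustrative.
interface SseEvent {
  event: string;
  data: string;
}

// Serialize one Server-Sent Events frame. JSON.stringify never emits raw
// newlines, so a single `data:` line is enough; a blank line ends the frame.
function encodeSse(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Minimal parser for complete frames with "\n" line endings.
// id:, retry:, and ":" comment lines are ignored.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of text.split("\n\n")) {
    let event = "message"; // SSE default event type
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      // The spec strips a single leading space after the colon.
      if (line.startsWith("event:")) event = line.slice(6).replace(/^ /, "");
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
    }
    // Frames without data (e.g. keep-alive comments) are not dispatched.
    if (dataLines.length === 0) continue;
    events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}
```

In a Next.js route handler, frames like these would be enqueued into a `ReadableStream` returned with a `Content-Type: text/event-stream` header. In the browser, `EventSource` does the parsing, so a hand-rolled parser is mainly useful in tests.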
- Framework: Next.js 15 (App Router, TypeScript, `typedRoutes`, `next/font/google`)
- AI Inference: Groq (Llama 3.3 70B) via Vercel AI SDK 6: real streaming, JSON output
- Styling: Tailwind CSS 3.4 with project design tokens
- Icons: `@phosphor-icons/react` (server-side rendered SVG)
- Database: Supabase Postgres 17 (`eu-west-1` / Ireland)
- Auth: Supabase Auth (magic link), `@supabase/ssr` (browser + server), `@supabase/supabase-js` (admin)
- Observability: Sentry (error tracking, 5k events/mo free tier)
- Hosting: Vercel (Stockholm edge, auto-deploy on every push to `main`)
- Unit Tests: Vitest · E2E: Playwright
- CI/CD: GitHub Actions (audit + lint + unit tests + build + typecheck on pushes/PRs)
- Rate Limiting: In-memory sliding window (20 req/min/IP)
- Cron: Vercel Cron (background run pickup, stale-run cleanup, daily stats snapshot)
Total monthly cost: $0 (all services on free tiers)
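
The in-memory sliding-window limiter (20 req/min/IP) can be sketched as a sliding-window log. The class and method names, and the choice of a per-request timestamp log rather than a weighted two-bucket counter, are assumptions; the repo's `rate-limit.ts` may differ.

```typescript
// Sliding-window log: remember each allowed request's timestamp per key and
// count only those inside the trailing window. Defaults mirror the README
// (20 requests per 60 s per IP); class and method names are hypothetical.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 20, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true (and records the hit) if `key` is under its limit at `now`.
  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent); // also prunes expired timestamps
    return allowed;
  }
}
```

Rejected requests are not recorded, so a client that keeps retrying regains access as soon as old hits leave the window. Because the timestamps live in one process's memory, each serverless instance counts independently, which is the usual reason to move the same logic to a shared store such as Upstash Redis.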
- Gmail now exists only as a narrow manual import path: one workspace inbox, one thread/message import at a time.
- No workspace member management UI yet.
- No mailbox sync engine, Gmail history/webhook sync, or attachment ingestion yet.
- No CRM or ticketing integrations yet; the only outbound boundary is an optional Slack webhook dispatched after human approval.
- No production SLA claims or security guarantees beyond what is documented here.
- This is a portfolio implementation, not a live support product.
The repo now combines the original run engine tables with the Milestone 3 workspace/auth foundation, the Milestone 4 Gmail/background execution layer, and a narrow Milestone 5 trust-history layer.
Operational core:
- `samples`: curated scenario library (read-only in UI)
- `cases`: support-case instances from manual intake, samples, or Gmail
- `runs`: triage lifecycle (`pending` → `running` → `completed`/`escalated`)
- `run_stages`: 6 stages per run with `output` JSON blobs, persisted timestamps, and legacy `duration_ms` seed metadata
- `response_packs`: final artifact (confidence, recommendation, summary, draft reply, citations, staged actions)
- `run_events`: append-only reviewer timeline and audit trail for material state transitions; new `stage.completed` rows now also carry per-stage prompt/version provenance
- `run_action_attempts`: durable outbound action attempt log for Slack delivery and retries
- `response_pack_approvals`: durable approval-history rows for the reviewer boundary
Workspace/auth layer:
- `profiles`
- `workspaces`
- `workspace_memberships`
Gmail source layer:
- `workspace_gmail_accounts`: one shared Gmail connection per workspace
- `gmail_messages`: durable imported Gmail source-of-truth rows
Schema: supabase/migrations/001_initial_schema.sql through 011_milestone5_approval_history.sql
Seeds: supabase/seeds/001_samples.sql + 002_golden_run.sql
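
The `runs` lifecycle (`pending` → `running` → `completed`/`escalated`) and the six-stage cursor can be modeled as a small state machine. This is a hypothetical sketch: it covers only the four states listed above (the cron description also mentions queued/retrying runs, which are not modeled), and the `needsEscalation` flag, evaluated after the sixth stage, is an assumed stand-in for however the real policy check decides to escalate.

```typescript
// Hypothetical model of the run lifecycle; only these four states are
// modeled, and the escalation rule is an assumption.
type RunState = "pending" | "running" | "completed" | "escalated";

const TRANSITIONS: Record<RunState, readonly RunState[]> = {
  pending: ["running"],
  running: ["completed", "escalated"],
  completed: [], // terminal
  escalated: [], // terminal
};

const STAGE_COUNT = 6;

interface Run {
  state: RunState;
  cursor: number; // number of completed stages (0..STAGE_COUNT)
}

function transition(run: Run, next: RunState): Run {
  if (!TRANSITIONS[run.state].includes(next)) {
    throw new Error(`illegal transition ${run.state} -> ${next}`);
  }
  return { ...run, state: next };
}

// One server-side /advance step: complete the next stage, and after the
// sixth stage move the run to a terminal state.
function advance(run: Run, needsEscalation: boolean): Run {
  const current = run.state === "pending" ? transition(run, "running") : run;
  if (current.state !== "running") {
    throw new Error(`cannot advance a ${current.state} run`);
  }
  const cursor = current.cursor + 1;
  if (cursor < STAGE_COUNT) return { ...current, cursor };
  return transition({ ...current, cursor }, needsEscalation ? "escalated" : "completed");
}
```

Keeping the transition table explicit means an `/advance` call on a terminal run fails loudly instead of silently mutating it, which fits a server-owned progression model.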
Stage provenance note:
- Prompt/version provenance is attached only to newly completed `stage.completed` events.
- Historical runs keep rendering normally, but older stage cards will not show provenance unless they were completed after this slice shipped.
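
Because provenance is only written on newly completed events, anything that renders it has to treat the field as optional. A minimal sketch, assuming a hypothetical `StageCompletedEvent` payload with a nested `provenance` object (the real `run_events` schema may name these fields differently):

```typescript
// Hypothetical payload shape for a stage.completed event; field names are
// illustrative, not the repo's actual run_events schema.
interface StageCompletedEvent {
  type: "stage.completed";
  stageKey: string;
  provenance?: {
    promptId: string;
    promptVersion: string;
    model: string;
  };
}

// Older events predate the provenance slice, so the field is optional and
// callers omit the badge (null) instead of failing to render.
function provenanceLabel(event: StageCompletedEvent): string | null {
  if (!event.provenance) return null;
  const { promptId, promptVersion, model } = event.provenance;
  return `${promptId}@${promptVersion} (${model})`;
}
```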

```bash
# 1. Install
npm install

# 2. Fill .env.local (copy .env.example)
cp .env.example .env.local
# Required: NEXT_PUBLIC_SUPABASE_URL, NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY,
#           SUPABASE_SECRET_KEY, SUPABASE_DB_URL
# Optional: GROQ_API_KEY (enables real AI), NEXT_PUBLIC_SENTRY_DSN,
#           SLACK_WEBHOOK_URL, SLACK_WEBHOOK_DRY_RUN,
#           GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, GOOGLE_OAUTH_REDIRECT_URI

# 3. One-shot migrate + seed
npm run db:init

# 4. Start dev
npm run dev
# → http://localhost:3000

# 5. Run tests
npm test            # Vitest unit tests
npm run test:e2e    # Playwright E2E
```

Magic-link auth requires the Supabase project to allow the production callback URL.
- Site URL: `https://async-copilot.vercel.app`
- Redirect URLs:
  - `https://async-copilot.vercel.app/auth/callback`
  - `https://async-copilot.vercel.app/**`
  - `http://localhost:3000/auth/callback`
  - `http://localhost:3000/**`
If the project still has http://localhost:3000 as the Site URL, Supabase will ignore the requested production redirect and email links will send users back to localhost.
Useful scripts:
- `npm run typecheck`: strict TypeScript
- `npm run build`: production build
- `npm run lint`
- `npm test`: Vitest unit tests
- `npm run test:watch`: Vitest in watch mode
- `npm run db:migrate` · `npm run db:seed`: run the two halves of `db:init` separately
Manual Gmail import requires a Google OAuth client that allows the app callback URL.
- Development callback: `http://localhost:3000/api/gmail/callback`
- Production callback: `https://async-copilot.vercel.app/api/gmail/callback`
If GOOGLE_OAUTH_REDIRECT_URI is set, it must match one of the allowed redirect URIs in the Google Cloud OAuth client exactly.
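
Since Google compares `redirect_uri` values as exact strings, a trailing slash or an `http`/`https` mismatch is enough to break the flow. The sketch below is a hypothetical local preflight, not part of the repo: `resolveGmailRedirectUri` and its allow-list (copied from the two callbacks above) are illustrative, and the authoritative list remains the one configured in Google Cloud.

```typescript
// Hypothetical preflight check mirroring Google's exact-match rule locally.
// The allow-list copies the two documented callbacks and must be kept in
// sync with the Google Cloud OAuth client by hand.
const ALLOWED_GMAIL_REDIRECTS = [
  "http://localhost:3000/api/gmail/callback",
  "https://async-copilot.vercel.app/api/gmail/callback",
];

function resolveGmailRedirectUri(origin: string, override?: string): string {
  // Prefer an explicit GOOGLE_OAUTH_REDIRECT_URI; otherwise derive from origin.
  const uri = override ?? new URL("/api/gmail/callback", origin).toString();
  // Exact string comparison: no trailing-slash or case normalization.
  if (!ALLOWED_GMAIL_REDIRECTS.includes(uri)) {
    throw new Error(`redirect_uri not registered: ${uri}`);
  }
  return uri;
}
```

For example, `resolveGmailRedirectUri("http://localhost:3000")` returns `http://localhost:3000/api/gmail/callback`, while an override of `https://async-copilot.vercel.app/api/gmail/callback/` (note the trailing slash) throws.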
- Server-owned run progression: every `/advance` call is authoritative on the server. The client never mutates state directly.
- SSE streaming with polling fallback: when `GROQ_API_KEY` is set, the client connects via SSE and streams real LLM tokens. Without it, the client falls back to `800ms` polling with synthetic (regex) output, so the app degrades without any extra configuration.
- Real AI, graceful fallback: Llama 3.3 70B via Groq generates structured JSON for each stage. If the LLM call fails or the key is missing, regex-based inference takes over.
- Rate limiting: in-memory sliding window (20 req/min/IP) on write endpoints, upgradeable to Upstash Redis.
- Background pickup and cleanup: Vercel Cron picks up queued/retrying runs and cleans up stale running runs.
- Approval-gated integration boundary: no autonomous action. Human approval can dispatch a Slack summary (live or dry-run); all other staged actions remain queued.
- Narrow real Gmail intake: a workspace can connect one Gmail inbox and manually import one thread/message into a case and run. Full sync/history processing remains deferred.
- Idempotent schema + seeds: `npm run db:init` is safe to re-run, so the demo environment can be reset cheaply.
- One source of design truth: `docs/design/design-system.md` holds all tokens; every screen pulls from there.
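
The AI-first, synthetic-fallback pattern for a single stage can be sketched as below. It is deliberately synchronous and simplified: the real path streams from Groq through the Vercel AI SDK, whereas here `llm` is a hypothetical callable returning raw text, and the regex heuristics in `syntheticClassify` are illustrative stand-ins for the repo's synthetic inference. Both a missing model and an unparseable reply land on the same deterministic output, and the failure reason is kept so it can surface as a parse warning in exports.

```typescript
// Simplified, synchronous sketch; `llm` and the heuristics are hypothetical.
type Json = Record<string, unknown>;

interface StageResult {
  source: "ai" | "synthetic";
  data: Json;
  parseWarning?: string;
}

// Deterministic regex stand-in for the synthetic fallback.
function syntheticClassify(caseBody: string): Json {
  const urgent = /\b(urgent|asap|outage|charged twice)\b/i.test(caseBody);
  const billing = /\b(refund|charged?|invoice|payment)\b/i.test(caseBody);
  return { urgency: urgent ? "high" : "normal", category: billing ? "billing" : "general" };
}

function runClassifyStage(caseBody: string, llm?: (prompt: string) => string): StageResult {
  // No model configured: go straight to deterministic output.
  if (!llm) return { source: "synthetic", data: syntheticClassify(caseBody) };
  try {
    const parsed: unknown = JSON.parse(llm(caseBody));
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error("model output is not a JSON object");
    }
    return { source: "ai", data: parsed as Json };
  } catch (err) {
    // Model error or unparseable JSON: fall back and keep the reason for export.
    const parseWarning = err instanceof Error ? err.message : String(err);
    return { source: "synthetic", data: syntheticClassify(caseBody), parseWarning };
  }
}
```

Recording `source` per stage is what makes an AI-versus-synthetic lineage summary possible at export time.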

```
src/
  app/
    layout.tsx, icon.svg, error.tsx, not-found.tsx
    (marketing)/
      page.tsx                          # landing page
    (app)/
      layout.tsx                        # authenticated app shell
      app/
        onboarding/page.tsx             # first-workspace bootstrap
        w/[workspaceSlug]/
          page.tsx                      # case intake + sample picker
          runs/page.tsx                 # workspace run history
          runs/[runId]/page.tsx         # live run detail
          samples/page.tsx              # scenario library
    api/
      cases/route.ts                    # GET list, POST create
      cases/[caseId]/similar/route.ts   # pgvector similarity lookup
      gmail/callback/route.ts           # Google OAuth callback
      samples/route.ts                  # GET list
      runs/route.ts                     # GET list, POST create
      runs/[runId]/route.ts             # GET detail
      runs/[runId]/advance/route.ts     # POST: one stage forward
      runs/[runId]/approve/route.ts     # POST: approve response pack
      runs/[runId]/export/route.ts      # GET: markdown/text/json
      runs/[runId]/stream/route.ts      # GET: SSE stage streaming
      workspaces/route.ts               # workspace bootstrap
      cron/process-runs/route.ts        # background run pickup
      health/route.ts                   # GET: env + schema + counts
  components/
    marketing/hero-mockup.tsx
    shared/app-header.tsx
  features/
    intake/components/new-case-page.tsx
    runs/components/live-run-view.tsx
    runs/components/runs-table.tsx
  lib/
    ai/client.ts                        # Groq provider (Vercel AI SDK)
    integrations/slack.ts               # approval-gated Slack webhook helper
    integrations/gmail.ts               # narrow manual Gmail import helper
    ai/prompts.ts                       # 6 stage system prompts
    supabase/{client,server,admin,types}.ts
    runs/{background,create-run,events,execute-step}.ts
    triage/run-model.ts                 # state machine + synthetic fallback
    rate-limit.ts                       # in-memory rate limiter
supabase/
  migrations/001-011                    # Postgres schema through Milestone 5
  seeds/{001_samples,002_golden_run}.sql
scripts/
  db-init.mjs                           # pg-based migrator (no Supabase CLI)
tests/
  unit/*.test.ts                        # route + model unit coverage
  golden-path.spec.ts                   # Playwright E2E
docs/
  audit/                                # audit reports and scorecards
  brainstorms/                          # requirements and v2 product spec
  case-study/                           # reviewer-facing engineering narrative
  demo/                                 # GIF/screenshots/walkthrough package
  design/                               # design system and reference screenshots
```

- Every push to `main` → Vercel build → production deploy (~45 sec).
- Env vars are stored in Vercel encrypted storage (Development / Preview / Production).
- Supabase URL/keys are read at runtime via `process.env.*`.
Five tools are wired up for continuous quality checks. They are opt-in and advisory: none of them runs on every PR by default, which keeps merge friction low. Run them locally or trigger the GitHub Actions workflows manually (`workflow_dispatch`).
| Script | What it does |
|---|---|
| `npm run audit:links` | linkinator: scans the live site for broken links, `href="#"` dead anchors, and 404s. |
| `npm run audit:a11y` | pa11y-ci + axe-core: WCAG 2 AA scan across public routes. |
| `npm run audit:perf` | Lighthouse CI: perf / a11y / SEO / best-practices with enforced budgets (`lighthouserc.cjs`). |
| `npm run audit:visual` | Lost Pixel: screenshots 4 routes × 3 breakpoints and diffs them against `.lostpixel/baseline/`. |
| `npm run audit` | Chains links → a11y → perf → visual. Full non-AI pass. |
GitHub Actions:
- `.github/workflows/audit-lighthouse.yml`: weekly Monday cron + manual trigger
- `.github/workflows/audit-a11y.yml`: weekly Monday cron + manual trigger
- `.github/workflows/audit-linkcheck.yml`: lychee on push to `main` + weekly cron
- `.github/workflows/audit-visual.yml`: manual trigger only (baselines are committed)
Each workflow uploads its output as an artifact (14-day retention) so reports stay accessible without polluting the repo.

```bash
npm run audit   # links + a11y + perf + visual
```

Artifacts are written to `.lighthouseci/`, `.lostpixel/`, and workflow artifacts. Lost Pixel baselines should be committed only after an intentional visual change is approved.
See docs/audit/2026-04-26-github-hiring-readiness.md for the current GitHub/repository hygiene pass and docs/audit/2026-04-18.md for the original site-audit baseline.
- `docs/case-study/engineering-case-study.md`: concise engineering narrative
- `docs/demo/2026-04-25-async-copilot-demo-asset-package.md`: GIF, screenshots, and walkthrough map
- `docs/audit/2026-04-26-github-hiring-readiness.md`: repo hygiene and security audit
- `ARCHITECTURE.md`: system architecture, data model, design decisions
- `docs/brainstorms/2026-04-24-async-copilot-v2-spec.md`: v2 product spec, chosen scenario, and milestone order
- `docs/brainstorms/2026-04-18-async-copilot-requirements.md`: 22 MVP requirements (R1–R22)
- `docs/design/design-system.md`: canonical tokens
- `docs/plans/2026-04-18-001-feat-async-copilot-demo-plan.md`: original 9-unit build plan




