diff --git a/README.md b/README.md index 91485fe..26e721b 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ **Put multiple AI models in a room. Give them personas. Watch them debate.** -RoundTable runs the Consensus Validation Protocol (CVP) across any combination of AI providers — Grok, Claude, GPT, Gemini, Mistral, and more — with configurable personas, real-time streaming, and a premium dark interface designed for long sessions. +RoundTable runs the **Consensus Validation Protocol (CVP)** and a **Blind Jury** engine across any combination of AI providers — Grok, Claude, GPT, Gemini, Mistral, and more — with configurable personas, a non-voting Judge synthesizer, a live confidence trajectory chart, a disagreement ledger, a cost meter, shareable permalinks, and a premium dark interface designed for long sessions. [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE) [![Deploy with Vercel](https://img.shields.io/badge/Deploy-Vercel-black?logo=vercel)](https://vercel.com/new/clone?repository-url=https://github.com/entropyvortex/roundtable) @@ -44,11 +44,11 @@ The result is a scored collection of final perspectives, not a merged conclusion ### How It Works -CVP runs a fixed number of rounds (1–10, user-configured, default 5). Each round has a designated type that constrains what participants are asked to do. Participants are processed **sequentially within each round** — meaning later participants in a round see earlier participants' responses from that same round, in addition to all responses from prior rounds. +CVP runs up to a configured number of rounds (1–10, default 5). Each round has a designated type that constrains what participants are asked to do. From Round 2 onward, participants are processed **sequentially within each round** — later participants in a round see earlier participants' responses from that same round, in addition to all responses from prior rounds. 
Round 1 runs in **parallel with no cross-visibility** by default (toggleable via the "Blind Round 1" option) so the first wave of analysis is not contaminated by whoever happened to answer first. **Round phases:** -1. **Initial Analysis** (Round 1) — Each participant provides an independent analysis of the prompt, shaped by its assigned persona. No cross-visibility exists yet. Each response must end with a self-assessed confidence score (0–100). +1. **Initial Analysis** (Round 1) — Each participant provides an independent analysis of the prompt, shaped by its assigned persona. With "Blind Round 1" enabled (the default), every participant answers in parallel with no visibility into any other participant. Each response must end with a self-assessed confidence score (0–100). 2. **Counterarguments** (Round 2) — Each participant reviews all Round 1 responses and identifies weaknesses, challenges assumptions, and highlights logical gaps. Confidence scores are updated. @@ -56,6 +56,10 @@ CVP runs a fixed number of rounds (1–10, user-configured, default 5). Each rou 4. **Synthesis** (Rounds 4 through N) — Participants synthesize the discussion, acknowledge remaining uncertainties, and refine their positions. The final round is labeled "Final Synthesis" in the prompt, signaling participants to commit to a concluding position. +**Randomized order.** From Round 2 onward, participant order is shuffled per round by default to prevent the first-mover from disproportionately framing each round. Toggleable via the "Randomize order" option. + +**Early stopping.** When the consensus score delta between two consecutive rounds drops to ≤ 3 points, the engine emits an `early-stop` event and terminates the run before exhausting all configured rounds. This is on by default and saves cost on runs that converge quickly. Toggleable via the "Early stop" option. + +**Persona injection:** Each participant's system prompt is prepended with a persona definition (e.g., "You are a Risk Analyst.
Your role is to surface hidden dangers, tail risks, and second-order effects."). Personas are defined server-side in `lib/personas.ts` and cannot be modified by the client. **Confidence extraction:** Every response is expected to end with `CONFIDENCE: [0-100]`. A regex extracts this value. If absent, confidence defaults to 50. @@ -68,6 +72,14 @@ consensus_score = avg(confidence) - 0.5 * stddev(confidence) High average confidence with low variance yields a high score. Disagreement (high variance) penalizes the score even if individual confidences are high. +**Disagreement detection:** After each round the engine scans every pair of participants. Any pair whose confidence diverges by ≥ 20 points is recorded in the disagreement ledger and surfaced live in the UI. The detection is intentionally deterministic and cheap — no extra LLM calls — which makes it robust to rate limits and reproducible across runs. + +**Judge synthesis (optional):** When "Judge synthesis" is enabled, a dedicated non-voting model reads every participant's final-round response and produces a structured synthesis with four sections: **Majority Position**, **Minority Positions**, **Unresolved Disputes**, and **Synthesis Confidence**. The judge is forbidden from picking a winner or collapsing conditional minority views into the majority. Its output streams live to the UI and is included in all exports. + +**Cost meter.** Every call is attributed to a participant and priced against the client-side table in `lib/pricing.ts`. The live meter shows total tokens (in/out) and estimated USD; totals include the judge. When the Vercel AI SDK reports token usage, the meter uses it directly; otherwise it falls back to a 4-chars-per-token heuristic. 
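
The fallback arithmetic can be sketched in a few lines. This is a hypothetical illustration — the interface and function names below are invented for clarity and are not the actual exports of `lib/pricing.ts`:

```typescript
// Hypothetical sketch of the cost-meter fallback. Field and function names
// are illustrative, not the actual exports of lib/pricing.ts.
interface ModelPricing {
  inputPerMTok: number;  // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
}

// Fallback when the SDK reports no token usage: ~4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function estimateCostUSD(inputTokens: number, outputTokens: number, p: ModelPricing): number {
  return (inputTokens / 1_000_000) * p.inputPerMTok + (outputTokens / 1_000_000) * p.outputPerMTok;
}
```

When the SDK does report usage, the character heuristic is skipped and the real token counts feed the price calculation directly.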
+ +**Provider error resilience.** When a participant's underlying provider call fails — wrong base URL, invalid API key, unknown model, upstream outage, 404 from a mismatched endpoint, you name it — the engine catches the error via the Vercel AI SDK's `onError` callback, formats it with the HTTP status code when available, logs the full error object server-side, and emits a `participant-end` event with an `error` field. The client renders that response as a red error card with the upstream message (not the usual content card), fires a toast identifying which provider/model broke, and **excludes the errored response from both the consensus score and the disagreement ledger**, so one broken provider can no longer tank the run. The remaining participants continue normally. + ### Protocol Diagram ```text @@ -121,6 +133,18 @@ User Prompt + Round Count + Participant Config └─────────────────────────────────────────────┘ ``` +### Blind Jury Engine (alternative) + +RoundTable also ships a **Blind Jury** engine alongside CVP. Where CVP is a multi-round debate, Blind Jury is a single-pass evaluation: + +1. Every participant answers the same prompt **in parallel**, with no cross-visibility into any other answer. +2. A judge model synthesizes majority, minority, and unresolved positions from the independent responses. +3. A disagreement ledger is computed from the pairwise confidence spread, exactly as in CVP. + +Blind Jury is the right engine when you want _independent_ signals rather than a negotiated consensus. Because there is no sequential visibility, it is immune to the anchoring bias that CVP needs randomized order and blind Round 1 to mitigate. It is also cheap: one API call per participant, plus one for the judge. + +Switch engines from the sidebar ("Protocol" section). The Blind Jury engine ignores the round count and the CVP-specific toggles. + ### Why This Is Better Than Majority Vote Majority vote asks N models the same question and picks the most common answer. 
CVP does something structurally different: @@ -141,15 +165,13 @@ Majority vote asks N models the same question and picks the most common answer. **Prompt bias propagation.** The user's prompt frames the debate. If the prompt contains a false premise, all participants may accept it. Personas like First-Principles Engineer and Scientific Skeptic are designed to push back, but their effectiveness depends on the model's ability to detect the bias. -**Sycophantic convergence.** Models tend to agree with prior responses, especially in later rounds. The sequential execution order means the last participant in each round sees the most prior context and may anchor to the emerging consensus rather than independently evaluating. This is the opposite of the intended effect. - -**No early stopping.** CVP always runs all N rounds. If participants converge in Round 2, Rounds 3–5 add latency and cost without new information. There is no convergence detection or early termination. +**Sycophantic convergence.** Models still tend to agree with prior responses, especially in later rounds. "Blind Round 1" and "Randomize order" reduce this bias but do not eliminate it — the last participant of any sequential round still sees the most prior context and may anchor to the emerging consensus rather than independently evaluating. Blind Jury avoids this failure mode entirely at the cost of giving up multi-round refinement. -**Persona dominance via ordering.** The first participant in each round sets the tone. Later participants respond to what exists rather than generating independently. The protocol does not randomize participant order between rounds. +**Cost scales linearly.** Each participant makes one API call per round. With 4 participants and 5 rounds, that is 20 API calls per consensus run, plus one for the judge if enabled. At 1,500 tokens per response, a single run can consume 30,000+ output tokens across providers. 
Early stopping and Blind Jury are the easiest levers to lower cost; the live cost meter in the floating run panel makes this concrete during a run. -**Cost scales linearly.** Each participant makes one API call per round. With 4 participants and 5 rounds, that is 20 API calls per consensus run. At 1,500 tokens per response, a single run can consume 30,000+ output tokens across providers. +**Confidence scores are self-reported.** Models assign their own confidence. There is no calibration, no ground truth, and no penalty for overconfidence. The consensus score is only as meaningful as the models' ability to self-assess — which is known to be unreliable. The judge synthesizer is deliberately _not_ a calibrator: it summarises what was said, it does not grade it. -**Confidence scores are self-reported.** Models assign their own confidence. There is no calibration, no ground truth, and no penalty for overconfidence. The consensus score is only as meaningful as the models' ability to self-assess — which is known to be unreliable. +**Disagreement heuristic is confidence-based.** The disagreement ledger flags pairs whose confidence diverges by ≥ 20 points. This catches substantive splits reliably but misses cases where two participants hold opposite positions with identical confidence. Treat the ledger as a lower bound on actual disagreement. ### Example Transcript @@ -187,19 +209,15 @@ _Final consensus score: 81 (avg=84, stddev=9.8)_ The human reader sees three final positions that largely converge but preserve the Futurist's conditional exception — something a majority vote would have discarded. -### Missing Pieces - -The following are not implemented in the current codebase but would make the protocol substantially more rigorous: +### Still Open -1. **Convergence detection and early stopping.** Compare confidence distributions between consecutive rounds. If the delta drops below a threshold, terminate early. 
This would save cost and avoid the sycophantic convergence problem in later rounds. +The following are deliberate non-goals for v1 but would further tighten the protocol: -2. **Randomized participant ordering.** Shuffle the participant sequence each round to prevent first-mover anchoring bias. The current fixed order means the first participant disproportionately frames each round. +1. **Confidence calibration or external validation.** Self-reported confidence is unreliable. A calibration step — comparing stated confidence to accuracy on known-answer questions — or a separate judge model that _grades_ argument quality (as opposed to the current faithfulness-only synthesizer) would add grounding. -3. **Explicit disagreement tracking.** Parse responses for areas of agreement and disagreement, maintain a structured disagreement ledger across rounds, and surface unresolved disputes in the final output rather than relying on the human to find them. +2. **Claim-level disagreement extraction.** The current disagreement ledger detects confidence splits, not semantic contradictions. A follow-up pass that extracts the actual claims participants make and flags direct contradictions would be more precise, at the cost of extra LLM calls. -4. **Confidence calibration or external validation.** Self-reported confidence is unreliable. A calibration step — comparing stated confidence to accuracy on known-answer questions — or a separate judge model that evaluates argument quality would add grounding. - -5. **Automated final synthesis.** A dedicated synthesis step where a separate model (or a designated participant) produces a single merged conclusion from all final-round responses, explicitly noting majority and minority positions. Currently, the human must do this manually. +3. **Pluggable engines beyond CVP and Blind Jury.** The engine interface is clean enough to support Delphi, Adversarial Red Team, Dialectical, and Ranked Choice variants. See the Roadmap table below. 
## Security @@ -207,28 +225,36 @@ This is experimental, it has no authentication protection, if you publish this w --- -## Screenshots - -![Screenshot of Web Interface](screenshots/screenshot1.png) +## Screenshot -![Screenshot Consensus panel](screenshots/screenshot2.png) +![Screenshot of Web Interface](screenshots/newscreenshot.png) ## Features -| Feature | Description | -| --------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- | -| **Multi-Provider** | Connect any OpenAI-compatible API — Grok, Claude, OpenAI, Mistral, Groq, Together, and more | -| **7 Built-in Personas** | Risk Analyst, First-Principles Engineer, VC Specialist, Scientific Skeptic, Optimistic Futurist, Devil's Advocate, Domain Expert | -| **Consensus Validation Protocol** | Structured multi-round debate: Analysis, Counterarguments, Evidence Assessment, Synthesis | -| **1-10 Configurable Rounds** | Control the depth of deliberation | -| **Real-time SSE Streaming** | Watch responses arrive token-by-token with live progress tracking | -| **Cascaded Model Selector** | Provider-first dropdown with persona assignment per participant | -| **Message Flow Sidebar** | UML-style sequence diagram of the entire debate, click to navigate | -| **Copy to Clipboard** | One-click raw markdown export per response | -| **Cancel Anytime** | Stop button + Escape key — abort signal propagates to the server and stops provider calls | -| **Premium Dark UI** | High-contrast, readable interface designed for extended analysis sessions | -| **Rate-Limited API** | In-memory per-IP rate limiting, server-side input validation, persona/model re-verification | -| **No External Services** | No database, no auth service, no persistence — Vercel-deployable in one click | +| Feature | Description | +| ------------------------------- | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Multi-Provider** | Connect any OpenAI-compatible API — Grok, Claude, OpenAI, Mistral, Groq, Together, and more | +| **7 Built-in Personas** | Risk Analyst, First-Principles Engineer, VC Specialist, Scientific Skeptic, Optimistic Futurist, Devil's Advocate, Domain Expert | +| **Two Engines** | **CVP** (multi-round debate) and **Blind Jury** (parallel independent responses + judge synthesis) — switch from the sidebar | +| **Blind Round 1** | CVP's first round runs in parallel with zero cross-visibility so the first wave of analysis is not contaminated by speaking order | +| **Randomized Order** | CVP shuffles participant order in rounds 2+ to kill first-mover anchoring bias | +| **Early Stopping** | CVP detects convergence between rounds and terminates early, saving latency and tokens | +| **Judge Synthesizer** | Optional non-voting model that produces a structured **Majority / Minority / Unresolved / Confidence** summary over the final-round answers | +| **Confidence Trajectory Chart** | Live sparkline with one line per participant, so you can _see_ drift, convergence, and sycophancy as the run unfolds | +| **Disagreement Ledger** | Deterministic confidence-spread detector grouping flagged pairs by round — click a row to jump to that round in the transcript | +| **Cost Meter** | Live total tokens and estimated USD per run, with a bundled pricing table for major frontier models | +| **Floating Run Panel** | On xl+ screens a pinned right-side container stacks the cost meter, confidence trajectory, disagreement ledger, and a collapsible UML-style message flow diagram, scrolling as a unit so all four stay in view throughout a long transcript. 
Below xl the same panels fall back into the left sidebar | +| **Provider Error Handling** | Errored participant calls render as red error cards with the upstream message + HTTP status, fire a per-participant toast, and are excluded from the consensus score and disagreement ledger so one broken provider can't tank a run | +| **Prompt Library** | 8 curated preset prompts surfaced under the textarea for first-time visitors to hit Run immediately | +| **Session Export & Share** | One-click download as Markdown or JSON, plus a permalink that encodes the full run into the URL hash (compressed when available) | +| **Shared View Mode** | Loading a `#rt=…` permalink rehydrates the run into a read-only viewer for review, embedding, or screenshots | +| **Real-time SSE Streaming** | Watch responses arrive token-by-token with live progress tracking | +| **Cascaded Model Selector** | Provider-first dropdown with persona assignment per participant | +| **Copy to Clipboard** | One-click raw markdown export per response | +| **Cancel Anytime** | Stop button + Escape key — abort signal propagates to the server and stops provider calls | +| **Premium Dark UI** | High-contrast, readable interface designed for extended analysis sessions | +| **Rate-Limited API** | In-memory per-IP rate limiting, server-side input validation, persona/model re-verification | +| **No External Services** | No database, no auth service, no persistence — Vercel-deployable in one click | --- @@ -252,7 +278,7 @@ Edit `.env.local` with your keys, then: pnpm dev ``` -Open [http://localhost:3000](http://localhost:3000). Add participants from the sidebar, type a prompt, and hit **Run Consensus**. +Open [http://localhost:3000](http://localhost:3000). Add participants from the left sidebar, pick an engine in the **Protocol** panel (CVP or Blind Jury), optionally enable judge synthesis, type a prompt (or click a preset), and hit **Run Consensus**. 
On xl+ screens the cost meter, confidence trajectory, disagreement ledger, and message-flow diagram live in a floating panel pinned to the right of the viewport — watch them populate in real time as the debate streams. Below xl those same panels fall back into the left sidebar. When the run finishes, click **Export** in the results panel to download the transcript as Markdown/JSON or copy a permalink that rehydrates the run on any browser. --- @@ -299,6 +325,23 @@ The `apiKey` field supports two formats: API keys are resolved server-side only and never exposed to the browser. All AI calls go through Next.js API routes. +### Endpoint compatibility + +All consensus calls go through the **OpenAI chat completions** endpoint (`POST /chat/completions`), not the newer OpenAI Responses API. This is deliberate: `/chat/completions` is the one endpoint every provider's OpenAI-compat shim actually implements. In code we pin this by using `provider.chat(modelId)` instead of the default `provider(modelId)` — the latter targets `/responses`, which is OpenAI-only. + +That means your `baseUrl` should be the provider's base that serves `/chat/completions`: + +| Provider | Base URL | Notes | +| ---------- | -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| OpenAI | `https://api.openai.com/v1` | Native endpoint. | +| Anthropic | `https://api.anthropic.com/v1` | Requires Anthropic's [OpenAI-SDK compatibility layer](https://docs.anthropic.com/en/api/openai-sdk). Models include `claude-sonnet-4-20250514`, `claude-opus-4-1`, etc. | +| xAI (Grok) | `https://api.x.ai/v1` | Native OpenAI-compatible. | +| Groq | `https://api.groq.com/openai/v1` | Native OpenAI-compatible. | +| Together | `https://api.together.xyz/v1` | Native OpenAI-compatible. | +| Mistral | `https://api.mistral.ai/v1` | Native OpenAI-compatible. 
| + +If a provider only ships a dedicated SDK with no `/chat/completions` shim, it is not currently supported. + ### Adding a New Provider Any OpenAI-compatible API works. Add an entry to the `AI_PROVIDERS` array with the correct `baseUrl` and you're done. Examples: @@ -330,24 +373,34 @@ Any OpenAI-compatible API works. Add an entry to the `AI_PROVIDERS` array with t ``` app/ api/ - consensus/route.ts SSE streaming endpoint — runs the CVP engine - providers/route.ts Returns client-safe model list (no secrets) - page.tsx Main dashboard — sidebar, prompt, results - layout.tsx Root layout with Sonner toasts + consensus/route.ts SSE streaming endpoint — validates options & dispatches to the engine + providers/route.ts Returns client-safe model list (no secrets) + page.tsx Main dashboard — sidebar, prompt, results, SSE processor + layout.tsx Root layout with Sonner toasts components/ - AISelector.tsx Cascaded provider/model picker + persona selector - ResultPanel.tsx Live streaming results with markdown rendering - MessageFlowDiagram.tsx Floating UML-style sequence diagram - BackToTop.tsx Scroll navigation + AISelector.tsx Cascaded provider/model picker + persona selector + ConfigPanel.tsx Engine selector, CVP toggles, judge model picker + ResultPanel.tsx Live streaming results, error cards, markdown rendering + MessageFlowDiagram.tsx Floating right-side panel: cost + trajectory + ledger + UML flow + ConfidenceTrajectory.tsx SVG sparkline of per-participant confidence across rounds + DisagreementPanel.tsx Grouped disagreement ledger with click-to-scroll + CostMeter.tsx Live token/USD totals + JudgeCard.tsx Non-voting judge synthesis output + PromptLibrary.tsx Preset prompt chips under the textarea + SessionMenu.tsx Export (Markdown/JSON) + copy permalink dropdown + BackToTop.tsx Scroll navigation lib/ - consensus-engine.ts Multi-round CVP orchestration with SSE - providers.ts Server-side provider resolution (parses AI_PROVIDERS) - personas.ts 7 persona definitions — 
edit this one file to add more - store.ts Zustand global state with granular selectors - types.ts All TypeScript types + consensus-engine.ts CVP + Blind Jury orchestration, judge synthesizer, disagreement detection + providers.ts Server-side provider resolution (parses AI_PROVIDERS) + personas.ts 7 participant personas + JUDGE_PERSONA + pricing.ts Model pricing table + cost estimator + prompt-library.ts Preset prompts for the library UI + session.ts Snapshot ↔ Markdown / JSON / URL-hash serializer + store.ts Zustand global state, options bundle, snapshot load/save + types.ts All TypeScript types ``` -The consensus engine runs entirely server-side. Each round streams responses via Server-Sent Events. The client processes events through a single `processEvent` function that calls Zustand actions directly via `getState()` — no subscriptions, no re-renders from token events. +The consensus engine runs entirely server-side. Each round streams responses via Server-Sent Events. The client processes events through a single `processEvent` function that calls Zustand actions directly via `getState()` — no subscriptions, no re-renders from token events. The same event pipeline drives the confidence trajectory, the disagreement ledger, the cost meter, and the judge card — every panel reads from one coherent store. --- @@ -395,16 +448,16 @@ The new persona will appear in every selector automatically. ## Roadmap -RoundTable currently ships with the **Consensus Validation Protocol (CVP)** engine. The architecture is designed to support multiple consensus strategies — future releases will introduce additional engines: +RoundTable ships with two engines today. 
The architecture is designed to support more: -| Engine | Status | Description | -| --------------------------------------- | --------- | ---------------------------------------------------------------------------------------------------- | -| **CVP (Consensus Validation Protocol)** | Available | Multi-round structured debate: Analysis, Counterarguments, Evidence Assessment, Synthesis | -| **Delphi Method** | Planned | Anonymous multi-round forecasting with statistical aggregation between rounds | -| **Adversarial Red Team** | Planned | One model attacks, others defend — iterative stress-testing of ideas | -| **Ranked Choice Synthesis** | Planned | Each model proposes solutions, then ranks all proposals — converges via elimination | -| **Dialectical Engine** | Planned | Thesis / Antithesis / Synthesis structure with formal argument mapping | -| **Blind Jury** | Planned | Models respond independently with no visibility into each other's answers, then a synthesizer merges | +| Engine | Status | Description | +| --------------------------------------- | --------- | ---------------------------------------------------------------------------------------------- | +| **CVP (Consensus Validation Protocol)** | Available | Multi-round structured debate with blind Round 1, randomized order, early stop, optional judge | +| **Blind Jury** | Available | Parallel independent responses with no cross-visibility, followed by a judge synthesis | +| **Delphi Method** | Planned | Anonymous multi-round forecasting with statistical aggregation between rounds | +| **Adversarial Red Team** | Planned | One model attacks, others defend — iterative stress-testing of ideas | +| **Ranked Choice Synthesis** | Planned | Each model proposes solutions, then ranks all proposals — converges via elimination | +| **Dialectical Engine** | Planned | Thesis / Antithesis / Synthesis structure with formal argument mapping | The consensus engine is a single file (`lib/consensus-engine.ts`) with a clean 
interface — contributions for new engines are welcome. diff --git a/app/api/consensus/route.ts b/app/api/consensus/route.ts index 6da1bcc..39362c9 100644 --- a/app/api/consensus/route.ts +++ b/app/api/consensus/route.ts @@ -3,10 +3,11 @@ // Security hardening: // - Server-side limits on prompt size, participant count, round count // - Personas are server-rebuilt from persona IDs (client systemPrompts ignored) +// - Engine options are validated and clamped server-side // - Request abort signal is forwarded to the engine // - Basic rate limiting via in-memory sliding window -import type { ConsensusRequest, ConsensusEvent } from "@/lib/types"; +import type { ConsensusEvent, ConsensusOptions, EngineType, Participant } from "@/lib/types"; import { runConsensus } from "@/lib/consensus-engine"; import { getPersona } from "@/lib/personas"; import { findResolvedModel } from "@/lib/providers"; @@ -46,6 +47,46 @@ setInterval(() => { } }, RATE_WINDOW_MS); +// ── Options parsing & validation ─────────────────────────── + +interface LooseRequestBody { + prompt?: unknown; + participants?: unknown; + rounds?: unknown; // legacy + options?: unknown; +} + +function parseEngine(v: unknown): EngineType { + return v === "blind-jury" ? "blind-jury" : "cvp"; +} + +function parseBool(v: unknown, fallback: boolean): boolean { + return typeof v === "boolean" ? v : fallback; +} + +function parseOptions(body: LooseRequestBody): ConsensusOptions { + const raw = (body.options ?? {}) as Record<string, unknown>; + const legacyRounds = typeof body.rounds === "number" ? body.rounds : undefined; + const requestedRounds = typeof raw.rounds === "number" ? raw.rounds : (legacyRounds ?? 5); + + const rounds = Math.min(Math.max(1, Math.floor(requestedRounds)), MAX_ROUNDS); + + const judgeModelId = + typeof raw.judgeModelId === "string" && raw.judgeModelId.length > 0 + ?
(raw.judgeModelId as string) + : undefined; + + return { + engine: parseEngine(raw.engine), + rounds, + randomizeOrder: parseBool(raw.randomizeOrder, true), + blindFirstRound: parseBool(raw.blindFirstRound, true), + earlyStop: parseBool(raw.earlyStop, true), + judgeEnabled: parseBool(raw.judgeEnabled, false), + judgeModelId, + }; +} + // ── Route handler ────────────────────────────────────────── export async function POST(request: Request) { @@ -62,11 +103,22 @@ export async function POST(request: Request) { }); } - const body = (await request.json()) as ConsensusRequest; + const body = (await request.json()) as LooseRequestBody; // ── Validation ─────────────────────────────────────────── - if (!body.prompt || !body.participants?.length || !body.rounds) { + const hasRounds = + body.rounds !== undefined || + (typeof body.options === "object" && + body.options !== null && + "rounds" in (body.options as Record<string, unknown>)); + + if ( + !body.prompt || + !Array.isArray(body.participants) || + body.participants.length === 0 || + !hasRounds + ) { return new Response(JSON.stringify({ error: "Missing required fields" }), { status: 400, headers: { "Content-Type": "application/json" }, @@ -80,40 +132,42 @@ ); } - if (!Array.isArray(body.participants) || body.participants.length > MAX_PARTICIPANTS) { + if (body.participants.length > MAX_PARTICIPANTS) { return new Response( JSON.stringify({ error: `Maximum ${MAX_PARTICIPANTS} participants allowed` }), { status: 400, headers: { "Content-Type": "application/json" } }, ); } - const rounds = Math.min(Math.max(1, Math.floor(body.rounds)), MAX_ROUNDS); + const options = parseOptions(body); // ── Rebuild participants server-side ───────────────────── // Never trust client-supplied systemPrompts or arbitrary model IDs. // Re-resolve models and personas from their IDs.
- const validatedParticipants: Array<{ - id: string; - modelInfo: { id: string; providerId: string; providerName: string; modelId: string }; - persona: ReturnType<typeof getPersona>; - }> = []; - for (const p of body.participants) { - const resolved = findResolvedModel(p.modelInfo?.id ?? ""); + const validatedParticipants: Participant[] = []; + for (const p of body.participants as Array<{ + id?: unknown; + modelInfo?: { id?: unknown }; + persona?: { id?: unknown }; + }>) { + const modelCompositeId = typeof p.modelInfo?.id === "string" ? p.modelInfo.id : ""; + const resolved = findResolvedModel(modelCompositeId); if (!resolved) { - return new Response(JSON.stringify({ error: `Model not available: ${p.modelInfo?.id}` }), { + return new Response(JSON.stringify({ error: `Model not available: ${modelCompositeId}` }), { status: 400, headers: { "Content-Type": "application/json" }, }); } // Rebuild persona from server-side definitions (ignore client systemPrompt) - const persona = getPersona(p.persona?.id ?? ""); + const personaId = typeof p.persona?.id === "string" ? p.persona.id : ""; + const persona = getPersona(personaId); validatedParticipants.push({ - id: p.id, + id: typeof p.id === "string" ?
p.id : `p-${validatedParticipants.length + 1}`, modelInfo: { - id: p.modelInfo.id, + id: modelCompositeId, providerId: resolved.providerId, providerName: resolved.providerName, modelId: resolved.modelId, @@ -122,6 +176,23 @@ export async function POST(request: Request) { }); } + // Validate judge model, if requested + if (options.judgeEnabled) { + if (!options.judgeModelId) { + return new Response( + JSON.stringify({ error: "Judge enabled but no judgeModelId was supplied" }), + { status: 400, headers: { "Content-Type": "application/json" } }, + ); + } + const judgeResolved = findResolvedModel(options.judgeModelId); + if (!judgeResolved) { + return new Response( + JSON.stringify({ error: `Judge model not available: ${options.judgeModelId}` }), + { status: 400, headers: { "Content-Type": "application/json" } }, + ); + } + } + // ── Stream with abort support ──────────────────────────── const encoder = new TextEncoder(); @@ -139,9 +210,9 @@ export async function POST(request: Request) { try { await runConsensus( - body.prompt, + body.prompt as string, validatedParticipants, - rounds, + options, emit, request.signal, // forward abort signal ); diff --git a/app/page.tsx b/app/page.tsx index 42143c2..b3aa23c 100644 --- a/app/page.tsx +++ b/app/page.tsx @@ -1,7 +1,7 @@ "use client"; // ───────────────────────────────────────────────────────────── -// Consensus Arena — Main Page (Clean Dashboard) +// RoundTable — Main Page (Clean Dashboard) // ───────────────────────────────────────────────────────────── import { useEffect, useCallback, useState } from "react"; @@ -10,6 +10,11 @@ import AISelector from "@/components/AISelector"; import ResultPanel from "@/components/ResultPanel"; import MessageFlowDiagram from "@/components/MessageFlowDiagram"; import BackToTop from "@/components/BackToTop"; +import ConfidenceTrajectory from "@/components/ConfidenceTrajectory"; +import DisagreementPanel from "@/components/DisagreementPanel"; +import CostMeter from "@/components/CostMeter"; 
+import ConfigPanel from "@/components/ConfigPanel"; +import PromptLibrary from "@/components/PromptLibrary"; import { toast } from "sonner"; import { Play, @@ -22,17 +27,20 @@ import { Users, ArrowRight, Sparkles, + Eye, } from "lucide-react"; import type { ConsensusEvent, ConsensusRequest } from "@/lib/types"; +import { decodeSnapshotFromHash } from "@/lib/session"; export default function HomePage() { const participants = useArenaStore((s) => s.participants); - const roundCount = useArenaStore((s) => s.roundCount); const prompt = useArenaStore((s) => s.prompt); + const options = useArenaStore((s) => s.options); const isRunning = useArenaStore((s) => s.isRunning); const currentRound = useArenaStore((s) => s.currentRound); const progress = useArenaStore((s) => s.progress); const finalScore = useArenaStore((s) => s.finalScore); + const sharedView = useArenaStore((s) => s.sharedView); const setAvailableModels = useArenaStore((s) => s.setAvailableModels); const setModelsLoading = useArenaStore((s) => s.setModelsLoading); @@ -40,6 +48,7 @@ export default function HomePage() { const setPrompt = useArenaStore((s) => s.setPrompt); const cancelConsensus = useArenaStore((s) => s.cancelConsensus); const reset = useArenaStore((s) => s.reset); + const loadSnapshot = useArenaStore((s) => s.loadSnapshot); const [showOnboarding, setShowOnboarding] = useState(true); @@ -67,6 +76,18 @@ export default function HomePage() { }); }, [setAvailableModels, setModelsLoading]); + // Load a shared snapshot from the URL hash, if present + useEffect(() => { + if (typeof window === "undefined") return; + if (!window.location.hash) return; + decodeSnapshotFromHash(window.location.hash).then((snap) => { + if (!snap) return; + loadSnapshot(snap); + toast.info("Viewing shared session"); + setShowOnboarding(false); + }); + }, [loadSnapshot]); + useEffect(() => { const handler = (e: KeyboardEvent) => { if (e.key === "Escape" && useArenaStore.getState().isRunning) { @@ -88,6 +109,15 @@ export default 
function HomePage() { toast.error("Add at least 2 AI participants"); return; } + if (state.options.judgeEnabled && !state.options.judgeModelId) { + toast.error("Choose a judge model or disable judge synthesis"); + return; + } + + // Clear any URL hash from a previously loaded shared view + if (typeof window !== "undefined" && window.location.hash) { + history.replaceState(null, "", window.location.pathname); + } const controller = state.startConsensus(); toast.info("Consensus started — Esc to cancel"); @@ -95,7 +125,7 @@ export default function HomePage() { const body: ConsensusRequest = { prompt: state.prompt.trim(), participants: state.participants, - rounds: state.roundCount, + options: state.options, }; try { @@ -130,21 +160,44 @@ export default function HomePage() { if (err instanceof DOMException && err.name === "AbortError") return; const msg = err instanceof Error ? err.message : "Unknown error"; toast.error(`Consensus failed: ${msg}`); - useArenaStore.getState().completeConsensus(0, `Error: ${msg}`); + useArenaStore.getState().completeConsensus(0, `Error: ${msg}`, 0); } }, []); - const canRun = !isRunning && prompt.trim().length > 0 && participants.length >= 2; + const canRun = !isRunning && !sharedView && prompt.trim().length > 0 && participants.length >= 2; const handleCancel = useCallback(() => { cancelConsensus(); toast.info("Consensus cancelled"); }, [cancelConsensus]); + const handleLeaveSharedView = useCallback(() => { + if (typeof window !== "undefined" && window.location.hash) { + history.replaceState(null, "", window.location.pathname); + } + reset(); + }, [reset]); + return (
+ {/* Shared-view banner */} + {sharedView && ( +
+
+ + Viewing a shared session. Reset to start your own run. +
+ +
+ )} + {/* Onboarding */} - {showOnboarding && participants.length === 0 && ( + {showOnboarding && participants.length === 0 && !sharedView && (
setShowOnboarding(false)} @@ -211,18 +264,18 @@ export default function HomePage() {
- {roundCount} + {options.engine === "blind-jury" ? 1 : options.rounds}
+
+

+ Protocol +

+ +
+ + {/* On xl+ these panels move into the floating Message Flow + container on the right. Keep them in the sidebar below + that breakpoint so smaller screens still see them. */} +
+ + + +
+ {isRunning && (

- Round {currentRound} of {roundCount} + Round {currentRound} of {options.rounds}

+ ); +} + +export default function ConfigPanel() { + const options = useArenaStore((s) => s.options); + const setOption = useArenaStore((s) => s.setOption); + const availableModels = useArenaStore((s) => s.availableModels); + const isRunning = useArenaStore((s) => s.isRunning); + + const [judgeOpen, setJudgeOpen] = useState(false); + const judgeRef = useRef(null); + + useEffect(() => { + const onClick = (e: MouseEvent) => { + if (judgeRef.current && !judgeRef.current.contains(e.target as Node)) setJudgeOpen(false); + }; + document.addEventListener("mousedown", onClick); + return () => document.removeEventListener("mousedown", onClick); + }, []); + + const judgeModel = useMemo( + () => availableModels.find((m) => m.id === options.judgeModelId), + [availableModels, options.judgeModelId], + ); + + const isCvp = options.engine === "cvp"; + + return ( +
+
+

+ Engine +

+
+ {(["cvp", "blind-jury"] as const).map((eng) => { + const active = options.engine === eng; + return ( + + ); + })} +
+

+ {isCvp + ? "Multi-round structured debate with cross-visibility." + : "One-shot parallel responses + judge synthesis."} +

+
+ + {isCvp && ( +
+ } + checked={options.randomizeOrder} + onChange={(v) => setOption("randomizeOrder", v)} + disabled={isRunning} + /> + } + checked={options.blindFirstRound} + onChange={(v) => setOption("blindFirstRound", v)} + disabled={isRunning} + /> + } + checked={options.earlyStop} + onChange={(v) => setOption("earlyStop", v)} + disabled={isRunning} + /> +
+ )} + +
+ } + checked={options.judgeEnabled} + onChange={(v) => { + setOption("judgeEnabled", v); + if (v && !options.judgeModelId && availableModels[0]) { + setOption("judgeModelId", availableModels[0].id); + } + }} + disabled={isRunning} + /> + {options.judgeEnabled && ( +
+ + {judgeOpen && ( +
+ {availableModels.length === 0 && ( +

No models available.

+ )} + {availableModels.map((m) => { + const isSel = options.judgeModelId === m.id; + return ( + + ); + })} +
+ )} +
+ )} +
+
+ ); +} diff --git a/components/CostMeter.tsx b/components/CostMeter.tsx new file mode 100644 index 0000000..155f23a --- /dev/null +++ b/components/CostMeter.tsx @@ -0,0 +1,52 @@ +"use client"; + +// ───────────────────────────────────────────────────────────── +// Cost Meter — Live token usage & estimated cost +// ───────────────────────────────────────────────────────────── +// Shows running total tokens and estimated USD cost. Pricing is +// client-side and clearly flagged as an estimate in the label. + +import { useArenaStore } from "@/lib/store"; +import { DollarSign } from "lucide-react"; + +function formatCost(usd: number): string { + if (usd === 0) return "$0.00"; + if (usd < 0.01) return `$${usd.toFixed(4)}`; + return `$${usd.toFixed(2)}`; +} + +function formatTokens(n: number): string { + if (n < 1000) return n.toString(); + if (n < 1_000_000) return `${(n / 1000).toFixed(1)}K`; + return `${(n / 1_000_000).toFixed(2)}M`; +} + +export default function CostMeter() { + const total = useArenaStore((s) => s.tokenTotal); + const isRunning = useArenaStore((s) => s.isRunning); + + if (total.totalTokens === 0 && !isRunning) return null; + + return ( +
+
+ +

+ Cost (estimated) +

+
+
+ + {formatCost(total.estimatedCostUSD)} + + + {formatTokens(total.totalTokens)} tokens + +
+
+ in {formatTokens(total.inputTokens)} + out {formatTokens(total.outputTokens)} +
+
+ ); +} diff --git a/components/DisagreementPanel.tsx b/components/DisagreementPanel.tsx new file mode 100644 index 0000000..ec9fee0 --- /dev/null +++ b/components/DisagreementPanel.tsx @@ -0,0 +1,95 @@ +"use client"; + +// ───────────────────────────────────────────────────────────── +// Disagreement Panel — Live disagreement ledger +// ───────────────────────────────────────────────────────────── +// Renders the list of disagreements detected during the run. +// Groups by round, links each row to the round heading, and +// shows severity as a small bar. Reads straight from the store. + +import { useArenaStore } from "@/lib/store"; +import { useMemo } from "react"; +import { AlertTriangle } from "lucide-react"; + +export default function DisagreementPanel() { + const disagreements = useArenaStore((s) => s.disagreements); + const participants = useArenaStore((s) => s.participants); + + const grouped = useMemo(() => { + const out = new Map(); + for (const d of disagreements) { + const list = out.get(d.round) ?? []; + list.push(d); + out.set(d.round, list); + } + return [...out.entries()].sort((a, b) => a[0] - b[0]); + }, [disagreements]); + + if (disagreements.length === 0) return null; + + const lookup = (id: string) => participants.find((p) => p.id === id); + + const scrollToRound = (round: number) => { + const el = document.getElementById(`round-${round}`); + if (el) el.scrollIntoView({ behavior: "smooth", block: "start" }); + }; + + return ( +
+
+ +

+ Disagreement Ledger +

+ + {disagreements.length} + +
+
+ {grouped.map(([round, items]) => ( +
+ + {items.map((d) => { + const a = lookup(d.participantAId); + const b = lookup(d.participantBId); + return ( + + ); + })} +
+ ))} +
+
+ ); +} diff --git a/components/JudgeCard.tsx b/components/JudgeCard.tsx new file mode 100644 index 0000000..485f272 --- /dev/null +++ b/components/JudgeCard.tsx @@ -0,0 +1,46 @@ +"use client"; + +// ───────────────────────────────────────────────────────────── +// Judge Card — Non-voting synthesizer output +// ───────────────────────────────────────────────────────────── +// Renders the live judge stream (if running) or the final +// synthesis (if complete). Does nothing if the judge was never +// enabled for the run. + +import { useArenaStore } from "@/lib/store"; +import ReactMarkdown from "react-markdown"; +import remarkGfm from "remark-gfm"; +import { Gavel, Loader2 } from "lucide-react"; + +const remarkPlugins = [remarkGfm]; + +export default function JudgeCard() { + const judge = useArenaStore((s) => s.judge); + const stream = useArenaStore((s) => s.judgeStream); + const running = useArenaStore((s) => s.judgeRunning); + + if (!judge && !running) return null; + + const content = running ? stream : (judge?.content ?? ""); + const displayContent = content.replace(/\nJUDGE_CONFIDENCE:\s*\d+\s*$/i, "").trim(); + + return ( +
+
+ +
+

Consensus Judge

+ {judge && ( +

+ {judge.providerName} · {judge.modelId} +

+ )} +
+ {running && } +
+
+ {displayContent || "..."} +
+
+ ); +} diff --git a/components/MessageFlowDiagram.tsx b/components/MessageFlowDiagram.tsx index 0f0df8f..983fa87 100644 --- a/components/MessageFlowDiagram.tsx +++ b/components/MessageFlowDiagram.tsx @@ -1,8 +1,14 @@ "use client"; // ───────────────────────────────────────────────────────────── -// Message Flow Diagram — High-contrast floating sidebar +// Message Flow Diagram — Floating right-side summary panel // ───────────────────────────────────────────────────────────── +// Stacks the cost meter, confidence trajectory, disagreement +// ledger and the UML-style message flow diagram into a single +// scrollable floating container pinned to the right edge of the +// viewport on xl+ screens. Each card returns null on its own +// when it has nothing to show, so the stack collapses naturally +// as the run starts filling in data. import { useArenaStore } from "@/lib/store"; import { useMemo, useState, memo, useCallback } from "react"; @@ -15,6 +21,9 @@ import { CheckCircle2, Loader2, } from "lucide-react"; +import CostMeter from "./CostMeter"; +import ConfidenceTrajectory from "./ConfidenceTrajectory"; +import DisagreementPanel from "./DisagreementPanel"; function scrollToResponse(responseId: string) { const el = document.getElementById(responseId); @@ -33,6 +42,33 @@ function scrollToRound(roundNumber: number) { } export default function MessageFlowDiagram() { + const rounds = useArenaStore((s) => s.rounds); + const isRunning = useArenaStore((s) => s.isRunning); + const tokenTotal = useArenaStore((s) => s.tokenTotal); + const disagreements = useArenaStore((s) => s.disagreements); + + const hasRounds = rounds.length > 0; + const hasCost = tokenTotal.totalTokens > 0 || isRunning; + const hasDisagreements = disagreements.length > 0; + + // Hide the entire floating container when there's nothing to show. + if (!hasRounds && !hasCost && !hasDisagreements) return null; + + return ( +
+
+ + + + +
+
+ ); +} + +// ── Message Flow card — the collapsible sequence diagram ── + +function MessageFlowCard() { const rounds = useArenaStore((s) => s.rounds); const participants = useArenaStore((s) => s.participants); const currentRound = useArenaStore((s) => s.currentRound); @@ -75,56 +111,54 @@ export default function MessageFlowDiagram() { if (rounds.length === 0) return null; return ( -
-
- +
+ - {!collapsed && ( -
- {/* Participant badges */} -
- {participants.map((p) => ( + {!collapsed && ( +
+ {/* Participant badges */} +
+ {participants.map((p) => ( +
-
- {p.modelInfo.modelId} -
- ))} -
- - {flowEvents.map((event, i) => ( - - ))} - - {isRunning && ( -
- - Round {currentRound}... + className="w-1.5 h-1.5 rounded-full" + style={{ backgroundColor: p.persona.color }} + /> + {p.modelInfo.modelId}
- )} + ))}
- )} -
+ + {flowEvents.map((event, i) => ( + + ))} + + {isRunning && ( +
+ + Round {currentRound}... +
+ )} +
+ )}
); } diff --git a/components/PromptLibrary.tsx b/components/PromptLibrary.tsx new file mode 100644 index 0000000..9a5dc01 --- /dev/null +++ b/components/PromptLibrary.tsx @@ -0,0 +1,45 @@ +"use client"; + +// ───────────────────────────────────────────────────────────── +// Prompt Library — Preset chips under the prompt textarea +// ───────────────────────────────────────────────────────────── + +import { PROMPT_LIBRARY } from "@/lib/prompt-library"; +import { useArenaStore } from "@/lib/store"; +import { Sparkles } from "lucide-react"; + +export default function PromptLibrary() { + const setPrompt = useArenaStore((s) => s.setPrompt); + const isRunning = useArenaStore((s) => s.isRunning); + const prompt = useArenaStore((s) => s.prompt); + + if (isRunning) return null; + if (prompt.trim().length > 0) return null; + + return ( +
+

+ + Try a preset +

+
+ {PROMPT_LIBRARY.map((preset) => ( + + ))} +
+
+ ); +} diff --git a/components/ResultPanel.tsx b/components/ResultPanel.tsx index 6ca2d12..772cc4d 100644 --- a/components/ResultPanel.tsx +++ b/components/ResultPanel.tsx @@ -8,6 +8,8 @@ import { useArenaStore } from "@/lib/store"; import ReactMarkdown from "react-markdown"; import remarkGfm from "remark-gfm"; import { useEffect, useRef, memo, useCallback, useState } from "react"; +import JudgeCard from "./JudgeCard"; +import SessionMenu from "./SessionMenu"; import { CheckCircle, Circle, @@ -18,6 +20,8 @@ import { ChevronUp, Copy, Check, + ZapOff, + AlertCircle, } from "lucide-react"; const remarkPlugins = [remarkGfm]; @@ -31,7 +35,10 @@ export default function ResultPanel() { const finalSummary = useArenaStore((s) => s.finalSummary); const progress = useArenaStore((s) => s.progress); const cancelConsensus = useArenaStore((s) => s.cancelConsensus); - const roundCount = useArenaStore((s) => s.roundCount); + const roundCount = useArenaStore((s) => s.options.rounds); + const earlyStopped = useArenaStore((s) => s.earlyStopped); + const judge = useArenaStore((s) => s.judge); + const judgeRunning = useArenaStore((s) => s.judgeRunning); const bottomRef = useRef(null); const scrollTimerRef = useRef | null>(null); @@ -57,7 +64,7 @@ export default function ResultPanel() { window.scrollTo({ top: 0, behavior: "smooth" }); }, []); - if (rounds.length === 0 && !isRunning) return null; + if (rounds.length === 0 && !isRunning && !judge && !judgeRunning) return null; return (
@@ -137,6 +144,7 @@ export default function ResultPanel() { personaEmoji={participant.persona.emoji} confidence={response.confidence} content={response.content} + error={response.error} /> ); })} @@ -148,6 +156,22 @@ export default function ResultPanel() {
))} + {/* Early stop notice */} + {earlyStopped && ( +
+ +
+

+ Stopped early after round {earlyStopped.round} +

+

{earlyStopped.reason}

+
+
+ )} + + {/* Judge synthesis */} + + {/* Final consensus */} {finalScore !== null && (
@@ -159,6 +183,7 @@ export default function ResultPanel() { {finalScore}% +
{finalSummary && (

{finalSummary}

@@ -231,6 +256,7 @@ const CompletedResponseCard = memo(function CompletedResponseCard({ personaEmoji, confidence, content, + error, }: { responseId: string; modelName: string; @@ -240,6 +266,7 @@ const CompletedResponseCard = memo(function CompletedResponseCard({ personaEmoji: string; confidence: number; content: string; + error?: string; }) { const displayContent = content.replace(/\nCONFIDENCE:\s*\d+\s*$/i, "").trim(); const [copied, setCopied] = useState(false); @@ -251,6 +278,37 @@ const CompletedResponseCard = memo(function CompletedResponseCard({ }); }, [displayContent]); + if (error) { + return ( +
+
+
+ +
+

Provider error

+

{error}

+

+ Check your provider base URL, API key, and that the model ID exists at the upstream + endpoint. This participant's response is excluded from the consensus score. +

+
+
+
+ ); + } + return (
@@ -357,7 +417,12 @@ const Header = memo(function Header({
{streaming && } - {confidence != null && ( + {errored && ( + + ERROR + + )} + {!errored && confidence != null && ( s.getSnapshot); + const finalScore = useArenaStore((s) => s.finalScore); + const [open, setOpen] = useState(false); + const [copied, setCopied] = useState(false); + const ref = useRef(null); + + useEffect(() => { + const onClick = (e: MouseEvent) => { + if (ref.current && !ref.current.contains(e.target as Node)) setOpen(false); + }; + document.addEventListener("mousedown", onClick); + return () => document.removeEventListener("mousedown", onClick); + }, []); + + const handleMarkdown = useCallback(() => { + const snap = getSnapshot(); + downloadBlob(snapshotFilename(snap, "md"), snapshotToMarkdown(snap), "text/markdown"); + toast.success("Markdown downloaded"); + setOpen(false); + }, [getSnapshot]); + + const handleJSON = useCallback(() => { + const snap = getSnapshot(); + downloadBlob(snapshotFilename(snap, "json"), snapshotToJSON(snap), "application/json"); + toast.success("JSON downloaded"); + setOpen(false); + }, [getSnapshot]); + + const handleLink = useCallback(async () => { + const snap = getSnapshot(); + try { + const encoded = await encodeSnapshotToHash(snap); + const url = `${window.location.origin}${window.location.pathname}#${encoded}`; + await navigator.clipboard.writeText(url); + setCopied(true); + toast.success("Permalink copied to clipboard"); + setTimeout(() => setCopied(false), 2000); + } catch { + toast.error("Failed to build permalink"); + } + setOpen(false); + }, [getSnapshot]); + + if (finalScore === null) return null; + + return ( +
+ + {open && ( +
+ + +
+ +
+ )} +
+ ); +} + +/** Stand-alone icon button for the "Download" UX without a menu */ +export function DownloadIconButton() { + const getSnapshot = useArenaStore((s) => s.getSnapshot); + const finalScore = useArenaStore((s) => s.finalScore); + + const handleClick = useCallback(() => { + const snap = getSnapshot(); + downloadBlob(snapshotFilename(snap, "md"), snapshotToMarkdown(snap), "text/markdown"); + toast.success("Markdown downloaded"); + }, [getSnapshot]); + + if (finalScore === null) return null; + + return ( + + ); +} diff --git a/lib/consensus-engine.ts b/lib/consensus-engine.ts index 5ab2696..475986b 100644 --- a/lib/consensus-engine.ts +++ b/lib/consensus-engine.ts @@ -1,16 +1,38 @@ // ───────────────────────────────────────────────────────────── // RoundTable — Consensus Engine (Server-side) // ───────────────────────────────────────────────────────────── -// Orchestrates multi-round, multi-AI consensus using SSE streaming. -// Accepts an optional AbortSignal to stop processing when the -// client disconnects. +// Orchestrates multi-round, multi-AI consensus using SSE +// streaming. Dispatches between engines and wires in the +// optional Judge synthesizer and cost meter. +// +// Engines: +// cvp — Consensus Validation Protocol (multi-round debate) +// blind-jury — Parallel independent responses + judge synthesis +// +// All engines accept an optional AbortSignal and forward it to +// every provider call. 
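Both engines ultimately reduce participant confidences to a single consensus score. The formula this file uses further down — mean self-reported confidence minus half a standard deviation, clamped to 0–100, with errored responses filtered out first — is a pure function and easy to sketch in isolation (the helper name and test values here are illustrative, not part of the patch):

```typescript
// Consensus score sketch: mean confidence penalized by spread.
// Mirrors calculateConsensusScore in this diff — errored responses
// carry no meaningful confidence and are excluded before scoring.
function consensusScore(confidences: number[]): number {
  if (confidences.length === 0) return 0;
  const avg = confidences.reduce((s, c) => s + c, 0) / confidences.length;
  const variance =
    confidences.reduce((s, c) => s + (c - avg) ** 2, 0) / confidences.length;
  const stdDev = Math.sqrt(variance);
  // High agreement keeps the average; disagreement drags it down.
  return Math.round(Math.max(0, Math.min(100, avg - stdDev * 0.5)));
}

// Unanimous jury keeps its shared confidence:
consensusScore([80, 80, 80]); // → 80
// Same mean, but a split jury is penalized:
consensusScore([100, 80, 60]); // mean 80, stdDev ≈ 16.3 → 72
```

The half-std-dev weighting is what makes the early-stop delta meaningful: once positions stop moving, both the mean and the spread stabilize, so consecutive round scores converge.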
import { createOpenAI } from "@ai-sdk/openai"; import { streamText } from "ai"; -import type { Participant, RoundType, ConsensusEvent, RoundResponse } from "./types"; +import type { + Participant, + RoundType, + ConsensusEvent, + RoundResponse, + ConsensusOptions, + Disagreement, + JudgeResult, + TokenUsage, +} from "./types"; import { findResolvedModel } from "./providers"; +import { JUDGE_PERSONA } from "./personas"; +import { addUsage, estimateCost, estimateUsageFromText, ZERO_USAGE } from "./pricing"; + +const MAX_OUTPUT_TOKENS = 1500; +const EARLY_STOP_DELTA_THRESHOLD = 3; // consensus score delta below this = converged + +// ── Round definitions ────────────────────────────────────── -/** Round definitions per the CVP spec */ function getRoundMeta( roundNumber: number, totalRounds: number, @@ -27,7 +49,8 @@ function getRoundMeta( }; } -/** Build the system prompt for a specific round */ +// ── Prompt building ──────────────────────────────────────── + function buildRoundSystemPrompt( persona: string, roundType: RoundType, @@ -63,34 +86,285 @@ IMPORTANT: End your response with a line in exactly this format: CONFIDENCE: [number 0-100]`; } -/** Extract confidence score from response text */ +function buildBlindJurorSystemPrompt(persona: string): string { + return `${persona} + +You are participating in a BLIND JURY. Every juror is answering the prompt independently and simultaneously. You have no visibility into other jurors' responses. Give your most complete, considered analysis now — there is no second round. + +IMPORTANT: End your response with a line in exactly this format: +CONFIDENCE: [number 0-100]`; +} + +function buildJudgeContext(finalResponses: RoundResponse[], participants: Participant[]): string { + const blocks = finalResponses.map((r) => { + const p = participants.find((x) => x.id === r.participantId); + const label = p + ? 
`${p.persona.name} (${p.modelInfo.providerName}/${p.modelInfo.modelId})` + : r.participantId; + const body = r.content.replace(/\nCONFIDENCE:\s*\d+\s*$/i, "").trim(); + return `### ${label} — self-reported confidence ${r.confidence}%\n${body}`; + }); + return `Below are the final-round responses from every participant. Synthesize them per your instructions.\n\n${blocks.join("\n\n---\n\n")}`; +} + +// ── Extraction helpers ───────────────────────────────────── + +/** Extract confidence score from response text (0-100, defaults to 50) */ function extractConfidence(text: string): number { const match = text.match(/CONFIDENCE:\s*(\d+)/i); if (match) return Math.min(100, Math.max(0, parseInt(match[1], 10))); return 50; } -/** Calculate consensus score from participant confidences */ +function extractJudgeSection(text: string, heading: string): string { + const pattern = new RegExp(`##\\s*${heading}\\s*\\n([\\s\\S]*?)(?=\\n##\\s|$)`, "i"); + const m = text.match(pattern); + return m ? m[1].trim() : ""; +} + +// ── Scoring ──────────────────────────────────────────────── + function calculateConsensusScore(responses: RoundResponse[]): number { - if (responses.length === 0) return 0; - const avg = responses.reduce((sum, r) => sum + r.confidence, 0) / responses.length; + const valid = responses.filter((r) => !r.error); + if (valid.length === 0) return 0; + const avg = valid.reduce((sum, r) => sum + r.confidence, 0) / valid.length; const variance = - responses.reduce((sum, r) => sum + Math.pow(r.confidence - avg, 2), 0) / responses.length; + valid.reduce((sum, r) => sum + Math.pow(r.confidence - avg, 2), 0) / valid.length; const stdDev = Math.sqrt(variance); return Math.round(Math.max(0, Math.min(100, avg - stdDev * 0.5))); } -/** Stream a single AI participant's response */ +// ── Randomization ────────────────────────────────────────── + +/** Fisher–Yates shuffle. Non-mutating. 
*/ +export function shuffle(arr: readonly T[], rng: () => number = Math.random): T[] { + const out = arr.slice(); + for (let i = out.length - 1; i > 0; i--) { + const j = Math.floor(rng() * (i + 1)); + [out[i], out[j]] = [out[j], out[i]]; + } + return out; +} + +// ── Disagreement detection ───────────────────────────────── + +/** + * Detect disagreements in a round using a deterministic, text-free + * heuristic: pairs of participants whose confidence scores diverge + * by >= 20 points. This avoids fragile regex claim-extraction and + * extra LLM calls, while still giving a meaningful signal. + */ +export function detectDisagreements( + round: number, + responses: RoundResponse[], + participants: Participant[], +): Disagreement[] { + const out: Disagreement[] = []; + for (let i = 0; i < responses.length; i++) { + for (let j = i + 1; j < responses.length; j++) { + const a = responses[i]; + const b = responses[j]; + if (a.error || b.error) continue; + const delta = Math.abs(a.confidence - b.confidence); + if (delta < 20) continue; + const pa = participants.find((p) => p.id === a.participantId); + const pb = participants.find((p) => p.id === b.participantId); + const label = pa && pb ? `${pa.persona.name} vs ${pb.persona.name}` : "Confidence split"; + out.push({ + id: `r${round}-${a.participantId}-${b.participantId}`, + round, + participantAId: a.participantId, + participantBId: b.participantId, + severity: delta, + label, + }); + } + } + return out; +} + +// ── Streaming a single participant ───────────────────────── + +/** Safely extract token usage from a streamText result */ +async function extractUsage( + result: { usage?: unknown } | undefined, +): Promise<{ inputTokens: number; outputTokens: number } | null> { + if (!result || !result.usage) return null; + try { + const u = (await (result.usage as Promise)) as Record | undefined; + if (!u || typeof u !== "object") return null; + const input = (u.inputTokens ?? u.promptTokens ?? 
0) as number; + const output = (u.outputTokens ?? u.completionTokens ?? 0) as number; + if (typeof input !== "number" || typeof output !== "number") return null; + return { inputTokens: input, outputTokens: output }; + } catch { + return null; + } +} + +/** + * Format a provider error for display. Pulls out the common fields + * that the Vercel AI SDK attaches to `AI_APICallError` (statusCode, + * url, responseBody) and falls back to the message when they are + * absent. + */ +function formatProviderError(err: unknown): string { + if (!(err instanceof Error)) return typeof err === "string" ? err : "Unknown provider error"; + const msg = err.message || err.name || "Unknown provider error"; + const info = err as unknown as { + statusCode?: number; + responseBody?: unknown; + url?: string; + }; + const parts: string[] = [msg]; + if (typeof info.statusCode === "number") parts.push(`HTTP ${info.statusCode}`); + return parts.join(" — "); +} + async function streamParticipant( participant: Participant, systemPrompt: string, userPrompt: string, + round: number, emit: (event: ConsensusEvent) => void, signal?: AbortSignal, ): Promise { + emit({ type: "participant-start", participantId: participant.id, round }); + + const started = Date.now(); + let fullContent = ""; + let usage: TokenUsage | undefined; + let errorMessage: string | undefined; + const resolved = findResolvedModel(participant.modelInfo.id); + + if (!resolved) { + errorMessage = `Model not available: ${participant.modelInfo.id}`; + fullContent = `[Error from ${participant.modelInfo.providerName} / ${participant.modelInfo.modelId}: ${errorMessage}]`; + emit({ type: "token", participantId: participant.id, round, token: fullContent }); + console.error(`[RoundTable] Model resolution failed for ${participant.modelInfo.id}`); + } else { + try { + const provider = createOpenAI({ + baseURL: resolved.baseUrl, + apiKey: resolved.apiKey, + }); + + // Vercel AI SDK v6 surfaces provider errors via the `onError` + // callback 
rather than throwing from `textStream`. We capture + // them here and re-throw after the iteration so the outer + // try/catch handles them uniformly. + let capturedError: unknown = null; + + // Use `.chat()` — the OpenAI chat-completions endpoint + // (`/v1/chat/completions`) is the only one every provider's + // OpenAI-compat shim implements (Anthropic, xAI, Mistral, + // Groq, Together, …). The default `provider(modelId)` call + // would target `/v1/responses`, which is OpenAI-only. + const result = streamText({ + model: provider.chat(resolved.modelId), + system: systemPrompt, + prompt: userPrompt, + maxOutputTokens: MAX_OUTPUT_TOKENS, + temperature: 0.7, + abortSignal: signal, + onError: ({ error }: { error: unknown }) => { + capturedError = error; + }, + } as Parameters[0]); + + const awaited = await result; + for await (const chunk of awaited.textStream) { + if (signal?.aborted) throw new DOMException("Aborted", "AbortError"); + fullContent += chunk; + emit({ type: "token", participantId: participant.id, round, token: chunk }); + } + + if (capturedError) throw capturedError; + + const rawUsage = await extractUsage(awaited as { usage?: unknown }); + if (rawUsage) { + usage = { + inputTokens: rawUsage.inputTokens, + outputTokens: rawUsage.outputTokens, + totalTokens: rawUsage.inputTokens + rawUsage.outputTokens, + estimatedCostUSD: estimateCost( + resolved.modelId, + rawUsage.inputTokens, + rawUsage.outputTokens, + ), + }; + } else { + usage = estimateUsageFromText(resolved.modelId, systemPrompt + userPrompt, fullContent); + } + } catch (err) { + if (err instanceof DOMException && err.name === "AbortError") throw err; + + errorMessage = formatProviderError(err); + + console.error( + `[RoundTable] Provider error from ${participant.modelInfo.providerName}/${participant.modelInfo.modelId}:`, + err, + ); + + // If nothing was streamed, emit a synthetic token so the UI + // never renders an empty card. 
If something WAS streamed, + // append the error to what was already shown. + if (fullContent.length === 0) { + fullContent = `[Error from ${participant.modelInfo.providerName} / ${participant.modelInfo.modelId}: ${errorMessage}]`; + emit({ type: "token", participantId: participant.id, round, token: fullContent }); + } else { + const tail = `\n\n[Error from ${participant.modelInfo.providerName} / ${participant.modelInfo.modelId}: ${errorMessage}]`; + fullContent += tail; + emit({ type: "token", participantId: participant.id, round, token: tail }); + } + } + } + + // Errored responses have no meaningful self-reported confidence. + // Using 0 keeps them out of the consensus score (which filters + // `r.error`) and makes the UI render an explicit error badge. + const confidence = errorMessage ? 0 : extractConfidence(fullContent); + const durationMs = Date.now() - started; + + const response: RoundResponse = { + participantId: participant.id, + roundNumber: round, + content: fullContent, + confidence, + timestamp: Date.now(), + durationMs, + usage, + error: errorMessage, + }; + + emit({ + type: "participant-end", + participantId: participant.id, + round, + confidence, + fullContent, + usage, + durationMs, + error: errorMessage, + }); + + return response; +} + +// ── Judge synthesizer ────────────────────────────────────── + +async function runJudge( + judgeModelId: string, + finalResponses: RoundResponse[], + participants: Participant[], + userPrompt: string, + emit: (event: ConsensusEvent) => void, + signal?: AbortSignal, +): Promise { + const resolved = findResolvedModel(judgeModelId); if (!resolved) { - throw new Error(`Model not found: ${participant.modelInfo.id}`); + throw new Error(`Judge model not found: ${judgeModelId}`); } const provider = createOpenAI({ @@ -98,52 +372,97 @@ async function streamParticipant( apiKey: resolved.apiKey, }); - emit({ type: "participant-start", participantId: participant.id, round: 0 }); + emit({ type: "judge-start", modelId: 
resolved.modelId, providerName: resolved.providerName }); - let fullContent = ""; + const context = buildJudgeContext(finalResponses, participants); + const system = `${JUDGE_PERSONA.systemPrompt} + +The original prompt that was debated was: +""" +${userPrompt} +"""`; + + let content = ""; + let usage: TokenUsage | undefined; try { + let capturedError: unknown = null; + + // See streamParticipant for why we use `.chat()` here. const result = streamText({ - model: provider(resolved.modelId), - system: systemPrompt, - prompt: userPrompt, - maxOutputTokens: 1500, - temperature: 0.7, + model: provider.chat(resolved.modelId), + system, + prompt: context, + maxOutputTokens: MAX_OUTPUT_TOKENS, + temperature: 0.3, abortSignal: signal, - }); + onError: ({ error }: { error: unknown }) => { + capturedError = error; + }, + } as Parameters<typeof streamText>[0]); - for await (const chunk of (await result).textStream) { + const awaited = await result; + for await (const chunk of awaited.textStream) { if (signal?.aborted) throw new DOMException("Aborted", "AbortError"); - fullContent += chunk; - emit({ type: "token", participantId: participant.id, round: 0, token: chunk }); + content += chunk; + emit({ type: "judge-token", token: chunk }); + } + + if (capturedError) throw capturedError; + + const rawUsage = await extractUsage(awaited as { usage?: unknown }); + if (rawUsage) { + usage = { + inputTokens: rawUsage.inputTokens, + outputTokens: rawUsage.outputTokens, + totalTokens: rawUsage.inputTokens + rawUsage.outputTokens, + estimatedCostUSD: estimateCost( + resolved.modelId, + rawUsage.inputTokens, + rawUsage.outputTokens, + ), + }; + } else { + usage = estimateUsageFromText(resolved.modelId, system + context, content); } } catch (err) { if (err instanceof DOMException && err.name === "AbortError") throw err; - const errorMsg = err instanceof Error ?
err.message : "Unknown error"; - fullContent = `[Error from ${participant.modelInfo.providerName}/${participant.modelInfo.modelId}: ${errorMsg}]`; - emit({ type: "token", participantId: participant.id, round: 0, token: fullContent }); + const msg = formatProviderError(err); + console.error( + `[RoundTable] Judge error from ${resolved.providerName}/${resolved.modelId}:`, + err, + ); + const tail = content.length === 0 ? `[Judge error: ${msg}]` : `\n\n[Judge error: ${msg}]`; + content += tail; + emit({ type: "judge-token", token: tail }); } - const confidence = extractConfidence(fullContent); - - return { - participantId: participant.id, - roundNumber: 0, - content: fullContent, - confidence, - timestamp: Date.now(), + const result: JudgeResult = { + modelId: resolved.modelId, + providerName: resolved.providerName, + content, + majorityPosition: extractJudgeSection(content, "Majority Position"), + minorityPositions: extractJudgeSection(content, "Minority Positions"), + unresolvedDisputes: extractJudgeSection(content, "Unresolved Disputes"), + usage, }; + emit({ type: "judge-end", result }); + return result; } -/** Run the full consensus process, emitting SSE events */ -export async function runConsensus( +// ── CVP Engine ───────────────────────────────────────────── + +async function runCVPConsensus( prompt: string, participants: Participant[], - totalRounds: number, + options: ConsensusOptions, emit: (event: ConsensusEvent) => void, signal?: AbortSignal, -): Promise<void> { +): Promise<number> { + const totalRounds = options.rounds; const allResponses: RoundResponse[] = []; + const roundScores: number[] = []; + let roundsCompleted = 0; for (let round = 1; round <= totalRounds; round++) { if (signal?.aborted) throw new DOMException("Aborted", "AbortError"); @@ -153,56 +472,178 @@ export async function runConsensus( const previousResponses = allResponses.filter((r) => r.roundNumber < round); - const roundResponses: RoundResponse[] = []; + // Determine speaking order for this round
+ const order = + options.randomizeOrder && round > 1 ? shuffle(participants) : participants.slice(); - for (const participant of participants) { - if (signal?.aborted) throw new DOMException("Aborted", "AbortError"); + const roundResponses: RoundResponse[] = []; - const systemPrompt = buildRoundSystemPrompt( - participant.persona.systemPrompt, - type, - round, - totalRounds, - previousResponses, - ); + if (round === 1 && options.blindFirstRound) { + // Parallel, no cross-visibility — each participant only sees an empty previous-context. + const promises = order.map((participant) => { + const systemPrompt = buildRoundSystemPrompt( + participant.persona.systemPrompt, + type, + round, + totalRounds, + [], + ); + return streamParticipant(participant, systemPrompt, prompt, round, emit, signal); + }); + const results = await Promise.all(promises); + roundResponses.push(...results); + } else { + // Sequential — later participants see earlier ones from this round + for (const participant of order) { + if (signal?.aborted) throw new DOMException("Aborted", "AbortError"); - const response = await streamParticipant( - participant, - systemPrompt, - prompt, - (event) => { - if ("round" in event) { - emit({ ...event, round } as ConsensusEvent); - } else { - emit(event); - } - }, - signal, - ); + const visibleContext = [ + ...previousResponses, + ...roundResponses, // what earlier participants said in THIS round + ]; - response.roundNumber = round; - roundResponses.push(response); + const systemPrompt = buildRoundSystemPrompt( + participant.persona.systemPrompt, + type, + round, + totalRounds, + visibleContext, + ); - emit({ - type: "participant-end", - participantId: participant.id, - round, - confidence: response.confidence, - fullContent: response.content, - }); + const response = await streamParticipant( + participant, + systemPrompt, + prompt, + round, + emit, + signal, + ); + roundResponses.push(response); + } } allResponses.push(...roundResponses); const consensusScore = 
calculateConsensusScore(roundResponses); + roundScores.push(consensusScore); emit({ type: "round-end", round, consensusScore }); + + const disagreements = detectDisagreements(round, roundResponses, participants); + if (disagreements.length > 0) { + emit({ type: "disagreements", round, disagreements }); + } + + roundsCompleted = round; + + // Convergence check — requires at least round 2 before we can look at a delta + if (options.earlyStop && round >= 2 && round < totalRounds) { + const delta = Math.abs(consensusScore - roundScores[round - 2]); + if (delta <= EARLY_STOP_DELTA_THRESHOLD) { + const reason = `Consensus score delta ${delta.toFixed(1)} between rounds ${round - 1} and ${round} is at or below the convergence threshold (${EARLY_STOP_DELTA_THRESHOLD}).`; + emit({ type: "early-stop", round, delta, reason }); + break; + } + } } - const lastRoundResponses = allResponses.filter((r) => r.roundNumber === totalRounds); + const lastRoundNumber = roundsCompleted; + const lastRoundResponses = allResponses.filter((r) => r.roundNumber === lastRoundNumber); const finalScore = calculateConsensusScore(lastRoundResponses); + if (options.judgeEnabled && options.judgeModelId) { + await runJudge(options.judgeModelId, lastRoundResponses, participants, prompt, emit, signal); + } + emit({ type: "consensus-complete", finalScore, - summary: `Consensus reached after ${totalRounds} rounds with ${participants.length} participants. Final consensus score: ${finalScore}%.`, + summary: `CVP completed ${roundsCompleted} round${roundsCompleted !== 1 ? "s" : ""} with ${participants.length} participants. 
Final consensus score: ${finalScore}%.`, + roundsCompleted, + }); + + return finalScore; +} + +// ── Blind Jury Engine ────────────────────────────────────── + +async function runBlindJuryConsensus( + prompt: string, + participants: Participant[], + options: ConsensusOptions, + emit: (event: ConsensusEvent) => void, + signal?: AbortSignal, +): Promise<number> { + // Exactly one round — parallel, no cross-visibility. + emit({ + type: "round-start", + round: 1, + roundType: "initial-analysis", + label: "Blind Jury Deliberation", + }); + + const results = await Promise.all( + participants.map((p) => + streamParticipant( + p, + buildBlindJurorSystemPrompt(p.persona.systemPrompt), + prompt, + 1, + emit, + signal, + ), + ), + ); + + const consensusScore = calculateConsensusScore(results); + emit({ type: "round-end", round: 1, consensusScore }); + + const disagreements = detectDisagreements(1, results, participants); + if (disagreements.length > 0) { + emit({ type: "disagreements", round: 1, disagreements }); + } + + // Blind Jury runs the judge only when it is enabled and a model is configured.
+ if (options.judgeEnabled && options.judgeModelId) { + await runJudge(options.judgeModelId, results, participants, prompt, emit, signal); + } + + emit({ + type: "consensus-complete", + finalScore: consensusScore, + summary: `Blind Jury reached a consensus score of ${consensusScore}% across ${participants.length} independent jurors.`, + roundsCompleted: 1, }); + + return consensusScore; +} + +// ── Public entrypoint ────────────────────────────────────── + +export async function runConsensus( + prompt: string, + participants: Participant[], + options: ConsensusOptions, + emit: (event: ConsensusEvent) => void, + signal?: AbortSignal, +): Promise<void> { + if (options.engine === "blind-jury") { + await runBlindJuryConsensus(prompt, participants, options, emit, signal); + } else { + await runCVPConsensus(prompt, participants, options, emit, signal); + } } + +// ── Exports for tests ────────────────────────────────────── + +export const __testing = { + calculateConsensusScore, + detectDisagreements, + extractConfidence, + extractJudgeSection, + getRoundMeta, + shuffle, + buildRoundSystemPrompt, + buildBlindJurorSystemPrompt, + buildJudgeContext, + ZERO_USAGE, + addUsage, +}; diff --git a/lib/personas.ts b/lib/personas.ts index f18713d..7b774ae 100644 --- a/lib/personas.ts +++ b/lib/personas.ts @@ -1,9 +1,12 @@ // ───────────────────────────────────────────────────────────── -// Consensus Arena — Persona Definitions +// RoundTable — Persona Definitions // ───────────────────────────────────────────────────────────── // Add new personas by appending to the PERSONAS array below. // Each persona needs: id, name, emoji, color, description, systemPrompt. // The systemPrompt shapes how the AI responds during consensus rounds. +// +// JUDGE_PERSONA is separate: it is only used by the non-voting +// Judge synthesizer and never appears in the participant selector.
import type { Persona } from "./types"; @@ -71,3 +74,35 @@ export const PERSONAS: Persona[] = [ export function getPersona(id: string): Persona { return PERSONAS.find((p) => p.id === id) ?? PERSONAS[0]; } + +/** + * The Judge persona — used by the non-voting synthesizer. + * Not exposed via the participant selector. + */ +export const JUDGE_PERSONA: Persona = { + id: "judge", + name: "Consensus Judge", + emoji: "🪶", + color: "#eab308", + description: "Non-voting synthesizer that summarises majority and minority positions", + systemPrompt: `You are the Consensus Judge. You do NOT participate in the debate and you do NOT vote. Your only job is to read the final-round responses from every participant and produce a faithful synthesis. + +Produce your output in exactly this shape, with these headings: + +## Majority Position +One paragraph describing the position held by the largest coherent group, with the participants who held it. + +## Minority Positions +One short paragraph per dissenting view. Always preserve conditional exceptions — do not collapse them into the majority. + +## Unresolved Disputes +Bullet list of specific disagreements that remained open at the end of the debate. If none, say "None". + +## Synthesis Confidence +A single integer 0-100 reflecting how confident you are that the above synthesis is faithful to what was actually said. End with a line in exactly this format: \`JUDGE_CONFIDENCE: [0-100]\`. + +Rules: +- Do not invent claims. Quote or paraphrase what participants actually said. +- Do not pick a winner. Your job is faithfulness, not victory.
- Do not collapse a minority view with a conditional exception into the majority.`, +}; diff --git a/lib/pricing.ts b/lib/pricing.ts new file mode 100644 index 0000000..d036cb0 --- /dev/null +++ b/lib/pricing.ts @@ -0,0 +1,118 @@ +// ───────────────────────────────────────────────────────────── +// RoundTable — Model Pricing Table (USD per 1M tokens) +// ───────────────────────────────────────────────────────────── +// Used only for the live cost meter. Figures are best-effort +// public list prices and are clearly surfaced as *estimates* in +// the UI. Unknown models fall back to zero — the meter simply +// reports what it can price. +// +// To add a model, append an entry below. Fuzzy matching picks +// the longest matching key, so `claude-sonnet-4` covers every +// dated revision of that family. + +import type { TokenUsage } from "./types"; + +export interface ModelPricing { + /** USD per 1,000,000 input tokens */ + input: number; + /** USD per 1,000,000 output tokens */ + output: number; +} + +/** + * Keys are matched against modelId using a case-insensitive + * longest-substring lookup. Order does not matter.
+ */ +export const PRICING_TABLE: Record<string, ModelPricing> = { + // OpenAI + "gpt-4o-mini": { input: 0.15, output: 0.6 }, + "gpt-4o": { input: 2.5, output: 10 }, + "gpt-4.1-mini": { input: 0.4, output: 1.6 }, + "gpt-4.1": { input: 2, output: 8 }, + "gpt-5-mini": { input: 0.25, output: 2 }, + "gpt-5": { input: 1.25, output: 10 }, + "o1-mini": { input: 3, output: 12 }, + o1: { input: 15, output: 60 }, + + // Anthropic + "claude-haiku": { input: 0.8, output: 4 }, + "claude-sonnet-3-5": { input: 3, output: 15 }, + "claude-sonnet-4": { input: 3, output: 15 }, + "claude-opus-4": { input: 15, output: 75 }, + + // xAI + "grok-3-mini": { input: 0.3, output: 0.5 }, + "grok-3": { input: 3, output: 15 }, + "grok-4": { input: 5, output: 15 }, + + // Google + "gemini-1.5-flash": { input: 0.075, output: 0.3 }, + "gemini-1.5-pro": { input: 1.25, output: 5 }, + "gemini-2.0-flash": { input: 0.1, output: 0.4 }, + "gemini-2.5-pro": { input: 1.25, output: 10 }, + + // Mistral + "mistral-small": { input: 0.2, output: 0.6 }, + "mistral-large": { input: 2, output: 6 }, + + // Groq-hosted open models + "llama-3.3-70b": { input: 0.59, output: 0.79 }, + "llama-3.1-8b": { input: 0.05, output: 0.08 }, +}; + +/** Zero pricing used when a model has no pricing entry. */ +export const ZERO_PRICING: ModelPricing = { input: 0, output: 0 }; + +/** + * Look up pricing for a model id using case-insensitive + * longest-substring matching, e.g. `claude-sonnet-4-20250514` + * resolves to the `claude-sonnet-4` entry. + */ +export function getModelPricing(modelId: string): ModelPricing { + const normalized = modelId.toLowerCase(); + let best: { key: string; price: ModelPricing } | null = null; + for (const [key, price] of Object.entries(PRICING_TABLE)) { + if (!normalized.includes(key)) continue; + if (!best || key.length > best.key.length) best = { key, price }; + } + return best?.price ?? ZERO_PRICING; +} + +/** Estimate USD cost of a single call from raw token counts.
*/ +export function estimateCost(modelId: string, inputTokens: number, outputTokens: number): number { + const p = getModelPricing(modelId); + return (inputTokens * p.input + outputTokens * p.output) / 1_000_000; +} + +/** Sum token usages without mutating either argument. */ +export function addUsage(a: TokenUsage, b: TokenUsage): TokenUsage { + return { + inputTokens: a.inputTokens + b.inputTokens, + outputTokens: a.outputTokens + b.outputTokens, + totalTokens: a.totalTokens + b.totalTokens, + estimatedCostUSD: a.estimatedCostUSD + b.estimatedCostUSD, + }; +} + +export const ZERO_USAGE: TokenUsage = { + inputTokens: 0, + outputTokens: 0, + totalTokens: 0, + estimatedCostUSD: 0, +}; + +/** Heuristic fallback when the SDK does not report usage: ~4 chars per token. */ +export function estimateUsageFromText( + modelId: string, + inputText: string, + outputText: string, +): TokenUsage { + const inputTokens = Math.max(0, Math.round(inputText.length / 4)); + const outputTokens = Math.max(0, Math.round(outputText.length / 4)); + return { + inputTokens, + outputTokens, + totalTokens: inputTokens + outputTokens, + estimatedCostUSD: estimateCost(modelId, inputTokens, outputTokens), + }; +} diff --git a/lib/prompt-library.ts b/lib/prompt-library.ts new file mode 100644 index 0000000..d23c838 --- /dev/null +++ b/lib/prompt-library.ts @@ -0,0 +1,72 @@ +// ───────────────────────────────────────────────────────────── +// RoundTable — Prompt Library +// ───────────────────────────────────────────────────────────── +// A small curated set of demo prompts. Shown as chips under the +// textarea when it is empty so first-time visitors have a +// one-click entry point into the consensus flow. 
+ +export interface PromptPreset { + id: string; + label: string; + category: "Strategy" | "Engineering" | "Ethics" | "Science"; + prompt: string; +} + +export const PROMPT_LIBRARY: PromptPreset[] = [ + { + id: "microservices-day-one", + label: "Microservices from day one?", + category: "Engineering", + prompt: + "Should an early-stage startup use a microservices architecture from day one, or begin with a modular monolith and decompose later? Consider team size, operational complexity, and product-market-fit risk.", + }, + { + id: "ai-coding-assistants", + label: "AI coding assistants: net positive?", + category: "Strategy", + prompt: + "Are AI coding assistants (Copilot, Cursor, Claude Code, etc.) a net productivity gain for experienced engineers, or do they introduce subtle quality and dependency risks that outweigh the speedup?", + }, + { + id: "remote-vs-office", + label: "Full remote vs hybrid office", + category: "Strategy", + prompt: + "For a 50-person engineering org building a consumer SaaS product, is full remote or a 3-day hybrid model more likely to produce durable high-quality output over a 3-year horizon?", + }, + { + id: "rust-vs-go", + label: "Rust vs Go for a new backend", + category: "Engineering", + prompt: + "A team of five backend engineers (3 Go, 2 Python) is starting a new latency-sensitive service. Should they pick Rust or Go? Weigh ecosystem maturity, hiring, performance ceiling, and onboarding cost.", + }, + { + id: "llm-dataset-licensing", + label: "LLM training on licensed data", + category: "Ethics", + prompt: + "Should commercial LLM providers be legally required to train exclusively on explicitly licensed data, even if it means losing access to most of the public web? 
Consider innovation, creator rights, and competitive dynamics.", + }, + { + id: "carbon-capture", + label: "Direct air capture viability", + category: "Science", + prompt: + "Given current cost curves and energy requirements, is direct-air carbon capture a credible climate solution by 2040, or is it a distraction from faster mitigation pathways? Evaluate the evidence.", + }, + { + id: "universal-basic-income", + label: "UBI under AI automation", + category: "Ethics", + prompt: + "If AI automates 30% of knowledge-work tasks by 2035, is a universal basic income the correct policy response, or would narrower interventions (retraining, wage insurance, job guarantees) be more effective?", + }, + { + id: "nuclear-renaissance", + label: "Should we bet on nuclear?", + category: "Science", + prompt: + "Should industrialised nations aggressively restart nuclear fission build-out (SMRs and conventional) as a primary pillar of decarbonisation, or continue prioritising wind, solar, and storage?", + }, +]; diff --git a/lib/session.ts b/lib/session.ts new file mode 100644 index 0000000..e2a7cac --- /dev/null +++ b/lib/session.ts @@ -0,0 +1,219 @@ +// ───────────────────────────────────────────────────────────── +// RoundTable — Session Export / Import / Share +// ───────────────────────────────────────────────────────────── +// Serialises a completed run to Markdown, JSON, or a URL-hash +// permalink. The permalink is reversible — loading a URL with +// `#rt=` rehydrates the store in read-only view mode. + +import type { SessionSnapshot } from "./types"; + +const HASH_KEY = "rt"; + +// ── Snapshot → Markdown ──────────────────────────────────── + +export function snapshotToMarkdown(snapshot: SessionSnapshot): string { + const date = new Date(snapshot.createdAt).toISOString(); + const engineName = snapshot.engine === "blind-jury" ? 
"Blind Jury" : "CVP"; + + const lines: string[] = []; + lines.push("# RoundTable Session"); + lines.push(""); + lines.push(`**Prompt**: ${snapshot.prompt}`); + lines.push(`**Engine**: ${engineName}`); + lines.push(`**Date**: ${date}`); + if (snapshot.finalScore !== null) { + lines.push(`**Final consensus score**: ${snapshot.finalScore}%`); + } + if (snapshot.tokenTotal && snapshot.tokenTotal.totalTokens > 0) { + lines.push( + `**Total cost**: $${snapshot.tokenTotal.estimatedCostUSD.toFixed(4)} (${snapshot.tokenTotal.totalTokens.toLocaleString()} tokens)`, + ); + } + lines.push(""); + + lines.push("## Participants"); + for (const p of snapshot.participants) { + lines.push(`- **${p.persona.name}** — ${p.modelInfo.providerName} / ${p.modelInfo.modelId}`); + } + lines.push(""); + + for (const round of snapshot.rounds) { + lines.push(`## Round ${round.number} — ${round.label}`); + lines.push(`_Consensus score: ${round.consensusScore}%_`); + lines.push(""); + for (const r of round.responses) { + const p = snapshot.participants.find((x) => x.id === r.participantId); + const heading = p + ? 
`${p.persona.name} (${p.modelInfo.providerName}/${p.modelInfo.modelId})` + : r.participantId; + lines.push(`### ${heading} — confidence ${r.confidence}%`); + lines.push(r.content.replace(/\nCONFIDENCE:\s*\d+\s*$/i, "").trim()); + lines.push(""); + } + } + + if (snapshot.disagreements.length > 0) { + lines.push("## Disagreements"); + for (const d of snapshot.disagreements) { + lines.push(`- Round ${d.round}: ${d.label} (severity ${d.severity})`); + } + lines.push(""); + } + + if (snapshot.judge) { + lines.push(`## Judge Synthesis — ${snapshot.judge.providerName} / ${snapshot.judge.modelId}`); + lines.push(snapshot.judge.content); + lines.push(""); + } + + return lines.join("\n"); +} + +// ── Snapshot → JSON ──────────────────────────────────────── + +export function snapshotToJSON(snapshot: SessionSnapshot): string { + return JSON.stringify(snapshot, null, 2); +} + +// ── Snapshot ↔ URL hash ──────────────────────────────────── + +/** + * Encode to a base64url payload. Uses `CompressionStream` when + * available to keep the URL short, falls back to plain base64 + * otherwise. + */ +export async function encodeSnapshotToHash(snapshot: SessionSnapshot): Promise<string> { + const json = JSON.stringify(snapshot); + const bytes = new TextEncoder().encode(json); + + // Attempt compression + const gz = await maybeCompress(bytes); + const payload = gz ?? bytes; + const marker = gz ? "c" : "r"; + const base64 = bytesToBase64Url(payload); + return `${HASH_KEY}=${marker}${base64}`; +} + +/** Reverse of encodeSnapshotToHash. Returns null on any failure.
*/ +export async function decodeSnapshotFromHash(hash: string): Promise<SessionSnapshot | null> { + try { + const trimmed = hash.replace(/^#/, ""); + const params = new URLSearchParams(trimmed); + const value = params.get(HASH_KEY); + if (!value) return null; + const marker = value[0]; + const encoded = value.slice(1); + const bytes = base64UrlToBytes(encoded); + let raw: Uint8Array; + if (marker === "c") { + const decompressed = await maybeDecompress(bytes); + if (!decompressed) return null; + raw = decompressed; + } else { + raw = bytes; + } + const json = new TextDecoder().decode(raw); + const parsed = JSON.parse(json) as SessionSnapshot; + if (parsed.v !== 1 || !Array.isArray(parsed.rounds)) return null; + return parsed; + } catch { + return null; + } +} + +// ── Compression helpers ──────────────────────────────────── + +type CompressionStreamCtor = new (format: string) => { + readable: ReadableStream<Uint8Array>; + writable: WritableStream<Uint8Array>; +}; +type DecompressionStreamCtor = new (format: string) => { + readable: ReadableStream<Uint8Array>; + writable: WritableStream<Uint8Array>; +}; + +function getCompressionCtor(): CompressionStreamCtor | null { + const g = globalThis as unknown as { CompressionStream?: CompressionStreamCtor }; + return typeof g.CompressionStream === "function" ? g.CompressionStream : null; +} + +function getDecompressionCtor(): DecompressionStreamCtor | null { + const g = globalThis as unknown as { DecompressionStream?: DecompressionStreamCtor }; + return typeof g.DecompressionStream === "function" ?
g.DecompressionStream : null; +} + +async function maybeCompress(bytes: Uint8Array): Promise<Uint8Array | null> { + const Ctor = getCompressionCtor(); + if (!Ctor) return null; + try { + const cs = new Ctor("deflate-raw"); + const writer = cs.writable.getWriter(); + writer.write(bytes); + writer.close(); + const out = await new Response(cs.readable).arrayBuffer(); + return new Uint8Array(out); + } catch { + return null; + } +} + +async function maybeDecompress(bytes: Uint8Array): Promise<Uint8Array | null> { + const Ctor = getDecompressionCtor(); + if (!Ctor) return null; + try { + const ds = new Ctor("deflate-raw"); + const writer = ds.writable.getWriter(); + writer.write(bytes); + writer.close(); + const out = await new Response(ds.readable).arrayBuffer(); + return new Uint8Array(out); + } catch { + return null; + } +} + +// ── base64url ────────────────────────────────────────────── + +function bytesToBase64Url(bytes: Uint8Array): string { + let binary = ""; + for (let i = 0; i < bytes.byteLength; i++) binary += String.fromCharCode(bytes[i]); + // `btoa` is available in both browser and Node 18+ + const base64 = + typeof btoa === "function" ? btoa(binary) : Buffer.from(binary, "binary").toString("base64"); + return base64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, ""); +} + +function base64UrlToBytes(b64: string): Uint8Array { + const padded = b64.replace(/-/g, "+").replace(/_/g, "/") + "===".slice((b64.length + 3) % 4); + const binary = + typeof atob === "function" ?
atob(padded) : Buffer.from(padded, "base64").toString("binary"); + const out = new Uint8Array(binary.length); + for (let i = 0; i < binary.length; i++) out[i] = binary.charCodeAt(i); + return out; +} + +// ── Browser file download ────────────────────────────────── + +export function downloadBlob(filename: string, data: string, mime: string): void { + if (typeof document === "undefined") return; + const blob = new Blob([data], { type: mime }); + const url = URL.createObjectURL(blob); + const a = document.createElement("a"); + a.href = url; + a.download = filename; + document.body.appendChild(a); + a.click(); + document.body.removeChild(a); + URL.revokeObjectURL(url); +} + +export function snapshotFilename(snapshot: SessionSnapshot, ext: string): string { + const stamp = new Date(snapshot.createdAt).toISOString().replace(/[:.]/g, "-").slice(0, 19); + const slug = + snapshot.prompt + .slice(0, 40) + .toLowerCase() + .replace(/[^a-z0-9]+/g, "-") + .replace(/^-+|-+$/g, "") || "session"; + return `roundtable-${slug}-${stamp}.${ext}`; +} diff --git a/lib/store.ts b/lib/store.ts index 5a63d18..f8ab08a 100644 --- a/lib/store.ts +++ b/lib/store.ts @@ -3,24 +3,63 @@ // ───────────────────────────────────────────────────────────── import { create } from "zustand"; -import type { ArenaState, ModelInfo, Persona, RoundType } from "./types"; +import type { + ArenaState, + ConsensusOptions, + Disagreement, + JudgeResult, + ModelInfo, + Persona, + RoundType, + SessionSnapshot, + TokenUsage, +} from "./types"; +import { addUsage, ZERO_USAGE } from "./pricing"; let participantCounter = 0; -export const useArenaStore = create((set) => ({ +export const DEFAULT_OPTIONS: ConsensusOptions = { + engine: "cvp", + rounds: 5, + randomizeOrder: true, + blindFirstRound: true, + earlyStop: true, + judgeEnabled: false, + judgeModelId: undefined, +}; + +const freshUsageState = () => ({ + tokenTotal: { ...ZERO_USAGE } as TokenUsage, + usageByParticipant: {} as Record, +}); + +export const 
useArenaStore = create<ArenaState>((set, get) => ({ availableModels: [], modelsLoading: true, participants: [], - roundCount: 5, prompt: "", + options: { ...DEFAULT_OPTIONS }, + isRunning: false, currentRound: 0, rounds: [], activeStreams: {}, finalScore: null, finalSummary: null, - abortController: null, progress: 0, + roundsCompleted: 0, + + disagreements: [], + judge: null, + judgeStream: "", + judgeRunning: false, + earlyStopped: null, + ...freshUsageState(), + + sharedView: false, + abortController: null, + + // ── Configuration ────────────────────────────────────────── setAvailableModels: (models) => set({ availableModels: models }), setModelsLoading: (loading) => set({ modelsLoading: loading }), @@ -44,9 +83,20 @@ export const useArenaStore = create((set) => ({ participants: s.participants.map((p) => (p.id === id ? { ...p, modelInfo: model } : p)), })), - setRoundCount: (count) => set({ roundCount: Math.max(1, Math.min(10, count)) }), setPrompt: (prompt) => set({ prompt }), + setRoundCount: (count) => + set((s) => ({ + options: { ...s.options, rounds: Math.max(1, Math.min(10, count)) }, + })), + + setOption: (key, value) => + set((s) => ({ + options: { ...s.options, [key]: value }, + })), + + // ── Lifecycle ────────────────────────────────────────────── + startConsensus: () => { const controller = new AbortController(); set({ @@ -57,6 +107,14 @@ export const useArenaStore = create((set) => ({ finalScore: null, finalSummary: null, progress: 0, + roundsCompleted: 0, + disagreements: [], + judge: null, + judgeStream: "", + judgeRunning: false, + earlyStopped: null, + ...freshUsageState(), + sharedView: false, abortController: controller, }); return controller; @@ -65,7 +123,7 @@ export const useArenaStore = create((set) => ({ cancelConsensus: () => set((s) => { s.abortController?.abort(); - return { isRunning: false, abortController: null }; + return { isRunning: false, judgeRunning: false, abortController: null }; }), appendToken: (participantId, _round, token) => @@
-80,44 +138,101 @@ export const useArenaStore = create((set) => ({ set((s) => ({ currentRound: round, activeStreams: {}, - progress: (round - 1) / s.roundCount, + progress: (round - 1) / Math.max(1, s.options.rounds), rounds: [...s.rounds, { number: round, type, label, responses: [], consensusScore: 0 }], })), - completeParticipantRound: (participantId, roundNumber, confidence, fullContent) => - set((s) => ({ - activeStreams: { ...s.activeStreams, [participantId]: "" }, - rounds: s.rounds.map((r) => - r.number === roundNumber - ? { - ...r, - responses: [ - ...r.responses, - { - participantId, - roundNumber, - content: fullContent, - confidence, - timestamp: Date.now(), - }, - ], - } - : r, - ), - })), + completeParticipantRound: ( + participantId, + roundNumber, + confidence, + fullContent, + usage, + durationMs, + error, + ) => + set((s) => { + const nextUsageByParticipant = { ...s.usageByParticipant }; + let nextTotal = s.tokenTotal; + if (usage) { + const prev = nextUsageByParticipant[participantId] ?? ZERO_USAGE; + nextUsageByParticipant[participantId] = addUsage(prev, usage); + nextTotal = addUsage(nextTotal, usage); + } + return { + activeStreams: { ...s.activeStreams, [participantId]: "" }, + rounds: s.rounds.map((r) => + r.number === roundNumber + ? { + ...r, + responses: [ + ...r.responses, + { + participantId, + roundNumber, + content: fullContent, + confidence, + timestamp: Date.now(), + durationMs, + usage, + error, + }, + ], + } + : r, + ), + tokenTotal: nextTotal, + usageByParticipant: nextUsageByParticipant, + }; + }), endRound: (round, consensusScore) => set((s) => ({ rounds: s.rounds.map((r) => (r.number === round ? 
{ ...r, consensusScore } : r)), - progress: round / s.roundCount, + progress: round / Math.max(1, s.options.rounds), + })), + + addDisagreements: (_round, items: Disagreement[]) => + set((s) => ({ + disagreements: [...s.disagreements, ...items], })), - completeConsensus: (finalScore, summary) => + setEarlyStopped: (info) => set({ earlyStopped: info }), + + startJudge: (modelId, providerName) => + set({ + judgeRunning: true, + judgeStream: "", + judge: { + modelId, + providerName, + content: "", + majorityPosition: "", + minorityPositions: "", + unresolvedDisputes: "", + }, + }), + + appendJudgeToken: (token) => set((s) => ({ judgeStream: s.judgeStream + token })), + + completeJudge: (result: JudgeResult) => + set((s) => { + const nextTotal = result.usage ? addUsage(s.tokenTotal, result.usage) : s.tokenTotal; + return { + judgeRunning: false, + judgeStream: "", + judge: result, + tokenTotal: nextTotal, + }; + }), + + completeConsensus: (finalScore, summary, roundsCompleted) => set({ isRunning: false, finalScore, finalSummary: summary, progress: 1, + roundsCompleted, abortController: null, }), @@ -132,7 +247,63 @@ export const useArenaStore = create((set) => ({ finalScore: null, finalSummary: null, progress: 0, + roundsCompleted: 0, + disagreements: [], + judge: null, + judgeStream: "", + judgeRunning: false, + earlyStopped: null, + ...freshUsageState(), + sharedView: false, abortController: null, }; }), + + // ── Snapshot / share ─────────────────────────────────────── + + loadSnapshot: (snapshot: SessionSnapshot) => { + // Abort anything running and replace visible state with the snapshot. 
+    const s = get();
+    s.abortController?.abort();
+    set({
+      prompt: snapshot.prompt,
+      participants: snapshot.participants,
+      options: snapshot.options,
+      rounds: snapshot.rounds,
+      finalScore: snapshot.finalScore,
+      finalSummary: snapshot.finalSummary,
+      judge: snapshot.judge,
+      judgeStream: "",
+      judgeRunning: false,
+      disagreements: snapshot.disagreements,
+      earlyStopped: null,
+      tokenTotal: snapshot.tokenTotal ?? { ...ZERO_USAGE },
+      usageByParticipant: {},
+      roundsCompleted: snapshot.rounds.length,
+      progress: 1,
+      activeStreams: {},
+      currentRound: snapshot.rounds.length,
+      isRunning: false,
+      sharedView: true,
+      abortController: null,
+    });
+  },
+
+  getSnapshot: (): SessionSnapshot => {
+    const s = get();
+    return {
+      v: 1,
+      prompt: s.prompt,
+      engine: s.options.engine,
+      options: s.options,
+      participants: s.participants,
+      rounds: s.rounds,
+      finalScore: s.finalScore,
+      finalSummary: s.finalSummary,
+      judge: s.judge,
+      disagreements: s.disagreements,
+      tokenTotal: s.tokenTotal,
+      createdAt: Date.now(),
+    };
+  },
 }));
diff --git a/lib/types.ts b/lib/types.ts
index 39f8e11..c539180 100644
--- a/lib/types.ts
+++ b/lib/types.ts
@@ -46,6 +46,15 @@ export interface Participant {
   persona: Persona;
 }
 
+/** Token usage for a single AI call */
+export interface TokenUsage {
+  inputTokens: number;
+  outputTokens: number;
+  totalTokens: number;
+  /** Estimated cost in USD, based on the pricing table in lib/pricing.ts */
+  estimatedCostUSD: number;
+}
+
 /** A single round's response from one AI */
 export interface RoundResponse {
   participantId: string;
@@ -53,6 +62,10 @@
   content: string;
   confidence: number; // 0-100
   timestamp: number;
+  durationMs?: number;
+  usage?: TokenUsage;
+  /** If the provider call failed, a short human-readable error (e.g. `Not Found (HTTP 404)`). */
+  error?: string;
 }
 
 /** Consensus round metadata */
@@ -70,6 +83,52 @@
   | "evidence-assessment"
   | "synthesis";
 
+/** Which engine to run */
+export type EngineType = "cvp" | "blind-jury";
+
+/** A detected disagreement between participants in a round */
+export interface Disagreement {
+  id: string; // stable: `r<round>-<participantAId>-<participantBId>`
+  round: number;
+  participantAId: string;
+  participantBId: string;
+  /** Confidence delta that flagged the divergence (0-100) */
+  severity: number;
+  /** Short label summarising the nature of the split */
+  label: string;
+}
+
+/** Judge synthesis output (non-voting final summariser) */
+export interface JudgeResult {
+  modelId: string;
+  providerName: string;
+  content: string;
+  majorityPosition: string;
+  minorityPositions: string;
+  unresolvedDisputes: string;
+  usage?: TokenUsage;
+}
+
+/**
+ * User-configurable options for a consensus run.
+ * Every field is optional on the wire — defaults are applied server-side.
+ */
+export interface ConsensusOptions {
+  engine: EngineType;
+  /** Only used when engine === "cvp" */
+  rounds: number;
+  /** Shuffle participant order each round (CVP only) */
+  randomizeOrder: boolean;
+  /** Run Round 1 in parallel with no cross-visibility (CVP only) */
+  blindFirstRound: boolean;
+  /** Stop early if the consensus delta between rounds falls below threshold (CVP only) */
+  earlyStop: boolean;
+  /** Run a non-voting judge synthesizer at the end of the run */
+  judgeEnabled: boolean;
+  /** Composite model id (provider:model) to use for the judge */
+  judgeModelId?: string;
+}
+
 /** SSE event types streamed from /api/consensus */
 export type ConsensusEvent =
   | { type: "round-start"; round: number; roundType: RoundType; label: string }
@@ -81,16 +140,50 @@ export type ConsensusEvent =
       round: number;
       confidence: number;
       fullContent: string;
+      usage?: TokenUsage;
+      durationMs: number;
+      error?: string;
     }
   | { type: "round-end"; round: number; consensusScore: number }
-  | { type: "consensus-complete"; finalScore: number; summary: string }
+  | { type: "disagreements"; round: number; disagreements: Disagreement[] }
+  | {
+      type: "early-stop";
+      round: number;
+      delta: number;
+      reason: string;
+    }
+  | { type: "judge-start"; modelId: string; providerName: string }
+  | { type: "judge-token"; token: string }
+  | { type: "judge-end"; result: JudgeResult }
+  | {
+      type: "consensus-complete";
+      finalScore: number;
+      summary: string;
+      roundsCompleted: number;
+    }
   | { type: "error"; message: string };
 
 /** Request body for /api/consensus */
 export interface ConsensusRequest {
   prompt: string;
   participants: Participant[];
-  rounds: number;
+  options: ConsensusOptions;
+}
+
+/** A frozen snapshot of a completed run — used for export + share links */
+export interface SessionSnapshot {
+  v: 1;
+  prompt: string;
+  engine: EngineType;
+  options: ConsensusOptions;
+  participants: Participant[];
+  rounds: ConsensusRound[];
+  finalScore: number | null;
+  finalSummary: string | null;
+  judge: JudgeResult | null;
+  disagreements: Disagreement[];
+  tokenTotal: TokenUsage | null;
+  createdAt: number;
 }
 
 /** Global app state managed by Zustand */
@@ -101,8 +194,8 @@ export interface ArenaState {
 
   // Configuration
   participants: Participant[];
-  roundCount: number;
   prompt: string;
+  options: ConsensusOptions;
 
   // Consensus execution state
   isRunning: boolean;
@@ -112,19 +205,35 @@
   finalScore: number | null;
   finalSummary: string | null;
   progress: number; // 0-1
+  roundsCompleted: number;
+
+  // New — Judge, disagreements, cost meter
+  disagreements: Disagreement[];
+  judge: JudgeResult | null;
+  judgeStream: string;
+  judgeRunning: boolean;
+  earlyStopped: { round: number; delta: number; reason: string } | null;
+  tokenTotal: TokenUsage;
+  usageByParticipant: Record<string, TokenUsage>;
+
+  // Shared-session replay flag
+  sharedView: boolean;
 
   // Cancellation
   abortController: AbortController | null;
 
-  // Actions
+  // Actions — configuration
   setAvailableModels: (models: ModelInfo[]) => void;
   setModelsLoading: (loading: boolean) => void;
   addParticipant: (model: ModelInfo, persona: Persona) => void;
   removeParticipant: (id: string) => void;
   updateParticipantPersona: (id: string, persona: Persona) => void;
   updateParticipantModel: (id: string, model: ModelInfo) => void;
-  setRoundCount: (count: number) => void;
   setPrompt: (prompt: string) => void;
+  setRoundCount: (count: number) => void;
+  setOption: <K extends keyof ConsensusOptions>(key: K, value: ConsensusOptions[K]) => void;
+
+  // Actions — lifecycle
   startConsensus: () => AbortController;
   cancelConsensus: () => void;
   appendToken: (participantId: string, round: number, token: string) => void;
@@ -133,9 +242,21 @@
     round: number,
     confidence: number,
     fullContent: string,
+    usage?: TokenUsage,
+    durationMs?: number,
+    error?: string,
   ) => void;
   startRound: (round: number, type: RoundType, label: string) => void;
   endRound: (round: number, consensusScore: number) => void;
-  completeConsensus: (finalScore: number, summary: string) => void;
+  addDisagreements: (round: number, items: Disagreement[]) => void;
+  setEarlyStopped: (info: { round: number; delta: number; reason: string }) => void;
+  startJudge: (modelId: string, providerName: string) => void;
+  appendJudgeToken: (token: string) => void;
+  completeJudge: (result: JudgeResult) => void;
+  completeConsensus: (finalScore: number, summary: string, roundsCompleted: number) => void;
   reset: () => void;
+
+  // Snapshot / replay
+  loadSnapshot: (snapshot: SessionSnapshot) => void;
+  getSnapshot: () => SessionSnapshot;
 }
diff --git a/screenshots/newscreenshot.png b/screenshots/newscreenshot.png
new file mode 100644
index 0000000..66ffd86
Binary files /dev/null and b/screenshots/newscreenshot.png differ
diff --git a/screenshots/screenshot1.png b/screenshots/screenshot1.png
deleted file mode 100644
index d901760..0000000
Binary files a/screenshots/screenshot1.png and /dev/null differ
diff --git a/screenshots/screenshot2.png b/screenshots/screenshot2.png
deleted file mode 100644
index 8d68385..0000000
Binary files a/screenshots/screenshot2.png and /dev/null differ
diff --git a/tests/api-consensus.test.ts b/tests/api-consensus.test.ts
index c63f4f5..e65c35e 100644
--- a/tests/api-consensus.test.ts
+++ b/tests/api-consensus.test.ts
@@ -1,10 +1,10 @@
 import { describe, it, expect, vi } from "vitest";
 
-// Mock the consensus engine (accepts signal as 5th arg)
+// Mock the consensus engine (new signature — options bundle)
 vi.mock("@/lib/consensus-engine", () => ({
-  runConsensus: vi.fn(async (_prompt, _participants, _rounds, emit, _signal) => {
+  runConsensus: vi.fn(async (_prompt, _participants, _options, emit, _signal) => {
     emit({ type: "round-start", round: 1, roundType: "initial-analysis", label: "Analysis" });
-    emit({ type: "consensus-complete", finalScore: 85, summary: "Done" });
+    emit({ type: "consensus-complete", finalScore: 85, summary: "Done", roundsCompleted: 1 });
   }),
 }));
 
@@ -22,16 +22,14 @@ vi.mock("@/lib/personas", () => ({
 
 // Mock providers (used by the route to validate models)
 vi.mock("@/lib/providers", () => ({
-  findResolvedModel: (id: string) =>
-    id
-      ? { providerId: "t", providerName: "T", modelId: "m", baseUrl: "http://x", apiKey: "k" }
-      : undefined,
+  findResolvedModel: (id: string) => {
+    if (!id) return undefined;
+    if (id === "unknown:model") return undefined;
+    return { providerId: "t", providerName: "T", modelId: "m", baseUrl: "http://x", apiKey: "k" };
+  },
 }));
 
-// We need to reset rate limiter state between tests.
-// The route module stores request counts in a module-level Map.
-// Re-importing would create a fresh module, but vi.mock makes that tricky.
-// Instead, we'll use unique IPs per test by varying x-forwarded-for.
+// Use unique IPs per test to avoid rate limiting between tests
 let testIpCounter = 0;
 function makeRequest(body: unknown): Request {
   testIpCounter++;
@@ -98,7 +96,7 @@ describe("POST /api/consensus", () => {
     expect(response.status).toBe(200);
   });
 
-  it("returns SSE stream for valid request", async () => {
+  it("returns SSE stream for valid legacy `rounds` body", async () => {
     const response = await POST(
       makeRequest({
         prompt: "test topic",
@@ -123,4 +121,159 @@
     expect(output).toContain("round-start");
     expect(output).toContain("consensus-complete");
   });
+
+  it("accepts an options bundle", async () => {
+    const response = await POST(
+      makeRequest({
+        prompt: "test",
+        participants: [{ id: "p-1", modelInfo: { id: "t:m" }, persona: { id: "test" } }],
+        options: {
+          engine: "cvp",
+          rounds: 3,
+          randomizeOrder: true,
+          blindFirstRound: true,
+          earlyStop: true,
+          judgeEnabled: false,
+        },
+      }),
+    );
+    expect(response.status).toBe(200);
+  });
+
+  it("accepts the blind-jury engine", async () => {
+    const response = await POST(
+      makeRequest({
+        prompt: "test",
+        participants: [{ id: "p-1", modelInfo: { id: "t:m" }, persona: { id: "test" } }],
+        options: {
+          engine: "blind-jury",
+          rounds: 1,
+          randomizeOrder: false,
+          blindFirstRound: false,
+          earlyStop: false,
+          judgeEnabled: false,
+        },
+      }),
+    );
+    expect(response.status).toBe(200);
+  });
+
+  it("rejects judgeEnabled with no judge model", async () => {
+    const response = await POST(
+      makeRequest({
+        prompt: "test",
+        participants: [{ id: "p-1", modelInfo: { id: "t:m" }, persona: { id: "test" } }],
+        options: {
+          engine: "cvp",
+          rounds: 2,
+          randomizeOrder: false,
+          blindFirstRound: false,
+          earlyStop: false,
+          judgeEnabled: true,
+        },
+      }),
+    );
+    expect(response.status).toBe(400);
+    const body = await response.json();
+    expect(body.error).toContain("judgeModelId");
+  });
+
+  it("accepts judgeEnabled with a judge model", async () => {
+    const response = await POST(
+      makeRequest({
+        prompt: "test",
+        participants: [{ id: "p-1", modelInfo: { id: "t:m" }, persona: { id: "test" } }],
+        options: {
+          engine: "cvp",
+          rounds: 2,
+          randomizeOrder: false,
+          blindFirstRound: false,
+          earlyStop: false,
+          judgeEnabled: true,
+          judgeModelId: "t:m",
+        },
+      }),
+    );
+    expect(response.status).toBe(200);
+  });
+
+  it("rejects when a participant's model cannot be resolved", async () => {
+    const response = await POST(
+      makeRequest({
+        prompt: "test",
+        participants: [{ id: "p-1", modelInfo: { id: "unknown:model" }, persona: { id: "test" } }],
+        rounds: 1,
+      }),
+    );
+    expect(response.status).toBe(400);
+    const body = await response.json();
+    expect(body.error).toContain("Model not available");
+  });
+
+  it("rejects when the judge model cannot be resolved", async () => {
+    const response = await POST(
+      makeRequest({
+        prompt: "test",
+        participants: [{ id: "p-1", modelInfo: { id: "t:m" }, persona: { id: "test" } }],
+        options: {
+          engine: "cvp",
+          rounds: 1,
+          randomizeOrder: false,
+          blindFirstRound: false,
+          earlyStop: false,
+          judgeEnabled: true,
+          judgeModelId: "unknown:model",
+        },
+      }),
+    );
+    expect(response.status).toBe(400);
+    const body = await response.json();
+    expect(body.error).toContain("Judge model not available");
+  });
+
+  it("surfaces engine errors via an `error` SSE event", async () => {
+    const { runConsensus } = await import("@/lib/consensus-engine");
+    (runConsensus as ReturnType<typeof vi.fn>).mockImplementationOnce(async () => {
+      throw new Error("explode");
+    });
+
+    const response = await POST(
+      makeRequest({
+        prompt: "test",
+        participants: [{ id: "p-1", modelInfo: { id: "t:m" }, persona: { id: "test" } }],
+        rounds: 1,
+      }),
+    );
+
+    const reader = response.body!.getReader();
+    const decoder = new TextDecoder();
+    let output = "";
+    while (true) {
+      const { done, value } = await reader.read();
+      if (done) break;
+      output += decoder.decode(value);
+    }
+    expect(output).toContain("error");
+    expect(output).toContain("explode");
+  });
+
+  it("returns 429 when rate limit is exceeded for a single IP", async () => {
+    const fixedIp = "rate-limit-test-ip";
+    const makeFixed = () =>
+      new Request("http://localhost/api/consensus", {
+        method: "POST",
+        headers: { "Content-Type": "application/json", "x-forwarded-for": fixedIp },
+        body: JSON.stringify({
+          prompt: "test",
+          participants: [{ id: "p-1", modelInfo: { id: "t:m" }, persona: { id: "test" } }],
+          rounds: 1,
+        }),
+      });
+
+    let last: Response | null = null;
+    for (let i = 0; i < 6; i++) {
+      last = await POST(makeFixed());
+    }
+    expect(last?.status).toBe(429);
+  });
 });
diff --git a/tests/components-extended.test.tsx b/tests/components-extended.test.tsx
index fda7a06..3adf953 100644
--- a/tests/components-extended.test.tsx
+++ b/tests/components-extended.test.tsx
@@ -114,6 +114,46 @@ describe("AISelector — interactions", () => {
   });
 });
 
+describe("ResultPanel — error card", () => {
+  beforeEach(() => {
+    useArenaStore.getState().reset();
+  });
+
+  it("renders an error card with the provider error when a response has error set", async () => {
+    const { default: ResultPanel } = await import("@/components/ResultPanel");
+
+    useArenaStore.setState({
+      participants: [{ id: "p-1", modelInfo: model1, persona: PERSONAS[0] }],
+      rounds: [
+        {
+          number: 1,
+          type: "initial-analysis" as const,
+          label: "Analysis",
+          responses: [
+            {
+              participantId: "p-1",
+              roundNumber: 1,
+              content: "[Error from Prov / m1: Not Found — HTTP 404]",
+              confidence: 0,
+              timestamp: Date.now(),
+              error: "Not Found — HTTP 404",
+            },
+          ],
+          consensusScore: 0,
+        },
+      ],
+      currentRound: 2,
+    });
+
+    render(<ResultPanel />);
+    expect(screen.getByText("Provider error")).toBeInTheDocument();
+    expect(screen.getByText("Not Found — HTTP 404")).toBeInTheDocument();
+    expect(screen.getByText("ERROR")).toBeInTheDocument();
+    // Must NOT render a "..." placeholder
+    expect(screen.queryByText("...")).not.toBeInTheDocument();
+  });
+});
+
 describe("ResultPanel — copy button", () => {
   beforeEach(() => {
     useArenaStore.getState().reset();
diff --git a/tests/components.test.tsx b/tests/components.test.tsx
index 5b0fcc3..e8d7efe 100644
--- a/tests/components.test.tsx
+++ b/tests/components.test.tsx
@@ -33,7 +33,7 @@ describe("ResultPanel", () => {
     useArenaStore.getState().reset();
     useArenaStore.setState({
       participants: [],
-      roundCount: 3,
+      options: { ...useArenaStore.getState().options, rounds: 3 },
       isRunning: false,
       rounds: [],
       finalScore: null,
@@ -106,7 +106,7 @@ describe("ResultPanel", () => {
       progress: 0.6,
       isRunning: true,
       currentRound: 2,
-      roundCount: 3,
+      options: { ...useArenaStore.getState().options, rounds: 3 },
     });
 
     render(<ResultPanel />);
diff --git a/tests/consensus-engine.test.ts b/tests/consensus-engine.test.ts
index 65f52a6..8d3d4b2 100644
--- a/tests/consensus-engine.test.ts
+++ b/tests/consensus-engine.test.ts
@@ -1,22 +1,46 @@
 import { describe, it, expect, vi, beforeEach } from "vitest";
-import type { ConsensusEvent, Participant } from "@/lib/types";
+import type { ConsensusEvent, ConsensusOptions, Participant } from "@/lib/types";
 import { PERSONAS } from "@/lib/personas";
+import { DEFAULT_OPTIONS } from "@/lib/store";
 
-// Mock the AI SDK and providers before importing the engine
+// Mock the AI SDK and providers before importing the engine.
+// The provider stub is a function (for legacy `provider(modelId)` callers)
+// that also exposes `.chat` / `.responses` methods the engine uses.
 vi.mock("@ai-sdk/openai", () => ({
-  createOpenAI: () => (modelId: string) => ({ modelId }),
+  createOpenAI: () => {
+    const make = (modelId: string) => ({ modelId });
+    const provider = make as ((modelId: string) => { modelId: string }) & {
+      chat: (modelId: string) => { modelId: string };
+      responses: (modelId: string) => { modelId: string };
+    };
+    provider.chat = make;
+    provider.responses = make;
+    return provider;
+  },
 }));
 
-// streamText returns a Promise<{ textStream: AsyncIterable<string> }>
-// Now also accepts abortSignal in the options
+// streamText returns an object with textStream and usage.
+// A counter lets individual tests vary the confidence yielded.
+let confidenceSequence: number[] = [];
+let confidenceIndex = 0;
+
+function nextConfidence(): number {
+  if (confidenceSequence.length === 0) return 78;
+  const v = confidenceSequence[confidenceIndex % confidenceSequence.length];
+  confidenceIndex++;
+  return v;
+}
+
 vi.mock("ai", () => ({
   streamText: vi.fn((_opts?: { abortSignal?: AbortSignal }) => {
-    return Promise.resolve({
+    const confidence = nextConfidence();
+    return {
       textStream: (async function* () {
-        yield "Analysis complete.";
-        yield "\nCONFIDENCE: 78";
+        yield "Analysis complete. ";
+        yield `\nCONFIDENCE: ${confidence}`;
       })(),
-    });
+      usage: Promise.resolve({ inputTokens: 100, outputTokens: 50 }),
+    };
   }),
 }));
 
@@ -34,9 +58,14 @@ vi.mock("@/lib/providers", () => ({
 }));
 
 // Import after mocks are set up
-const { runConsensus } = await import("@/lib/consensus-engine");
+const { runConsensus, shuffle, detectDisagreements, __testing } =
+  await import("@/lib/consensus-engine");
+
+function opts(overrides: Partial<ConsensusOptions> = {}): ConsensusOptions {
+  return { ...DEFAULT_OPTIONS, ...overrides };
+}
 
-const makeParticipant = (id: string, modelId = "test:test-model"): Participant => ({
+const makeParticipant = (id: string, modelId = "test:test-model", personaIdx = 0): Participant => ({
   id,
   modelInfo: {
     id: modelId,
@@ -44,39 +73,44 @@ const makeParticipant = (id: string, modelId = "test:test-model"): Participant =
     providerName: "Test",
     modelId: "test-model",
   },
-  persona: PERSONAS[0],
+  persona: PERSONAS[personaIdx % PERSONAS.length],
 });
 
 describe("consensus-engine", () => {
   beforeEach(() => {
     vi.clearAllMocks();
+    confidenceSequence = [];
+    confidenceIndex = 0;
   });
 
   it("emits round-start, participant events, round-end, and consensus-complete", async () => {
    const events: ConsensusEvent[] = [];
     const emit = (e: ConsensusEvent) => events.push(e);
 
-    await runConsensus("Test topic", [makeParticipant("p-1"), makeParticipant("p-2")], 2, emit);
+    await runConsensus(
+      "Test topic",
+      [makeParticipant("p-1"), makeParticipant("p-2", "test:test-model", 1)],
+      opts({ rounds: 2, randomizeOrder: false, blindFirstRound: false, earlyStop: false }),
+      emit,
+    );
 
     const types = events.map((e) => e.type);
-    // Should have round-start for each round
     expect(types.filter((t) => t === "round-start")).toHaveLength(2);
-
-    // Should have participant-start and participant-end for each participant per round
-    expect(types.filter((t) => t === "participant-start")).toHaveLength(4); // 2 participants * 2 rounds
+    expect(types.filter((t) => t === "participant-start")).toHaveLength(4);
     expect(types.filter((t) => t === "participant-end")).toHaveLength(4);
-
-    // Should have round-end for each round
     expect(types.filter((t) => t === "round-end")).toHaveLength(2);
-
-    // Should end with consensus-complete
     expect(types[types.length - 1]).toBe("consensus-complete");
   });
 
   it("emits tokens during streaming", async () => {
     const events: ConsensusEvent[] = [];
-    await runConsensus("Test", [makeParticipant("p-1")], 1, (e) => events.push(e));
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-1")],
+      opts({ rounds: 1, blindFirstRound: false }),
+      (e) => events.push(e),
+    );
 
     const tokens = events.filter((e) => e.type === "token");
     expect(tokens.length).toBeGreaterThan(0);
@@ -84,19 +118,29 @@
   it("extracts confidence from response", async () => {
     const events: ConsensusEvent[] = [];
-    await runConsensus("Test", [makeParticipant("p-1")], 1, (e) => events.push(e));
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-1")],
+      opts({ rounds: 1, blindFirstRound: false }),
+      (e) => events.push(e),
+    );
 
     const end = events.find((e) => e.type === "participant-end");
     expect(end).toBeDefined();
     if (end?.type === "participant-end") {
       expect(end.confidence).toBe(78);
+      expect(end.usage).toBeDefined();
+      expect(end.usage!.totalTokens).toBe(150);
     }
   });
 
   it("calculates consensus score in round-end events", async () => {
     const events: ConsensusEvent[] = [];
-    await runConsensus("Test", [makeParticipant("p-1"), makeParticipant("p-2")], 1, (e) =>
-      events.push(e),
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-1"), makeParticipant("p-2", "test:test-model", 1)],
+      opts({ rounds: 1, blindFirstRound: false }),
+      (e) => events.push(e),
     );
 
     const roundEnd = events.find((e) => e.type === "round-end");
@@ -109,19 +153,30 @@
   it("emits final consensus score and summary", async () => {
     const events: ConsensusEvent[] = [];
-    await runConsensus("Test", [makeParticipant("p-1")], 1, (e) => events.push(e));
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-1")],
+      opts({ rounds: 1, blindFirstRound: false }),
+      (e) => events.push(e),
+    );
 
     const complete = events.find((e) => e.type === "consensus-complete");
     expect(complete).toBeDefined();
     if (complete?.type === "consensus-complete") {
       expect(complete.finalScore).toBeGreaterThanOrEqual(0);
-      expect(complete.summary).toContain("Consensus reached");
+      expect(complete.summary).toContain("CVP");
+      expect(complete.roundsCompleted).toBe(1);
     }
   });
 
   it("assigns correct round types per CVP spec", async () => {
     const events: ConsensusEvent[] = [];
-    await runConsensus("Test", [makeParticipant("p-1")], 5, (e) => events.push(e));
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-1")],
+      opts({ rounds: 5, blindFirstRound: false, randomizeOrder: false, earlyStop: false }),
+      (e) => events.push(e),
+    );
 
     const roundStarts = events.filter((e) => e.type === "round-start") as Array<
       Extract<ConsensusEvent, { type: "round-start" }>
     >;
@@ -135,18 +190,484 @@ describe("consensus-engine", () => {
     expect(roundStarts[4].label).toContain("Final Synthesis");
   });
 
-  it("handles model not found error gracefully", async () => {
+  it("handles thrown provider errors gracefully via catch", async () => {
     const { streamText } = await import("ai");
-    (streamText as ReturnType<typeof vi.fn>).mockRejectedValueOnce(
-      new Error("Model not found: missing:model"),
+    (streamText as ReturnType<typeof vi.fn>).mockImplementationOnce(() => ({
+      textStream: (async function* () {
+        throw new Error("Upstream fetch failed");
+      })(),
+      usage: Promise.resolve({ inputTokens: 0, outputTokens: 0 }),
+    }));
+
+    const events: ConsensusEvent[] = [];
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-err")],
+      opts({ rounds: 1, blindFirstRound: false }),
+      (e) => events.push(e),
+    );
+
+    const end = events.find((e) => e.type === "participant-end");
+    expect(end).toBeDefined();
+    if (end?.type === "participant-end") {
+      expect(end.error).toContain("Upstream fetch failed");
+      expect(end.fullContent).toContain("Error from");
+      expect(end.confidence).toBe(0);
+    }
+  });
+
+  it("captures onError-reported provider failures without throwing from textStream", async () => {
+    const { streamText } = await import("ai");
+    // Simulate the Vercel AI SDK v6 pattern: textStream ends cleanly
+    // but onError was called with an AI_APICallError-shaped object.
+    (streamText as ReturnType<typeof vi.fn>).mockImplementationOnce(
+      (options: { onError?: (e: { error: unknown }) => void }) => {
+        const apiError = Object.assign(new Error("Not Found"), {
+          name: "AI_APICallError",
+          statusCode: 404,
+          url: "https://api.anthropic.com/v1/responses",
+        });
+        options.onError?.({ error: apiError });
+        return {
+          textStream: (async function* () {
+            // Silent stream end — simulates v6 behavior where errors
+            // are surfaced via onError rather than as iterator throws.
+          })(),
+          usage: Promise.resolve({ inputTokens: 0, outputTokens: 0 }),
+        };
+      },
     );
 
     const events: ConsensusEvent[] = [];
-    // This will trigger the error path in streamParticipant
-    await runConsensus("Test", [makeParticipant("p-err")], 1, (e) => events.push(e));
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-err")],
+      opts({ rounds: 1, blindFirstRound: false }),
+      (e) => events.push(e),
+    );
 
-    // Should still complete (error is caught per-participant)
-    const tokens = events.filter((e) => e.type === "token");
-    expect(tokens.length).toBeGreaterThanOrEqual(0);
+    const end = events.find((e) => e.type === "participant-end");
+    expect(end).toBeDefined();
+    if (end?.type === "participant-end") {
+      expect(end.error).toContain("Not Found");
+      expect(end.error).toContain("HTTP 404");
+      expect(end.fullContent).toContain("Error from");
+      expect(end.confidence).toBe(0);
+    }
+  });
+
+  it("emits a synthetic error token when the model cannot be resolved", async () => {
+    const events: ConsensusEvent[] = [];
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-ghost", "missing:model")],
+      opts({ rounds: 1, blindFirstRound: false }),
+      (e) => events.push(e),
+    );
+
+    const end = events.find((e) => e.type === "participant-end");
+    expect(end).toBeDefined();
+    if (end?.type === "participant-end") {
+      expect(end.error).toContain("Model not available");
+      expect(end.fullContent).toContain("Error from");
+    }
+  });
+
+  it("excludes errored responses from the consensus score", async () => {
+    const { streamText } = await import("ai");
+    (streamText as ReturnType<typeof vi.fn>)
+      .mockImplementationOnce((options: { onError?: (e: { error: unknown }) => void }) => {
+        options.onError?.({
+          error: Object.assign(new Error("Not Found"), { statusCode: 404 }),
+        });
+        return {
+          textStream: (async function* () {})(),
+          usage: Promise.resolve({ inputTokens: 0, outputTokens: 0 }),
+        };
+      })
+      .mockImplementationOnce(() => ({
+        textStream: (async function* () {
+          yield "Good answer. \nCONFIDENCE: 90";
+        })(),
+        usage: Promise.resolve({ inputTokens: 10, outputTokens: 5 }),
+      }));
+
+    const events: ConsensusEvent[] = [];
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-bad"), makeParticipant("p-good", "test:test-model", 1)],
+      opts({ rounds: 1, blindFirstRound: false, randomizeOrder: false }),
+      (e) => events.push(e),
+    );
+
+    const roundEnd = events.find((e) => e.type === "round-end");
+    if (roundEnd?.type === "round-end") {
+      // Only the good response (confidence 90) contributes;
+      // score = 90 - 0.5 * 0 = 90
+      expect(roundEnd.consensusScore).toBe(90);
+    }
+  });
+
+  it("detectDisagreements skips pairs where either side errored", () => {
+    const participants = [
+      {
+        id: "a",
+        modelInfo: {
+          id: "test:test-model",
+          providerId: "test",
+          providerName: "Test",
+          modelId: "test-model",
+        },
+        persona: PERSONAS[0],
+      },
+      {
+        id: "b",
+        modelInfo: {
+          id: "test:test-model",
+          providerId: "test",
+          providerName: "Test",
+          modelId: "test-model",
+        },
+        persona: PERSONAS[1],
+      },
+    ];
+    const out = detectDisagreements(
+      1,
+      [
+        {
+          participantId: "a",
+          roundNumber: 1,
+          content: "",
+          confidence: 0,
+          timestamp: 0,
+          error: "HTTP 404",
+        },
+        { participantId: "b", roundNumber: 1, content: "", confidence: 90, timestamp: 0 },
+      ],
+      participants,
+    );
+    expect(out).toHaveLength(0);
+  });
+
+  // ── New feature tests ────────────────────────────────────
+
+  it("runs Round 1 in parallel when blindFirstRound is enabled", async () => {
+    const { streamText } = await import("ai");
+    (streamText as ReturnType<typeof vi.fn>).mockClear();
+
+    const events: ConsensusEvent[] = [];
+    await runConsensus(
+      "Test",
+      [
+        makeParticipant("p-1"),
+        makeParticipant("p-2", "test:test-model", 1),
+        makeParticipant("p-3", "test:test-model", 2),
+      ],
+      opts({ rounds: 1, blindFirstRound: true, randomizeOrder: false }),
+      (e) => events.push(e),
+    );
+
+    // With blindFirstRound, no participant sees prior responses in round 1.
+    // Check that the system prompts passed to streamText contain no
+    // "--- PREVIOUS ROUND RESPONSES ---" marker.
+    const calls = (streamText as ReturnType<typeof vi.fn>).mock.calls;
+    expect(calls.length).toBe(3);
+    for (const call of calls) {
+      const systemPrompt = call[0].system as string;
+      expect(systemPrompt).not.toContain("PREVIOUS ROUND RESPONSES");
+    }
+  });
+
+  it("stops early when consensus converges", async () => {
+    // Force identical confidences so delta = 0 between rounds 1 and 2
+    confidenceSequence = [80, 80, 80, 80, 80, 80];
+
+    const events: ConsensusEvent[] = [];
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-1"), makeParticipant("p-2", "test:test-model", 1)],
+      opts({
+        rounds: 5,
+        blindFirstRound: false,
+        randomizeOrder: false,
+        earlyStop: true,
+      }),
+      (e) => events.push(e),
+    );
+
+    const earlyStop = events.find((e) => e.type === "early-stop");
+    expect(earlyStop).toBeDefined();
+
+    const complete = events.find((e) => e.type === "consensus-complete");
+    if (complete?.type === "consensus-complete") {
+      expect(complete.roundsCompleted).toBeLessThan(5);
+    }
+  });
+
+  it("emits disagreements when confidence diverges by 20+ points", async () => {
+    // Alternating 90 and 60 → delta = 30 → disagreement
+    confidenceSequence = [90, 60];
+
+    const events: ConsensusEvent[] = [];
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-1"), makeParticipant("p-2", "test:test-model", 1)],
+      opts({ rounds: 1, blindFirstRound: false, randomizeOrder: false }),
+      (e) => events.push(e),
+    );
+
+    const disagreement = events.find((e) => e.type === "disagreements");
+    expect(disagreement).toBeDefined();
+    if (disagreement?.type === "disagreements") {
+      expect(disagreement.disagreements.length).toBeGreaterThan(0);
+      expect(disagreement.disagreements[0].severity).toBe(30);
+    }
+  });
+
+  it("blind-jury engine runs a single parallel round with no cross-visibility", async () => {
+    const { streamText } = await import("ai");
+    (streamText as ReturnType<typeof vi.fn>).mockClear();
+
+    const events: ConsensusEvent[] = [];
+    await runConsensus(
+      "Test",
+      [
+        makeParticipant("p-1"),
+        makeParticipant("p-2", "test:test-model", 1),
+        makeParticipant("p-3", "test:test-model", 2),
+      ],
+      opts({ engine: "blind-jury" }),
+      (e) => events.push(e),
+    );
+
+    const roundStarts = events.filter((e) => e.type === "round-start");
+    expect(roundStarts).toHaveLength(1);
+    expect((roundStarts[0] as { label: string }).label).toContain("Blind Jury");
+
+    const calls = (streamText as ReturnType<typeof vi.fn>).mock.calls;
+    expect(calls.length).toBe(3);
+    for (const call of calls) {
+      const systemPrompt = call[0].system as string;
+      expect(systemPrompt).toContain("BLIND JURY");
+    }
+
+    const complete = events.find((e) => e.type === "consensus-complete");
+    if (complete?.type === "consensus-complete") {
+      expect(complete.roundsCompleted).toBe(1);
+    }
+  });
+
+  it("runs the judge synthesizer when judgeEnabled and judgeModelId are set", async () => {
+    const events: ConsensusEvent[] = [];
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-1"), makeParticipant("p-2", "test:test-model", 1)],
+      opts({
+        rounds: 1,
+        blindFirstRound: false,
+        randomizeOrder: false,
+        judgeEnabled: true,
+        judgeModelId: "test:test-model",
+      }),
+      (e) => events.push(e),
+    );
+
+    expect(events.find((e) => e.type === "judge-start")).toBeDefined();
+    expect(events.find((e) => e.type === "judge-end")).toBeDefined();
+    const tokens = events.filter((e) => e.type === "judge-token");
+    expect(tokens.length).toBeGreaterThan(0);
+  });
+
+  it("judge error path still emits judge-end with error content", async () => {
+    const { streamText } = await import("ai");
+    // Participant call succeeds, judge call throws
+    let calls = 0;
+    (streamText as ReturnType<typeof vi.fn>).mockImplementation(() => {
+      calls++;
+      if (calls === 1) {
+        return {
+          textStream: (async function* () {
+            yield "x";
+            yield "\nCONFIDENCE: 60";
+          })(),
+          usage: Promise.resolve({ inputTokens: 5, outputTokens: 5 }),
+        };
+      }
+      throw new Error("judge upstream failure");
+    });
+
+    const events: ConsensusEvent[] = [];
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-1")],
+      opts({
+        rounds: 1,
+        blindFirstRound: false,
+        randomizeOrder: false,
+        judgeEnabled: true,
+        judgeModelId: "test:test-model",
+      }),
+      (e) => events.push(e),
+    );
+    const end = events.find((e) => e.type === "judge-end");
+    expect(end).toBeDefined();
+    if (end?.type === "judge-end") {
+      expect(end.result.content).toContain("Judge error");
+    }
+  });
+
+  it("falls back to heuristic usage when the SDK reports no usage", async () => {
+    const { streamText } = await import("ai");
+    (streamText as ReturnType<typeof vi.fn>).mockImplementationOnce(() => ({
+      textStream: (async function* () {
+        yield "Fallback content. ";
+        yield "\nCONFIDENCE: 65";
+      })(),
+      // No `usage` field at all
+    }));
+
+    const events: ConsensusEvent[] = [];
+    await runConsensus(
+      "Test",
+      [makeParticipant("p-1")],
+      opts({ rounds: 1, blindFirstRound: false }),
+      (e) => events.push(e),
+    );
+    const end = events.find((e) => e.type === "participant-end");
+    if (end?.type === "participant-end") {
+      expect(end.usage).toBeDefined();
+      expect(end.usage!.totalTokens).toBeGreaterThan(0);
+    }
+  });
+
+  it("propagates an abort signal to stop the run mid-stream", async () => {
+    const ac = new AbortController();
+    ac.abort();
+    const events: ConsensusEvent[] = [];
+    await expect(
+      runConsensus(
+        "Test",
+        [makeParticipant("p-1")],
+        opts({ rounds: 2, blindFirstRound: false }),
+        (e) => events.push(e),
+        ac.signal,
+      ),
+    ).rejects.toBeInstanceOf(DOMException);
+  });
+
+  // ── Internal helper tests ────────────────────────────────
+
+  it("shuffle returns a permutation without mutating input", () => {
+    const input = [1, 2, 3, 4, 5];
+    const out = shuffle(input, () => 0.5);
+    expect(out).toHaveLength(5);
+    expect(out).not.toBe(input);
+    expect([...out].sort()).toEqual([1, 2, 3, 4, 5]);
+  });
+
+  it("calculateConsensusScore penalises high variance", () => {
+    const score = __testing.calculateConsensusScore([
+      { participantId: "a", roundNumber: 1, content: "", confidence: 100, timestamp: 0 },
+      { participantId: "b", roundNumber: 1, content: "", confidence: 0, timestamp: 0 },
+    ]);
+    expect(score).toBeLessThan(50);
+  });
+
+  it("calculateConsensusScore is 0 for empty", () => {
+    expect(__testing.calculateConsensusScore([])).toBe(0);
+  });
+
+  it("extractConfidence falls back to 50 when missing", () => {
+    expect(__testing.extractConfidence("no marker here")).toBe(50);
+    expect(__testing.extractConfidence("CONFIDENCE: 92")).toBe(92);
+    expect(__testing.extractConfidence("CONFIDENCE: 150")).toBe(100);
+  });
+
+  it("extractJudgeSection picks out markdown sections", () => {
+    const md = `## Majority Position\nA
wins.\n\n## Minority Positions\nB disagrees.\n\n## Unresolved Disputes\nNone.`; + expect(__testing.extractJudgeSection(md, "Majority Position")).toBe("A wins."); + expect(__testing.extractJudgeSection(md, "Minority Positions")).toBe("B disagrees."); + expect(__testing.extractJudgeSection(md, "Nothing Here")).toBe(""); + }); + + it("getRoundMeta labels the final synthesis round", () => { + expect(__testing.getRoundMeta(1, 5).type).toBe("initial-analysis"); + expect(__testing.getRoundMeta(5, 5).label).toContain("Final Synthesis"); + }); + + it("buildJudgeContext stringifies final responses", () => { + const ctx = __testing.buildJudgeContext( + [ + { + participantId: "p-1", + roundNumber: 1, + content: "Result A\nCONFIDENCE: 80", + confidence: 80, + timestamp: 0, + }, + ], + [ + { + id: "p-1", + modelInfo: { + id: "test:test-model", + providerId: "test", + providerName: "Test", + modelId: "test-model", + }, + persona: PERSONAS[0], + }, + ], + ); + expect(ctx).toContain("Result A"); + expect(ctx).toContain(PERSONAS[0].name); + }); + + it("detectDisagreements ignores pairs under the 20-point threshold", () => { + const out = detectDisagreements( + 1, + [ + { participantId: "a", roundNumber: 1, content: "", confidence: 70, timestamp: 0 }, + { participantId: "b", roundNumber: 1, content: "", confidence: 80, timestamp: 0 }, + ], + [], + ); + expect(out).toHaveLength(0); + }); + + it("detectDisagreements reports pairs above threshold", () => { + const participants = [ + { + id: "a", + modelInfo: { + id: "test:test-model", + providerId: "test", + providerName: "Test", + modelId: "test-model", + }, + persona: PERSONAS[0], + }, + { + id: "b", + modelInfo: { + id: "test:test-model", + providerId: "test", + providerName: "Test", + modelId: "test-model", + }, + persona: PERSONAS[1], + }, + ]; + const out = detectDisagreements( + 2, + [ + { participantId: "a", roundNumber: 2, content: "", confidence: 90, timestamp: 0 }, + { participantId: "b", roundNumber: 2, content: "", 
confidence: 50, timestamp: 0 }, + ], + participants, + ); + expect(out).toHaveLength(1); + expect(out[0].severity).toBe(40); + expect(out[0].label).toContain("vs"); }); }); diff --git a/tests/new-components.test.tsx b/tests/new-components.test.tsx new file mode 100644 index 0000000..bce378e --- /dev/null +++ b/tests/new-components.test.tsx @@ -0,0 +1,463 @@ +import { describe, it, expect, vi, beforeEach } from "vitest"; +import { render, screen, fireEvent, waitFor } from "@testing-library/react"; +import { useArenaStore, DEFAULT_OPTIONS } from "@/lib/store"; +import { PERSONAS } from "@/lib/personas"; +import type { ModelInfo } from "@/lib/types"; + +vi.mock("sonner", () => ({ + toast: { error: vi.fn(), info: vi.fn(), success: vi.fn() }, +})); +vi.mock("react-markdown", () => ({ + default: ({ children }: { children: string }) =>
{children}
, +})); +vi.mock("remark-gfm", () => ({ default: () => {} })); + +const openaiModel: ModelInfo = { + id: "openai:gpt-4o", + providerId: "openai", + providerName: "OpenAI", + modelId: "gpt-4o", +}; +const grokModel: ModelInfo = { + id: "grok:grok-3", + providerId: "grok", + providerName: "Grok", + modelId: "grok-3", +}; + +function resetStore() { + useArenaStore.getState().reset(); + useArenaStore.setState({ + availableModels: [openaiModel, grokModel], + modelsLoading: false, + participants: [], + prompt: "", + options: { ...DEFAULT_OPTIONS }, + }); +} + +describe("ConfidenceTrajectory", () => { + beforeEach(resetStore); + + it("renders nothing when no rounds have data", async () => { + const { default: Comp } = await import("@/components/ConfidenceTrajectory"); + const { container } = render(); + expect(container.innerHTML).toBe(""); + }); + + it("draws one polyline per participant with data", async () => { + useArenaStore.setState({ + participants: [ + { id: "p-1", modelInfo: openaiModel, persona: PERSONAS[0] }, + { id: "p-2", modelInfo: grokModel, persona: PERSONAS[1] }, + ], + rounds: [ + { + number: 1, + type: "initial-analysis", + label: "R1", + consensusScore: 70, + responses: [ + { + participantId: "p-1", + roundNumber: 1, + content: "", + confidence: 80, + timestamp: 0, + }, + { + participantId: "p-2", + roundNumber: 1, + content: "", + confidence: 60, + timestamp: 0, + }, + ], + }, + { + number: 2, + type: "counterarguments", + label: "R2", + consensusScore: 75, + responses: [ + { + participantId: "p-1", + roundNumber: 2, + content: "", + confidence: 85, + timestamp: 0, + }, + { + participantId: "p-2", + roundNumber: 2, + content: "", + confidence: 70, + timestamp: 0, + }, + ], + }, + ], + }); + + const { default: Comp } = await import("@/components/ConfidenceTrajectory"); + const { container } = render(); + const paths = container.querySelectorAll("path"); + expect(paths.length).toBe(2); + expect(screen.getByText("Confidence 
Trajectory")).toBeInTheDocument(); + }); + + it("centers a single data point when only one round exists", async () => { + useArenaStore.setState({ + participants: [{ id: "p-1", modelInfo: openaiModel, persona: PERSONAS[0] }], + rounds: [ + { + number: 1, + type: "initial-analysis", + label: "R1", + consensusScore: 80, + responses: [ + { + participantId: "p-1", + roundNumber: 1, + content: "", + confidence: 80, + timestamp: 0, + }, + ], + }, + ], + }); + const { default: Comp } = await import("@/components/ConfidenceTrajectory"); + render(); + // With one point the value badge should still render + expect(screen.getByText("80%")).toBeInTheDocument(); + }); +}); + +describe("DisagreementPanel", () => { + beforeEach(resetStore); + + it("renders nothing when empty", async () => { + const { default: Comp } = await import("@/components/DisagreementPanel"); + const { container } = render(); + expect(container.innerHTML).toBe(""); + }); + + it("groups items by round and scrolls on click", async () => { + const { default: Comp } = await import("@/components/DisagreementPanel"); + + useArenaStore.setState({ + participants: [ + { id: "p-1", modelInfo: openaiModel, persona: PERSONAS[0] }, + { id: "p-2", modelInfo: grokModel, persona: PERSONAS[1] }, + ], + disagreements: [ + { + id: "r1-a-b", + round: 1, + participantAId: "p-1", + participantBId: "p-2", + severity: 25, + label: "A vs B", + }, + { + id: "r2-a-b", + round: 2, + participantAId: "p-1", + participantBId: "p-2", + severity: 35, + label: "A vs B round 2", + }, + ], + }); + + // Create target element for scroll + const target = document.createElement("div"); + target.id = "round-1"; + target.scrollIntoView = vi.fn(); + document.body.appendChild(target); + + render(); + + expect(screen.getByText("Disagreement Ledger")).toBeInTheDocument(); + expect(screen.getByText("A vs B")).toBeInTheDocument(); + expect(screen.getByText("A vs B round 2")).toBeInTheDocument(); + + fireEvent.click(screen.getByText("A vs B")); + 
expect(target.scrollIntoView).toHaveBeenCalled(); + + document.body.removeChild(target); + }); +}); + +describe("CostMeter", () => { + beforeEach(resetStore); + + it("renders nothing when no tokens yet and not running", async () => { + const { default: Comp } = await import("@/components/CostMeter"); + const { container } = render(); + expect(container.innerHTML).toBe(""); + }); + + it("shows formatted cost when tokens are present", async () => { + useArenaStore.setState({ + tokenTotal: { + inputTokens: 800, + outputTokens: 1200, + totalTokens: 2000, + estimatedCostUSD: 0.0123, + }, + }); + const { default: Comp } = await import("@/components/CostMeter"); + render(); + expect(screen.getByText("$0.01")).toBeInTheDocument(); + expect(screen.getByText(/2.0K tokens/)).toBeInTheDocument(); + }); + + it("shows 4-decimal precision for sub-cent totals", async () => { + useArenaStore.setState({ + tokenTotal: { + inputTokens: 1, + outputTokens: 1, + totalTokens: 2, + estimatedCostUSD: 0.00005, + }, + }); + const { default: Comp } = await import("@/components/CostMeter"); + render(); + expect(screen.getByText(/\$0\.0001/)).toBeInTheDocument(); + }); + + it("shows 2-decimal precision for >=1 cent totals", async () => { + useArenaStore.setState({ + tokenTotal: { + inputTokens: 1_200_000, + outputTokens: 800_000, + totalTokens: 2_000_000, + estimatedCostUSD: 1.23, + }, + }); + const { default: Comp } = await import("@/components/CostMeter"); + render(); + expect(screen.getByText("$1.23")).toBeInTheDocument(); + expect(screen.getByText(/2.00M tokens/)).toBeInTheDocument(); + }); + + it("shows running state even with zero tokens", async () => { + useArenaStore.setState({ isRunning: true }); + const { default: Comp } = await import("@/components/CostMeter"); + render(); + expect(screen.getByText(/Cost/)).toBeInTheDocument(); + }); +}); + +describe("JudgeCard", () => { + beforeEach(resetStore); + + it("renders nothing when no judge data", async () => { + const { default: Comp } = 
await import("@/components/JudgeCard"); + const { container } = render(); + expect(container.innerHTML).toBe(""); + }); + + it("renders streaming content while running", async () => { + useArenaStore.setState({ + judgeRunning: true, + judgeStream: "Streaming thoughts...", + judge: { + modelId: "gpt-4o", + providerName: "OpenAI", + content: "", + majorityPosition: "", + minorityPositions: "", + unresolvedDisputes: "", + }, + }); + const { default: Comp } = await import("@/components/JudgeCard"); + render(); + expect(screen.getByText(/Streaming thoughts/)).toBeInTheDocument(); + expect(screen.getByText("Consensus Judge")).toBeInTheDocument(); + }); + + it("renders final judge content and strips JUDGE_CONFIDENCE line", async () => { + useArenaStore.setState({ + judgeRunning: false, + judge: { + modelId: "gpt-4o", + providerName: "OpenAI", + content: "## Majority Position\nAll good.\nJUDGE_CONFIDENCE: 82", + majorityPosition: "All good.", + minorityPositions: "", + unresolvedDisputes: "", + }, + }); + const { default: Comp } = await import("@/components/JudgeCard"); + render(); + const md = screen.getByTestId("md"); + expect(md.textContent).not.toContain("JUDGE_CONFIDENCE"); + expect(md.textContent).toContain("All good"); + }); +}); + +describe("PromptLibrary", () => { + beforeEach(resetStore); + + it("renders preset chips when prompt is empty", async () => { + const { default: Comp } = await import("@/components/PromptLibrary"); + render(); + expect(screen.getByText("Try a preset")).toBeInTheDocument(); + expect(screen.getAllByText(/Engineering|Strategy|Science|Ethics/).length).toBeGreaterThan(0); + }); + + it("setting a preset fills the prompt in the store", async () => { + const { default: Comp } = await import("@/components/PromptLibrary"); + render(); + const first = screen.getAllByRole("button")[0]; + fireEvent.click(first); + expect(useArenaStore.getState().prompt.length).toBeGreaterThan(10); + }); + + it("renders nothing when a prompt is already set", async () 
=> { + useArenaStore.getState().setPrompt("hello"); + const { default: Comp } = await import("@/components/PromptLibrary"); + const { container } = render(); + expect(container.innerHTML).toBe(""); + }); + + it("renders nothing while running", async () => { + useArenaStore.setState({ isRunning: true }); + const { default: Comp } = await import("@/components/PromptLibrary"); + const { container } = render(); + expect(container.innerHTML).toBe(""); + }); +}); + +describe("ConfigPanel", () => { + beforeEach(resetStore); + + it("renders engine picker and toggles", async () => { + const { default: Comp } = await import("@/components/ConfigPanel"); + render(); + expect(screen.getByText("CVP")).toBeInTheDocument(); + expect(screen.getByText("Blind Jury")).toBeInTheDocument(); + expect(screen.getByText("Randomize order")).toBeInTheDocument(); + expect(screen.getByText("Blind Round 1")).toBeInTheDocument(); + expect(screen.getByText("Early stop")).toBeInTheDocument(); + }); + + it("switches to Blind Jury and hides the CVP toggles", async () => { + const { default: Comp } = await import("@/components/ConfigPanel"); + render(); + fireEvent.click(screen.getByText("Blind Jury")); + expect(useArenaStore.getState().options.engine).toBe("blind-jury"); + // CVP toggles should no longer be on screen + expect(screen.queryByText("Randomize order")).not.toBeInTheDocument(); + }); + + it("toggling randomize order flips the option", async () => { + const { default: Comp } = await import("@/components/ConfigPanel"); + render(); + const before = useArenaStore.getState().options.randomizeOrder; + fireEvent.click(screen.getByText("Randomize order")); + expect(useArenaStore.getState().options.randomizeOrder).toBe(!before); + }); + + it("enabling judge without a model auto-selects the first available", async () => { + const { default: Comp } = await import("@/components/ConfigPanel"); + render(); + fireEvent.click(screen.getByText("Judge synthesis")); + 
expect(useArenaStore.getState().options.judgeEnabled).toBe(true); + expect(useArenaStore.getState().options.judgeModelId).toBe(openaiModel.id); + }); + + it("judge model dropdown lets you swap models", async () => { + useArenaStore.getState().setOption("judgeEnabled", true); + useArenaStore.getState().setOption("judgeModelId", openaiModel.id); + const { default: Comp } = await import("@/components/ConfigPanel"); + render(); + // Open dropdown via the judge model button + const btn = screen.getByText(openaiModel.modelId).closest("button"); + if (btn) fireEvent.click(btn); + // Click the other model + fireEvent.click(screen.getByText(grokModel.modelId)); + expect(useArenaStore.getState().options.judgeModelId).toBe(grokModel.id); + }); + + it("closes the judge dropdown on outside click", async () => { + useArenaStore.getState().setOption("judgeEnabled", true); + const { default: Comp } = await import("@/components/ConfigPanel"); + render(); + const btn = screen.getByText(/Select judge model|gpt-4o|grok-3/).closest("button"); + if (btn) fireEvent.click(btn); + fireEvent.mouseDown(document.body); + // After outside click, no OpenAI list item should still be open (it was inside the dropdown) + // This at least hits the outside-click branch. 
+ expect(btn).toBeInTheDocument(); + }); +}); + +describe("SessionMenu", () => { + beforeEach(() => { + resetStore(); + Object.assign(navigator, { + clipboard: { writeText: vi.fn().mockResolvedValue(undefined) }, + }); + URL.createObjectURL = vi.fn(() => "blob:test"); + URL.revokeObjectURL = vi.fn(); + }); + + it("renders nothing when no final score is set", async () => { + const { default: Comp } = await import("@/components/SessionMenu"); + const { container } = render(); + expect(container.innerHTML).toBe(""); + }); + + it("opens the menu and offers markdown/json/permalink actions", async () => { + useArenaStore.setState({ finalScore: 80, prompt: "hi" }); + const { default: Comp } = await import("@/components/SessionMenu"); + render(); + fireEvent.click(screen.getByText("Export")); + expect(screen.getByText("Download Markdown")).toBeInTheDocument(); + expect(screen.getByText("Download JSON")).toBeInTheDocument(); + expect(screen.getByText("Copy permalink")).toBeInTheDocument(); + }); + + it("downloads markdown when clicked", async () => { + useArenaStore.setState({ finalScore: 80, prompt: "hi" }); + const clickMock = vi.fn(); + const originalCreate = document.createElement.bind(document); + vi.spyOn(document, "createElement").mockImplementation((tag: string) => { + const el = originalCreate(tag) as HTMLAnchorElement; + if (tag === "a") el.click = clickMock; + return el; + }); + const { default: Comp } = await import("@/components/SessionMenu"); + render(); + fireEvent.click(screen.getByText("Export")); + fireEvent.click(screen.getByText("Download Markdown")); + expect(clickMock).toHaveBeenCalled(); + }); + + it("copies a permalink to the clipboard", async () => { + useArenaStore.setState({ finalScore: 80, prompt: "hi" }); + const { default: Comp } = await import("@/components/SessionMenu"); + render(); + fireEvent.click(screen.getByText("Export")); + fireEvent.click(screen.getByText("Copy permalink")); + await waitFor(() => { + 
expect(navigator.clipboard.writeText).toHaveBeenCalled(); + }); + const url = (navigator.clipboard.writeText as ReturnType).mock.calls[0][0]; + expect(url).toContain("#rt="); + }); + + it("closes the menu on outside click", async () => { + useArenaStore.setState({ finalScore: 80 }); + const { default: Comp } = await import("@/components/SessionMenu"); + render(); + fireEvent.click(screen.getByText("Export")); + expect(screen.getByText("Download Markdown")).toBeInTheDocument(); + fireEvent.mouseDown(document.body); + expect(screen.queryByText("Download Markdown")).not.toBeInTheDocument(); + }); +}); diff --git a/tests/page-consensus.test.tsx b/tests/page-consensus.test.tsx index a5ac019..881852b 100644 --- a/tests/page-consensus.test.tsx +++ b/tests/page-consensus.test.tsx @@ -1,6 +1,6 @@ import { describe, it, expect, vi, beforeEach } from "vitest"; import { render, screen, fireEvent, waitFor, act } from "@testing-library/react"; -import { useArenaStore } from "@/lib/store"; +import { useArenaStore, DEFAULT_OPTIONS } from "@/lib/store"; import { PERSONAS } from "@/lib/personas"; import type { ModelInfo } from "@/lib/types"; @@ -25,7 +25,7 @@ describe("HomePage — consensus execution", () => { availableModels: [model], modelsLoading: false, participants: [], - roundCount: 2, + options: { ...DEFAULT_OPTIONS, rounds: 2 }, prompt: "", }); @@ -233,4 +233,203 @@ describe("HomePage — consensus execution", () => { // but the coverage for the early return is hit by the SSE test above expect(useArenaStore.getState().rounds).toEqual([]); }); + + it("drives every new SSE event through processEvent", async () => { + const { default: HomePage } = await import("@/app/page"); + + useArenaStore.setState({ + participants: [ + { id: "p-1", modelInfo: model, persona: PERSONAS[0] }, + { id: "p-2", modelInfo: model, persona: PERSONAS[1] }, + ], + prompt: "drive all events", + }); + + const sse = [ + 'data: 
{"type":"round-start","round":1,"roundType":"initial-analysis","label":"Initial"}', + 'data: {"type":"participant-start","participantId":"p-1","round":1}', + 'data: {"type":"token","participantId":"p-1","round":1,"token":"Hello"}', + 'data: {"type":"participant-end","participantId":"p-1","round":1,"confidence":80,"fullContent":"Hello\\nCONFIDENCE: 80","durationMs":50,"usage":{"inputTokens":50,"outputTokens":20,"totalTokens":70,"estimatedCostUSD":0.001}}', + 'data: {"type":"round-end","round":1,"consensusScore":72}', + 'data: {"type":"disagreements","round":1,"disagreements":[{"id":"r1-p-1-p-2","round":1,"participantAId":"p-1","participantBId":"p-2","severity":25,"label":"A vs B"}]}', + 'data: {"type":"early-stop","round":1,"delta":2,"reason":"converged"}', + 'data: {"type":"judge-start","modelId":"gpt-4o","providerName":"OpenAI"}', + 'data: {"type":"judge-token","token":"synth"}', + 'data: {"type":"judge-end","result":{"modelId":"gpt-4o","providerName":"OpenAI","content":"## Majority Position\\nA.\\nJUDGE_CONFIDENCE: 88","majorityPosition":"A.","minorityPositions":"","unresolvedDisputes":""}}', + 'data: {"type":"consensus-complete","finalScore":72,"summary":"Done","roundsCompleted":1}', + "", + ].join("\n\n"); + + const encoder = new TextEncoder(); + const stream = new ReadableStream({ + start(controller) { + controller.enqueue(encoder.encode(sse)); + controller.close(); + }, + }); + mockFetch.mockImplementation((url: string) => { + if (url === "/api/providers") { + return Promise.resolve({ ok: true, json: () => Promise.resolve({ models: [model] }) }); + } + return Promise.resolve({ ok: true, body: stream }); + }); + + render(); + await act(async () => { + fireEvent.click(screen.getByText("Run Consensus").closest("button")!); + }); + + await waitFor(() => { + const s = useArenaStore.getState(); + expect(s.finalScore).toBe(72); + expect(s.disagreements).toHaveLength(1); + expect(s.earlyStopped?.round).toBe(1); + expect(s.judge?.content).toContain("Majority"); + }); 
+ }); + + it("shows a participant-level error toast and stores the error on the response", async () => { + const { toast } = await import("sonner"); + const { default: HomePage } = await import("@/app/page"); + + useArenaStore.setState({ + participants: [ + { id: "p-1", modelInfo: model, persona: PERSONAS[0] }, + { id: "p-2", modelInfo: model, persona: PERSONAS[1] }, + ], + prompt: "participant error test", + }); + + const sse = [ + 'data: {"type":"round-start","round":1,"roundType":"initial-analysis","label":"Analysis"}', + 'data: {"type":"participant-start","participantId":"p-1","round":1}', + 'data: {"type":"token","participantId":"p-1","round":1,"token":"[Error from T / m: Not Found — HTTP 404]"}', + 'data: {"type":"participant-end","participantId":"p-1","round":1,"confidence":0,"fullContent":"[Error from T / m: Not Found — HTTP 404]","durationMs":20,"error":"Not Found — HTTP 404"}', + 'data: {"type":"round-end","round":1,"consensusScore":0}', + 'data: {"type":"consensus-complete","finalScore":0,"summary":"done","roundsCompleted":1}', + "", + ].join("\n\n"); + + const encoder = new TextEncoder(); + const stream = new ReadableStream({ + start(controller) { + controller.enqueue(encoder.encode(sse)); + controller.close(); + }, + }); + mockFetch.mockImplementation((url: string) => { + if (url === "/api/providers") { + return Promise.resolve({ ok: true, json: () => Promise.resolve({ models: [model] }) }); + } + return Promise.resolve({ ok: true, body: stream }); + }); + + render(); + await act(async () => { + fireEvent.click(screen.getByText("Run Consensus").closest("button")!); + }); + + await waitFor(() => { + expect(toast.error).toHaveBeenCalledWith(expect.stringContaining("Not Found")); + const s = useArenaStore.getState(); + const response = s.rounds[0]?.responses.find((r) => r.participantId === "p-1"); + expect(response?.error).toBe("Not Found — HTTP 404"); + }); + }); + + it("surfaces an `error` SSE event via toast and completes with 0", async () => { + const 
{ toast } = await import("sonner"); + const { default: HomePage } = await import("@/app/page"); + + useArenaStore.setState({ + participants: [ + { id: "p-1", modelInfo: model, persona: PERSONAS[0] }, + { id: "p-2", modelInfo: model, persona: PERSONAS[1] }, + ], + prompt: "error test", + }); + + const encoder = new TextEncoder(); + const stream = new ReadableStream({ + start(controller) { + controller.enqueue(encoder.encode('data: {"type":"error","message":"boom"}\n\n')); + controller.close(); + }, + }); + + mockFetch.mockImplementation((url: string) => { + if (url === "/api/providers") { + return Promise.resolve({ ok: true, json: () => Promise.resolve({ models: [model] }) }); + } + return Promise.resolve({ ok: true, body: stream }); + }); + + render(); + await act(async () => { + fireEvent.click(screen.getByText("Run Consensus").closest("button")!); + }); + + await waitFor(() => { + expect(toast.error).toHaveBeenCalledWith("boom"); + }); + }); + + it("hydrates from a #rt= shared hash on mount", async () => { + const { decodeSnapshotFromHash: _decode, encodeSnapshotToHash } = await import("@/lib/session"); + const snap = { + v: 1 as const, + prompt: "from hash", + engine: "cvp" as const, + options: { ...DEFAULT_OPTIONS, rounds: 2 }, + participants: [{ id: "p-1", modelInfo: model, persona: PERSONAS[0] }], + rounds: [ + { + number: 1, + type: "initial-analysis" as const, + label: "I", + consensusScore: 50, + responses: [], + }, + ], + finalScore: 50, + finalSummary: "hi", + judge: null, + disagreements: [], + tokenTotal: null, + createdAt: Date.now(), + }; + const hash = await encodeSnapshotToHash(snap); + window.history.replaceState(null, "", `/#${hash}`); + + const { default: HomePage } = await import("@/app/page"); + render(); + + await waitFor(() => { + const s = useArenaStore.getState(); + expect(s.prompt).toBe("from hash"); + expect(s.sharedView).toBe(true); + }); + window.history.replaceState(null, "", "/"); + }); + + it("blocks Run when judge is enabled but no 
judge model is selected", async () => { + const { toast } = await import("sonner"); + const { default: HomePage } = await import("@/app/page"); + + useArenaStore.setState({ + participants: [ + { id: "p-1", modelInfo: model, persona: PERSONAS[0] }, + { id: "p-2", modelInfo: model, persona: PERSONAS[1] }, + ], + prompt: "needs judge", + options: { ...DEFAULT_OPTIONS, judgeEnabled: true, judgeModelId: undefined }, + }); + + render(); + await act(async () => { + fireEvent.click(screen.getByText("Run Consensus").closest("button")!); + }); + + expect(toast.error).toHaveBeenCalledWith(expect.stringContaining("judge")); + }); }); diff --git a/tests/page.test.tsx b/tests/page.test.tsx index 007cee2..86465df 100644 --- a/tests/page.test.tsx +++ b/tests/page.test.tsx @@ -1,6 +1,6 @@ import { describe, it, expect, vi, beforeEach } from "vitest"; import { render, screen, fireEvent, waitFor } from "@testing-library/react"; -import { useArenaStore } from "@/lib/store"; +import { useArenaStore, DEFAULT_OPTIONS } from "@/lib/store"; // Mock sonner vi.mock("sonner", () => ({ @@ -25,7 +25,7 @@ describe("HomePage", () => { availableModels: [], modelsLoading: true, participants: [], - roundCount: 5, + options: { ...DEFAULT_OPTIONS }, prompt: "", }); @@ -105,11 +105,11 @@ describe("HomePage", () => { if (plusBtn) { fireEvent.click(plusBtn); - expect(useArenaStore.getState().roundCount).toBe(6); + expect(useArenaStore.getState().options.rounds).toBe(6); } if (minusBtn) { fireEvent.click(minusBtn); - expect(useArenaStore.getState().roundCount).toBe(5); + expect(useArenaStore.getState().options.rounds).toBe(5); } }); diff --git a/tests/pricing.test.ts b/tests/pricing.test.ts new file mode 100644 index 0000000..f2d3f1a --- /dev/null +++ b/tests/pricing.test.ts @@ -0,0 +1,64 @@ +import { describe, it, expect } from "vitest"; +import { + getModelPricing, + estimateCost, + addUsage, + estimateUsageFromText, + ZERO_USAGE, + PRICING_TABLE, + ZERO_PRICING, +} from "@/lib/pricing"; + 
+describe("pricing", () => {
+  it("returns ZERO_PRICING for unknown models", () => {
+    expect(getModelPricing("some-random-model-xyz")).toEqual(ZERO_PRICING);
+  });
+
+  it("matches a simple model by inclusion", () => {
+    expect(getModelPricing("gpt-4o").output).toBe(PRICING_TABLE["gpt-4o"].output);
+  });
+
+  it("prefers the longest matching prefix", () => {
+    // gpt-4o-mini should win over gpt-4o for `gpt-4o-mini-2024`
+    const price = getModelPricing("gpt-4o-mini-2024-07-18");
+    expect(price).toEqual(PRICING_TABLE["gpt-4o-mini"]);
+  });
+
+  it("resolves dated claude variants", () => {
+    expect(getModelPricing("claude-sonnet-4-20250514")).toEqual(PRICING_TABLE["claude-sonnet-4"]);
+  });
+
+  it("estimateCost scales with tokens", () => {
+    const cost = estimateCost("gpt-4o", 1_000_000, 1_000_000);
+    expect(cost).toBeCloseTo(PRICING_TABLE["gpt-4o"].input + PRICING_TABLE["gpt-4o"].output);
+  });
+
+  it("estimateCost returns 0 for unknown models", () => {
+    expect(estimateCost("zzz", 1000, 1000)).toBe(0);
+  });
+
+  it("addUsage is associative-ish and non-mutating", () => {
+    const a = { inputTokens: 1, outputTokens: 2, totalTokens: 3, estimatedCostUSD: 0.001 };
+    const b = { inputTokens: 4, outputTokens: 5, totalTokens: 9, estimatedCostUSD: 0.002 };
+    const sum = addUsage(a, b);
+    expect(sum).toEqual({
+      inputTokens: 5,
+      outputTokens: 7,
+      totalTokens: 12,
+      estimatedCostUSD: 0.003,
+    });
+    // Originals untouched
+    expect(a.inputTokens).toBe(1);
+  });
+
+  it("estimateUsageFromText uses a 4-chars-per-token heuristic", () => {
+    const usage = estimateUsageFromText("gpt-4o", "1234", "1234567890123456"); // 4 in, 16 out
+    expect(usage.inputTokens).toBe(1);
+    expect(usage.outputTokens).toBe(4);
+    expect(usage.totalTokens).toBe(5);
+  });
+
+  it("ZERO_USAGE is well-defined", () => {
+    expect(ZERO_USAGE.totalTokens).toBe(0);
+  });
+});
diff --git a/tests/providers.test.ts b/tests/providers.test.ts
index 53de7e5..d75d45e 100644
--- a/tests/providers.test.ts
+++ b/tests/providers.test.ts
@@ -255,7 +255,7 @@ describe("providers", () => {
     expect(models[0].id).toBe("grok:grok-3");
     expect(models[0].preferred).toBe(true);
     // No apiKey exposed
-    expect((models[0] as Record<string, unknown>)["apiKey"]).toBeUndefined();
+    expect((models[0] as unknown as Record<string, unknown>)["apiKey"]).toBeUndefined();
   });
 });
diff --git a/tests/session.test.ts b/tests/session.test.ts
new file mode 100644
index 0000000..90e07d0
--- /dev/null
+++ b/tests/session.test.ts
@@ -0,0 +1,227 @@
+import { describe, it, expect, beforeEach, vi } from "vitest";
+import type { SessionSnapshot } from "@/lib/types";
+import { PERSONAS } from "@/lib/personas";
+import {
+  encodeSnapshotToHash,
+  decodeSnapshotFromHash,
+  snapshotToMarkdown,
+  snapshotToJSON,
+  snapshotFilename,
+  downloadBlob,
+} from "@/lib/session";
+
+const baseSnapshot: SessionSnapshot = {
+  v: 1,
+  prompt: "Should we ship on Friday?",
+  engine: "cvp",
+  options: {
+    engine: "cvp",
+    rounds: 3,
+    randomizeOrder: true,
+    blindFirstRound: true,
+    earlyStop: true,
+    judgeEnabled: true,
+    judgeModelId: "openai:gpt-4o",
+  },
+  participants: [
+    {
+      id: "p-1",
+      modelInfo: {
+        id: "openai:gpt-4o",
+        providerId: "openai",
+        providerName: "OpenAI",
+        modelId: "gpt-4o",
+      },
+      persona: PERSONAS[0],
+    },
+    {
+      id: "p-2",
+      modelInfo: {
+        id: "grok:grok-3",
+        providerId: "grok",
+        providerName: "Grok",
+        modelId: "grok-3",
+      },
+      persona: PERSONAS[1],
+    },
+  ],
+  rounds: [
+    {
+      number: 1,
+      type: "initial-analysis",
+      label: "Initial Analysis",
+      consensusScore: 78,
+      responses: [
+        {
+          participantId: "p-1",
+          roundNumber: 1,
+          content: "Yes, ship it.\nCONFIDENCE: 85",
+          confidence: 85,
+          timestamp: 123,
+        },
+        {
+          participantId: "p-2",
+          roundNumber: 1,
+          content: "No, wait.\nCONFIDENCE: 60",
+          confidence: 60,
+          timestamp: 124,
+        },
+      ],
+    },
+  ],
+  finalScore: 78,
+  finalSummary: "done",
+  judge: {
+    modelId: "gpt-4o",
+    providerName: "OpenAI",
+    content: "## Majority Position\nShip.",
+    majorityPosition: "Ship.",
+    minorityPositions: "Wait.",
+    unresolvedDisputes: "None",
+  },
+  disagreements: [
+    {
+      id: "r1-p-1-p-2",
+      round: 1,
+      participantAId: "p-1",
+      participantBId: "p-2",
+      severity: 25,
+      label: "Risk vs Engineer",
+    },
+  ],
+  tokenTotal: {
+    inputTokens: 1000,
+    outputTokens: 500,
+    totalTokens: 1500,
+    estimatedCostUSD: 0.02,
+  },
+  createdAt: 1700000000000,
+};
+
+describe("session — markdown & json exports", () => {
+  it("includes prompt, engine, round headings and participants in markdown", () => {
+    const md = snapshotToMarkdown(baseSnapshot);
+    expect(md).toContain("Should we ship on Friday?");
+    expect(md).toContain("## Round 1 — Initial Analysis");
+    expect(md).toContain(PERSONAS[0].name);
+    expect(md).toContain("Judge Synthesis");
+    expect(md).toContain("Disagreements");
+    expect(md).toContain("$0.0200");
+  });
+
+  it("handles snapshots without judge/disagreements/cost", () => {
+    const minimal: SessionSnapshot = {
+      ...baseSnapshot,
+      judge: null,
+      disagreements: [],
+      tokenTotal: null,
+      finalScore: null,
+    };
+    const md = snapshotToMarkdown(minimal);
+    expect(md).not.toContain("Judge Synthesis");
+    expect(md).not.toContain("Disagreements");
+    expect(md).not.toContain("Total cost");
+  });
+
+  it("snapshotToJSON round-trips via JSON.parse", () => {
+    const json = snapshotToJSON(baseSnapshot);
+    const parsed = JSON.parse(json);
+    expect(parsed.prompt).toBe(baseSnapshot.prompt);
+    expect(parsed.rounds[0].responses).toHaveLength(2);
+  });
+
+  it("snapshotFilename is slug-like and includes the extension", () => {
+    const f = snapshotFilename(baseSnapshot, "md");
+    expect(f).toMatch(/^roundtable-should-we-ship-on-friday-/);
+    expect(f.endsWith(".md")).toBe(true);
+  });
+
+  it("snapshotFilename falls back to `session` for non-alnum prompt", () => {
+    const f = snapshotFilename({ ...baseSnapshot, prompt: "??????" }, "json");
+    expect(f).toContain("session");
+  });
+});
+
+describe("session — hash encode/decode", () => {
+  it("round-trips a full snapshot through the URL hash", async () => {
+    const encoded = await encodeSnapshotToHash(baseSnapshot);
+    expect(encoded.startsWith("rt=")).toBe(true);
+
+    const decoded = await decodeSnapshotFromHash(`#${encoded}`);
+    expect(decoded).not.toBeNull();
+    expect(decoded?.prompt).toBe(baseSnapshot.prompt);
+    expect(decoded?.rounds[0].responses).toHaveLength(2);
+  });
+
+  it("round-trips without compression when CompressionStream is absent", async () => {
+    const g = globalThis as unknown as {
+      CompressionStream?: unknown;
+      DecompressionStream?: unknown;
+    };
+    const origCompression = g.CompressionStream;
+    const origDecompression = g.DecompressionStream;
+    g.CompressionStream = undefined;
+    g.DecompressionStream = undefined;
+    try {
+      const encoded = await encodeSnapshotToHash(baseSnapshot);
+      expect(encoded.startsWith("rt=r")).toBe(true); // raw marker
+      const decoded = await decodeSnapshotFromHash(`#${encoded}`);
+      expect(decoded?.prompt).toBe(baseSnapshot.prompt);
+    } finally {
+      g.CompressionStream = origCompression;
+      g.DecompressionStream = origDecompression;
+    }
+  });
+
+  it("decodeSnapshotFromHash returns null for junk", async () => {
+    expect(await decodeSnapshotFromHash("#nothing")).toBeNull();
+    expect(await decodeSnapshotFromHash("#rt=garbage!")).toBeNull();
+    expect(await decodeSnapshotFromHash("")).toBeNull();
+  });
+
+  it("decodeSnapshotFromHash rejects wrong version", async () => {
+    const fake = { ...baseSnapshot, v: 99 };
+    // Manually encode as raw
+    const b64 = Buffer.from(JSON.stringify(fake), "utf-8")
+      .toString("base64")
+      .replace(/\+/g, "-")
+      .replace(/\//g, "_")
+      .replace(/=+$/, "");
+    const decoded = await decodeSnapshotFromHash(`#rt=r${b64}`);
+    expect(decoded).toBeNull();
+  });
+});
+
+describe("session — downloadBlob", () => {
+  beforeEach(() => {
+    vi.restoreAllMocks();
+  });
+
+  it("creates an anchor, clicks, and revokes the object URL", () => {
+    const created: HTMLAnchorElement[] = [];
+    const originalCreate = document.createElement.bind(document);
+    vi.spyOn(document, "createElement").mockImplementation((tag: string) => {
+      const el = originalCreate(tag) as HTMLAnchorElement;
+      if (tag === "a") {
+        el.click = vi.fn();
+        created.push(el);
+      }
+      return el;
+    });
+
+    const originalCreateObjectURL = URL.createObjectURL;
+    const originalRevokeObjectURL = URL.revokeObjectURL;
+    URL.createObjectURL = vi.fn(() => "blob:test");
+    URL.revokeObjectURL = vi.fn();
+
+    downloadBlob("test.md", "# hello", "text/markdown");
+
+    expect(URL.createObjectURL).toHaveBeenCalled();
+    expect(URL.revokeObjectURL).toHaveBeenCalledWith("blob:test");
+    expect(created.length).toBe(1);
+    expect(created[0].click).toHaveBeenCalled();
+
+    URL.createObjectURL = originalCreateObjectURL;
+    URL.revokeObjectURL = originalRevokeObjectURL;
+  });
+});
diff --git a/tests/store.test.ts b/tests/store.test.ts
index 1d3853f..c30bd13 100644
--- a/tests/store.test.ts
+++ b/tests/store.test.ts
@@ -1,7 +1,7 @@
 import { describe, it, expect, beforeEach } from "vitest";
-import { useArenaStore } from "@/lib/store";
+import { useArenaStore, DEFAULT_OPTIONS } from "@/lib/store";
 import { PERSONAS } from "@/lib/personas";
-import type { ModelInfo } from "@/lib/types";
+import type { ModelInfo, SessionSnapshot } from "@/lib/types";
 
 const mockModel: ModelInfo = {
   id: "test:model-1",
@@ -27,8 +27,8 @@ describe("ArenaStore", () => {
       availableModels: [],
       modelsLoading: true,
       participants: [],
-      roundCount: 5,
       prompt: "",
+      options: { ...DEFAULT_OPTIONS },
     });
   });
 
@@ -79,19 +79,28 @@ describe("ArenaStore", () => {
   describe("configuration", () => {
     it("sets round count clamped between 1 and 10", () => {
       useArenaStore.getState().setRoundCount(7);
-      expect(useArenaStore.getState().roundCount).toBe(7);
+      expect(useArenaStore.getState().options.rounds).toBe(7);
       useArenaStore.getState().setRoundCount(0);
-      expect(useArenaStore.getState().roundCount).toBe(1);
+      expect(useArenaStore.getState().options.rounds).toBe(1);
       useArenaStore.getState().setRoundCount(15);
-      expect(useArenaStore.getState().roundCount).toBe(10);
+      expect(useArenaStore.getState().options.rounds).toBe(10);
     });
 
     it("sets prompt", () => {
       useArenaStore.getState().setPrompt("test prompt");
       expect(useArenaStore.getState().prompt).toBe("test prompt");
     });
+
+    it("setOption toggles individual engine options", () => {
+      useArenaStore.getState().setOption("engine", "blind-jury");
+      expect(useArenaStore.getState().options.engine).toBe("blind-jury");
+      useArenaStore.getState().setOption("randomizeOrder", false);
+      expect(useArenaStore.getState().options.randomizeOrder).toBe(false);
+      useArenaStore.getState().setOption("judgeModelId", "foo:bar");
+      expect(useArenaStore.getState().options.judgeModelId).toBe("foo:bar");
+    });
   });
 
   describe("consensus lifecycle", () => {
@@ -107,6 +116,8 @@ describe("ArenaStore", () => {
     expect(s.activeStreams).toEqual({});
     expect(s.finalScore).toBeNull();
     expect(s.progress).toBe(0);
+    expect(s.tokenTotal.totalTokens).toBe(0);
+    expect(s.disagreements).toEqual([]);
   });
 
   it("cancelConsensus aborts and sets isRunning false", () => {
@@ -138,33 +149,42 @@ describe("ArenaStore", () => {
     useArenaStore.getState().appendToken("p-1", 1, "streaming");
     useArenaStore
       .getState()
-      .completeParticipantRound("p-1", 1, 75, "Full response\nCONFIDENCE: 75");
+      .completeParticipantRound("p-1", 1, 75, "Full response\nCONFIDENCE: 75", {
+        inputTokens: 100,
+        outputTokens: 50,
+        totalTokens: 150,
+        estimatedCostUSD: 0.001,
+      });
     const s = useArenaStore.getState();
     expect(s.activeStreams["p-1"]).toBe("");
     expect(s.rounds[0].responses).toHaveLength(1);
     expect(s.rounds[0].responses[0].confidence).toBe(75);
+    expect(s.tokenTotal.totalTokens).toBe(150);
+    expect(s.usageByParticipant["p-1"].totalTokens).toBe(150);
   });
 
   it("endRound sets consensus score and updates progress", () => {
+    useArenaStore.getState().setOption("rounds", 5);
     useArenaStore.getState().startConsensus();
     useArenaStore.getState().startRound(1, "initial-analysis", "Analysis");
     useArenaStore.getState().endRound(1, 82);
     const s = useArenaStore.getState();
     expect(s.rounds[0].consensusScore).toBe(82);
-    expect(s.progress).toBe(1 / 5); // round 1 of 5
+    expect(s.progress).toBe(1 / 5);
   });
 
   it("completeConsensus finalizes state", () => {
     useArenaStore.getState().startConsensus();
-    useArenaStore.getState().completeConsensus(88, "Good consensus");
+    useArenaStore.getState().completeConsensus(88, "Good consensus", 5);
     const s = useArenaStore.getState();
     expect(s.isRunning).toBe(false);
     expect(s.finalScore).toBe(88);
     expect(s.finalSummary).toBe("Good consensus");
     expect(s.progress).toBe(1);
+    expect(s.roundsCompleted).toBe(5);
   });
 
   it("reset clears all execution state", () => {
@@ -177,6 +197,95 @@ describe("ArenaStore", () => {
     expect(s.rounds).toEqual([]);
     expect(s.progress).toBe(0);
     expect(s.finalScore).toBeNull();
+    expect(s.disagreements).toEqual([]);
+    expect(s.judge).toBeNull();
+    });
+  });
+
+  describe("disagreements, judge, early stop", () => {
+    it("addDisagreements appends items", () => {
+      useArenaStore.getState().addDisagreements(1, [
+        {
+          id: "r1-a-b",
+          round: 1,
+          participantAId: "a",
+          participantBId: "b",
+          severity: 30,
+          label: "x",
+        },
+      ]);
+      expect(useArenaStore.getState().disagreements).toHaveLength(1);
+    });
+
+    it("startJudge seeds a judge record with empty content", () => {
+      useArenaStore.getState().startJudge("openai:gpt-4o", "OpenAI");
+      const s = useArenaStore.getState();
+      expect(s.judgeRunning).toBe(true);
+      expect(s.judge?.modelId).toBe("openai:gpt-4o");
+      expect(s.judge?.providerName).toBe("OpenAI");
+    });
+
+    it("appendJudgeToken accumulates streamed judge content", () => {
+      useArenaStore.getState().startJudge("x", "X");
+      useArenaStore.getState().appendJudgeToken("Hello ");
+      useArenaStore.getState().appendJudgeToken("world");
+      expect(useArenaStore.getState().judgeStream).toBe("Hello world");
+    });
+
+    it("completeJudge stores the final result and adds its usage to the total", () => {
+      useArenaStore.getState().startJudge("x", "X");
+      useArenaStore.getState().completeJudge({
+        modelId: "x",
+        providerName: "X",
+        content: "final",
+        majorityPosition: "A",
+        minorityPositions: "B",
+        unresolvedDisputes: "",
+        usage: { inputTokens: 10, outputTokens: 20, totalTokens: 30, estimatedCostUSD: 0.0001 },
+      });
+      const s = useArenaStore.getState();
+      expect(s.judgeRunning).toBe(false);
+      expect(s.judge?.content).toBe("final");
+      expect(s.tokenTotal.totalTokens).toBe(30);
+    });
+
+    it("setEarlyStopped records the info", () => {
+      useArenaStore.getState().setEarlyStopped({ round: 3, delta: 1, reason: "stable" });
+      expect(useArenaStore.getState().earlyStopped?.round).toBe(3);
+    });
+  });
+
+  describe("snapshot load / getSnapshot", () => {
+    it("getSnapshot returns current state shape", () => {
+      useArenaStore.getState().setPrompt("hello");
+      const snap = useArenaStore.getState().getSnapshot();
+      expect(snap.v).toBe(1);
+      expect(snap.prompt).toBe("hello");
+    });
+
+    it("loadSnapshot rehydrates and sets sharedView", () => {
+      const snap: SessionSnapshot = {
+        v: 1,
+        prompt: "shared prompt",
+        engine: "cvp",
+        options: { ...DEFAULT_OPTIONS, rounds: 3 },
+        participants: [{ id: "p-1", modelInfo: mockModel, persona }],
+        rounds: [
+          { number: 1, type: "initial-analysis", label: "A", responses: [], consensusScore: 75 },
+        ],
+        finalScore: 75,
+        finalSummary: "done",
+        judge: null,
+        disagreements: [],
+        tokenTotal: null,
+        createdAt: Date.now(),
+      };
+      useArenaStore.getState().loadSnapshot(snap);
+      const s = useArenaStore.getState();
+      expect(s.prompt).toBe("shared prompt");
+      expect(s.sharedView).toBe(true);
+      expect(s.finalScore).toBe(75);
+      expect(s.rounds).toHaveLength(1);
     });
   });
 });
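Reviewer note on the `session.test.ts` hash tests above: they imply a permalink payload of the form `rt=` + marker + base64url(JSON), where the `r` marker denotes the raw (uncompressed) fallback used when `CompressionStream` is unavailable, and where a `v` field other than `1` must be rejected. A minimal sketch of that raw path follows, for orientation only — `encodeRaw`, `decodeRaw`, and the `RawSnapshot` shape are illustrative names, not the actual `@/lib/session` exports:

```typescript
// Illustrative sketch, NOT the real @/lib/session implementation.
// RawSnapshot is a stand-in for the full SessionSnapshot type.
type RawSnapshot = { v: number; prompt?: string };

// base64url: standard base64 with +/ swapped for -_ and padding stripped,
// so the result is safe inside a URL hash fragment.
function base64urlEncode(s: string): string {
  return Buffer.from(s, "utf-8")
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

function base64urlDecode(s: string): string {
  return Buffer.from(s.replace(/-/g, "+").replace(/_/g, "/"), "base64").toString("utf-8");
}

// Raw ("r"-marked) encode: no compression, just base64url over the JSON.
export function encodeRaw(snapshot: RawSnapshot): string {
  return `rt=r${base64urlEncode(JSON.stringify(snapshot))}`;
}

// Decode, tolerating a leading "#"; returns null for junk, unparseable
// base64/JSON, or a snapshot whose version is not 1.
export function decodeRaw(hash: string): RawSnapshot | null {
  const m = /^#?rt=r(.+)$/.exec(hash);
  if (!m) return null;
  try {
    const parsed = JSON.parse(base64urlDecode(m[1])) as RawSnapshot;
    return parsed && parsed.v === 1 ? parsed : null;
  } catch {
    return null;
  }
}
```

The compressed path presumably wraps the same JSON in `CompressionStream` behind a different marker; the tests above only pin down the raw fallback and the version check.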