diff --git a/.github/workflows/ai-eval.yml b/.github/workflows/ai-eval.yml new file mode 100644 index 00000000..7b0e0a92 --- /dev/null +++ b/.github/workflows/ai-eval.yml @@ -0,0 +1,52 @@ +name: AI Eval + +on: + schedule: + - cron: '0 8 * * *' + push: + paths: + - 'ai-evals/**' + pull_request: + paths: + - 'ai-evals/**' + +jobs: + ai-eval: + runs-on: ubuntu-latest + continue-on-error: true + + steps: + - name: Checkout code + uses: actions/checkout@v4 + + - name: Setup Node.js 22 + uses: actions/setup-node@v4 + with: + node-version: 22 + cache: 'npm' + + - name: Install dependencies + run: npm install + + - name: Install Claude Code + run: npm install -g @anthropic-ai/claude-code + + - name: Check Claude authentication + id: claude-auth + run: node scripts/check-claude-ai-eval-gate.js + env: + CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }} + + - name: Run AI prompt evaluations + if: steps.claude-auth.outputs.available == 'true' + env: + CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }} + run: npm run test:ai-eval + + - name: Upload AI eval responses + if: always() && steps.claude-auth.outputs.available == 'true' + uses: actions/upload-artifact@v4 + with: + name: ai-eval-responses + path: ai-evals/*.responses.md + retention-days: 14 diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index 75008374..27a96b30 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -32,43 +32,3 @@ jobs: - name: Run tests run: npm test - - ai-eval: - runs-on: ubuntu-latest - needs: test - - steps: - - name: Checkout code - uses: actions/checkout@v4 - - - name: Setup Node.js 22 - uses: actions/setup-node@v4 - with: - node-version: 22 - cache: 'npm' - - - name: Install dependencies - run: npm install - - - name: Install Claude Code - run: npm install -g @anthropic-ai/claude-code - - - name: Check Claude authentication - id: claude-auth - run: node scripts/check-claude-ai-eval-gate.js - env: - CLAUDE_CODE_OAUTH_TOKEN: ${{ 
secrets.CLAUDE_CODE_OAUTH_TOKEN }} - - - name: Run AI prompt evaluations - if: steps.claude-auth.outputs.available == 'true' - env: - CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }} - run: npm run test:ai-eval - - - name: Upload AI eval responses - if: always() && steps.claude-auth.outputs.available == 'true' - uses: actions/upload-artifact@v4 - with: - name: ai-eval-responses - path: ai-evals/*.responses.md - retention-days: 14 diff --git a/AGENTS.md b/AGENTS.md index fdc7602f..42f36d4c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -58,3 +58,4 @@ import aidd-custom/AGENTS.md // settings from this import should override the ro ## Task Index fix bug => /aidd-fix +review pull request => /aidd-pr diff --git a/README.md b/README.md index 78703dca..21f39cd2 100644 --- a/README.md +++ b/README.md @@ -465,7 +465,7 @@ Skills are reusable agent workflows that extend AIDD with specialized capabiliti - **[/aidd-ecs](ai/skills/aidd-ecs/README.md)** — Enforces @adobe/data/ecs best practices. Use when working with ECS components, resources, transactions, actions, systems, or services. - **[/aidd-error-causes](ai/skills/aidd-error-causes/README.md)** — Structured error handling with the error-causes library. Use when throwing errors, catching errors, defining error types, or implementing error routing. - **[/aidd-fix](ai/skills/aidd-fix/README.md)** — Fix a bug or implement review feedback following the AIDD fix process. Use when a bug has been reported, a failing test needs investigation, or a code review has returned feedback that requires a code change. -- **[/aidd-functional-requirements](ai/skills/aidd-functional-requirements/README.md)** — Write functional requirements for a user story. Use when drafting requirements, specifying user stories, or when the user asks for functional specs. +- **[/aidd-requirements](ai/skills/aidd-requirements/README.md)** — Write functional requirements for a user story. 
Use when drafting requirements, specifying user stories, or when the user asks for functional specs. - **[/aidd-javascript](ai/skills/aidd-javascript/README.md)** — JavaScript and TypeScript best practices and guidance. Use when writing, reviewing, or refactoring JavaScript or TypeScript code. - **[/aidd-javascript-io-effects](ai/skills/aidd-javascript-io-effects/README.md)** — Isolate network I/O and side effects using the saga pattern with call and put. Use when making network requests, invoking side effects, or implementing Redux sagas. - **[/aidd-jwt-security](ai/skills/aidd-jwt-security/README.md)** — JWT security review patterns. Use when reviewing or implementing authentication code, token handling, or session management. diff --git a/ai-evals/aidd-parallel/fixtures/add.js b/ai-evals/aidd-parallel/fixtures/add.js new file mode 100644 index 00000000..dc1ead94 --- /dev/null +++ b/ai-evals/aidd-parallel/fixtures/add.js @@ -0,0 +1 @@ +export const add = (a, b) => a - b; diff --git a/ai-evals/aidd-parallel/fixtures/greet.js b/ai-evals/aidd-parallel/fixtures/greet.js new file mode 100644 index 00000000..1ad927dd --- /dev/null +++ b/ai-evals/aidd-parallel/fixtures/greet.js @@ -0,0 +1 @@ +export const greet = (name) => `Hello, ${name}`; diff --git a/ai-evals/aidd-parallel/prompt-generation-test.sudo b/ai-evals/aidd-parallel/prompt-generation-test.sudo new file mode 100644 index 00000000..406a36d1 --- /dev/null +++ b/ai-evals/aidd-parallel/prompt-generation-test.sudo @@ -0,0 +1,21 @@ +import 'ai/skills/aidd-parallel/SKILL.md' + +userPrompt = """ +Run /aidd-parallel --branch feature/utils with the following two tasks: + +Task 1: + File: ai-evals/aidd-parallel/fixtures/add.js, line 1 + "add() subtracts instead of adding — should use + not -" + +Task 2: + File: ai-evals/aidd-parallel/fixtures/greet.js, line 1 + "greet() should include an exclamation mark at the end of the greeting" + +Generate delegation prompts for both tasks. 
+""" + +- Given two tasks and a branch, should generate a separate delegation prompt for each task +- Given a generated prompt, should start with /aidd-fix +- Given a generated prompt, should reference the correct branch feature/utils +- Given a generated prompt, should instruct the sub-agent to commit and push to origin/feature/utils +- Given a generated prompt, should be wrapped in a markdown codeblock diff --git a/ai-evals/aidd-pr/fixtures/add.js b/ai-evals/aidd-pr/fixtures/add.js new file mode 100644 index 00000000..dc1ead94 --- /dev/null +++ b/ai-evals/aidd-pr/fixtures/add.js @@ -0,0 +1 @@ +export const add = (a, b) => a - b; diff --git a/ai-evals/aidd-pr/fixtures/greet.js b/ai-evals/aidd-pr/fixtures/greet.js new file mode 100644 index 00000000..f7a23827 --- /dev/null +++ b/ai-evals/aidd-pr/fixtures/greet.js @@ -0,0 +1 @@ +export const greet = (name) => `Hello, ${name}!`; diff --git a/ai-evals/aidd-pr/step-1-triage-test.sudo b/ai-evals/aidd-pr/step-1-triage-test.sudo new file mode 100644 index 00000000..81700548 --- /dev/null +++ b/ai-evals/aidd-pr/step-1-triage-test.sudo @@ -0,0 +1,25 @@ +import 'ai/skills/aidd-pr/SKILL.md' + +userPrompt = """ +You have the following mock tools available. Use them instead of real gh or GraphQL calls: + +mock gh pr view => returns: + title: Fix utility functions + branch: feature/utils + base: main + +mock gh api (list review threads) => returns: + [ + { id: "T_01", resolved: false, file: "ai-evals/aidd-pr/fixtures/add.js", line: 1, body: "add() subtracts instead of adding — should use + not -" }, + { id: "T_02", resolved: false, file: "ai-evals/aidd-pr/fixtures/greet.js", line: 1, body: "greet() should include an exclamation mark at the end of the greeting" } + ] + +mock GraphQL resolveReviewThread => returns: { thread: { isResolved: true } } + +Run step 1 of /aidd-pr: triage the review threads. 
+""" + +- Given the mock gh api returns two threads, should list both threads before taking any action +- Given a thread whose concern is already fixed in the current source, should classify it as addressed +- Given a thread whose reported issue is still present in the current source, should classify it as remaining +- Given the addressed list is presented, should require approval before resolving diff --git a/ai-evals/aidd-pr/step-2-delegation-test.sudo b/ai-evals/aidd-pr/step-2-delegation-test.sudo new file mode 100644 index 00000000..dc0c27b0 --- /dev/null +++ b/ai-evals/aidd-pr/step-2-delegation-test.sudo @@ -0,0 +1,26 @@ +import 'ai/skills/aidd-pr/SKILL.md' + +userPrompt = """ +You have the following mock tools available. Use them instead of real gh or GraphQL calls: + +mock gh pr view => returns: + title: Fix utility functions + branch: feature/utils + base: main + +mock GraphQL resolveReviewThread => returns: { thread: { isResolved: true } } + +Triage is complete. The following issues remain unresolved: + +Issue 1 (thread ID: T_01): + File: ai-evals/aidd-pr/fixtures/add.js, line 1 + "add() subtracts instead of adding — should use + not -" + +Generate delegation prompts for the remaining issues. 
+""" + +- Given one remaining issue, should generate a delegation prompt for it +- Given a delegation prompt, should start with /aidd-fix +- Given a delegation prompt, should reference the specific file from the review comment +- Given a delegation prompt, should instruct the agent to commit directly to the PR branch feature/utils and not create a new branch +- Given a delegation prompt, should be wrapped in a markdown codeblock diff --git a/ai/commands/aidd-parallel.md b/ai/commands/aidd-parallel.md new file mode 100644 index 00000000..fca204be --- /dev/null +++ b/ai/commands/aidd-parallel.md @@ -0,0 +1,10 @@ +--- +description: Generate /aidd-fix delegation prompts for a list of tasks and optionally dispatch them to sub-agents in dependency order +--- +# 🔀 /aidd-parallel + +Load and execute the skill at `ai/skills/aidd-parallel/SKILL.md`. + +Constraints { + Before beginning, read and respect the constraints in /aidd-please. +} diff --git a/ai/commands/aidd-pr.md b/ai/commands/aidd-pr.md new file mode 100644 index 00000000..8c9e34c6 --- /dev/null +++ b/ai/commands/aidd-pr.md @@ -0,0 +1,10 @@ +--- +description: Review a PR, resolve addressed comments, and generate /aidd-fix delegation prompts for remaining issues +--- +# 🔍 /aidd-pr + +Load and execute the skill at `ai/skills/aidd-pr/SKILL.md`. + +Constraints { + Before beginning, read and respect the constraints in /aidd-please. +} diff --git a/ai/commands/aidd-requirements.md b/ai/commands/aidd-requirements.md new file mode 100644 index 00000000..227dfc72 --- /dev/null +++ b/ai/commands/aidd-requirements.md @@ -0,0 +1,10 @@ +--- +description: Write functional requirements for a user story. Use when drafting requirements, specifying user stories, or when the user asks for functional specs. +--- +# 📋 /aidd-requirements + +Load and execute the skill at `ai/skills/aidd-requirements/SKILL.md`. + +Constraints { + Before beginning, read and respect the constraints in /aidd-please. 
+} diff --git a/ai/commands/aidd-riteway-ai.md b/ai/commands/aidd-riteway-ai.md new file mode 100644 index 00000000..4f0fd66f --- /dev/null +++ b/ai/commands/aidd-riteway-ai.md @@ -0,0 +1,10 @@ +--- +description: Write correct riteway ai prompt evals for multi-step tool-calling flows. Use when creating .sudo eval files or testing agent skills that use tools. +--- +# 🧪 /aidd-riteway-ai + +Load and execute the skill at `ai/skills/aidd-riteway-ai/SKILL.md`. + +Constraints { + Before beginning, read and respect the constraints in /aidd-please. +} diff --git a/ai/commands/index.md b/ai/commands/index.md index 1c3bf404..e6e04e8a 100644 --- a/ai/commands/index.md +++ b/ai/commands/index.md @@ -16,6 +16,30 @@ Rank files by hotspot score to identify prime candidates for refactoring before *No description available* +### 🔀 /aidd-parallel + +**File:** `aidd-parallel.md` + +Generate /aidd-fix delegation prompts for a list of tasks and optionally dispatch them to sub-agents in dependency order + +### 🔍 /aidd-pr + +**File:** `aidd-pr.md` + +Review a PR, resolve addressed comments, and generate /aidd-fix delegation prompts for remaining issues + +### 📋 /aidd-requirements + +**File:** `aidd-requirements.md` + +Write functional requirements for a user story. Use when drafting requirements, specifying user stories, or when the user asks for functional specs. + +### 🧪 /aidd-riteway-ai + +**File:** `aidd-riteway-ai.md` + +Write correct riteway ai prompt evals for multi-step tool-calling flows. Use when creating .sudo eval files or testing agent skills that use tools. 
+ ### Commit **File:** `commit.md` diff --git a/ai/skills/aidd-parallel/README.md b/ai/skills/aidd-parallel/README.md new file mode 100644 index 00000000..d185a39d --- /dev/null +++ b/ai/skills/aidd-parallel/README.md @@ -0,0 +1,64 @@ +# aidd-parallel — Parallel Sub-Agent Delegation + +`/aidd-parallel` generates focused `/aidd-fix` delegation prompts for a list +of tasks and can dispatch them to sub-agents in dependency order. + +## Usage + +``` +/aidd-parallel [--branch ] — generate one /aidd-fix delegation prompt per task +/aidd-parallel delegate — build file list + dep graph, sequence, and dispatch +``` + +## Why parallel delegation matters + +When a PR review or task breakdown produces multiple independent issues, fixing +them sequentially in a single agent thread wastes time and dilutes attention. +`/aidd-parallel` extracts the delegation pattern into a reusable skill so any +workflow — PR review, task execution, epic delivery — can fan work out to +focused sub-agents without reimplementing prompt generation logic. + +## How it works + +### Step 1 — Resolve the branch + +If `--branch ` is supplied, use that branch. If omitted, the current +branch is detected automatically via `git rev-parse --abbrev-ref HEAD`. + +### Step 2 — Generate delegation prompts + +For each task, one `/aidd-fix` delegation prompt is produced. Every prompt: + +- Starts with `/aidd-fix` +- Contains only the context needed for that single task +- Instructs the sub-agent to work directly on the target branch and commit and + push to `origin/` — never to `main`, never to a new branch +- Is wrapped in a fenced markdown codeblock; any nested codeblocks are indented + one level to prevent them from breaking the outer fence + +### Step 3 (delegate only) — Build the dependency graph + +`/aidd-parallel delegate` first builds a list of files each task will change, +then produces a Mermaid change dependency graph. The graph is used for +sequencing only — it is not saved or committed. 
+ +### Step 4 (delegate only) — Dispatch in dependency order + +Prompts are dispatched to sub-agent workers in the order determined by the +dependency graph: tasks with no dependencies first, dependents after their +prerequisites are complete. + +Post-dispatch callbacks (e.g. resolving PR conversation threads) are the +caller's responsibility. + +## When to use `/aidd-parallel` + +- A PR review has multiple independent issues that should be fixed in parallel +- A task epic has been broken into independent sub-tasks suitable for parallel execution +- Any workflow that needs to fan work out to multiple `/aidd-fix` sub-agents + +## Constraints + +- Each prompt must be wrapped in a markdown codeblock +- Nested codeblocks inside a prompt must be indented to prevent breaking the outer fence +- Sub-agents are always directed to the supplied branch — never to `main` or a new branch diff --git a/ai/skills/aidd-parallel/SKILL.md b/ai/skills/aidd-parallel/SKILL.md new file mode 100644 index 00000000..68298ed6 --- /dev/null +++ b/ai/skills/aidd-parallel/SKILL.md @@ -0,0 +1,60 @@ +--- +name: aidd-parallel +description: > + Generate /aidd-fix delegation prompts for a list of tasks and optionally dispatch + them to sub-agents in dependency order. + Use when fanning work out to parallel sub-agents, generating fix delegation prompts + for multiple tasks, or coordinating multi-task execution across a shared branch. +compatibility: Requires git available in the project. +--- + +# 🔀 aidd-parallel + +Act as a top-tier software engineering lead to generate focused `/aidd-fix` +delegation prompts and coordinate parallel sub-agent execution. 
+ +Competencies { + parallel task decomposition + dependency graph analysis + sub-agent delegation via /aidd-fix + branch-targeted prompt generation +} + +Constraints { + Put each delegation prompt in a markdown codeblock, indenting any nested codeblocks to prevent breaking the outer block + Instruct each sub-agent to work directly on the supplied branch and commit and push to origin on that branch (not to main, not to their own branch) + If --branch is omitted, use the current branch (git rev-parse --abbrev-ref HEAD) +} + +## Command: /aidd-parallel [--branch ] + +generateDelegationPrompts(tasks, branch) => prompts { + 1. Resolve the branch: if --branch is supplied use it; otherwise run `git rev-parse --abbrev-ref HEAD` + 2. For each task, generate a focused `/aidd-fix` delegation prompt: + - Start the prompt with `/aidd-fix` + - Include only the context needed to address that single task + - Instruct the sub-agent to work directly on ``, commit, and push to `origin/` + - Do NOT instruct the sub-agent to create a new branch + 3. Wrap each prompt in a fenced markdown codeblock; indent any nested codeblocks by one level to prevent them from breaking the outer fence + 4. Output one codeblock per task +} + +## Command: /aidd-parallel delegate + +delegate(tasks, branch) { + 1. Call generateDelegationPrompts to produce one prompt per task + 2. Build a list of files that each task will need to change + 3. Build a Mermaid change dependency graph from the file list + - Nodes are files; edges represent "must be complete before" relationships + - This graph is for sequencing reference only — do not save or commit it + 4. Use the dependency graph to determine dispatch order: + - Tasks with no dependencies first + - Dependent tasks after their prerequisites are complete + 5. Spawn one sub-agent worker per prompt in dependency order + 6. Post-dispatch callbacks (e.g. 
resolving PR threads) are the caller's responsibility +} + +Commands { + /aidd-parallel [--branch ] - generate one /aidd-fix delegation prompt per task + /aidd-parallel delegate [--branch ] - build file list + mermaid dep graph, sequence, and dispatch to sub-agents +} diff --git a/ai/skills/aidd-please/SKILL.md b/ai/skills/aidd-please/SKILL.md index 7916350d..c3925b53 100644 --- a/ai/skills/aidd-please/SKILL.md +++ b/ai/skills/aidd-please/SKILL.md @@ -46,6 +46,9 @@ Commands { 🧪 /user-test - use /aidd-user-testing to generate human and AI agent test scripts from user journeys 🤖 /run-test - execute AI agent test script in real browser with screenshots 🐛 /aidd-fix - fix a bug or implement review feedback following the full AIDD fix process + 🔍 /aidd-pr - review a pull request, triage comments, and generate /aidd-fix delegation prompts for remaining issues + 🔀 /aidd-parallel - generate /aidd-fix delegation prompts for a list of tasks and optionally dispatch them to sub-agents in dependency order + 🧪 /aidd-riteway-ai - write correct riteway ai prompt evals for multi-step tool-calling flows } Constraints { diff --git a/ai/skills/aidd-pr/README.md b/ai/skills/aidd-pr/README.md new file mode 100644 index 00000000..2c9fbb2a --- /dev/null +++ b/ai/skills/aidd-pr/README.md @@ -0,0 +1,17 @@ +# aidd-pr + +`/aidd-pr` triages pull request review comments, resolves already-addressed threads, and delegates targeted fix prompts to sub-agents via `/aidd-fix`. + +## Usage + +``` +/aidd-pr [PR URL] — triage comments, resolve addressed threads, and generate /aidd-fix delegation prompts +/aidd-pr delegate — dispatch the generated prompts to sub-agents and resolve related PR conversations via the GitHub GraphQL API +``` + +## How it works + +1. Uses `gh` to list all open review threads and identify which have already been addressed in code +2. Presents the addressed list for manual approval, then resolves those threads via the GitHub GraphQL API +3. 
Validates remaining issues against the current source +4. For each confirmed issue, generates a focused `/aidd-fix` delegation prompt — one issue per prompt, targeting the PR branch directly diff --git a/ai/skills/aidd-pr/SKILL.md b/ai/skills/aidd-pr/SKILL.md new file mode 100644 index 00000000..e883f2d4 --- /dev/null +++ b/ai/skills/aidd-pr/SKILL.md @@ -0,0 +1,40 @@ +--- +name: aidd-pr +description: > + Triage PR review comments, resolve already-addressed threads, and delegate /aidd-fix prompts for remaining issues. + Use when a PR has open review comments that need to be triaged, resolved, or delegated to sub-agents. +compatibility: Requires gh CLI authenticated and git available in the project. +--- + +# 🔍 aidd-pr + +Act as a top-tier software engineering lead to triage pull request review comments, +resolve already-addressed issues, and coordinate targeted fixes using the AIDD fix process. + +Competencies { + pull request triage + review comment analysis + fix delegation via /aidd-fix + GitHub GraphQL API for resolving conversations +} + +Constraints { + Always delegate fixes to sub-agents to avoid attention dilution when sub-agents are available +} + +Given the following PR: + +1. Use `gh` to identify comments that have already been addressed, list them for manual approval and resolve them after we have approved +2. Validate remaining issues, and: + +For each issue, use `/aidd-parallel --branch ` to generate the delegation prompts. 
+ +Constraints { + Do not close any other PRs + Do not touch any git branches other than the PR's branch as determined via `gh pr view` +} + +Commands { + /aidd-pr [PR URL] - take a PR URL, identify issues, and delegate prompts to fix the issues + /aidd-pr delegate - call /aidd-parallel delegate to dispatch prompts, then resolve related PR conversation threads via the GitHub GraphQL API +} diff --git a/ai/skills/aidd-functional-requirements/README.md b/ai/skills/aidd-requirements/README.md similarity index 86% rename from ai/skills/aidd-functional-requirements/README.md rename to ai/skills/aidd-requirements/README.md index 70681818..2ff50f1d 100644 --- a/ai/skills/aidd-functional-requirements/README.md +++ b/ai/skills/aidd-requirements/README.md @@ -1,4 +1,4 @@ -# aidd-functional-requirements +# aidd-requirements Writes functional requirements for user stories using a standardized "Given X, should Y" format focused on user outcomes. @@ -11,7 +11,7 @@ requirements testable and unambiguous. ## Usage -Invoke `/aidd-functional-requirements` with the user story. Each requirement +Invoke `/aidd-requirements` with the user story. Each requirement follows this template: ``` diff --git a/ai/skills/aidd-functional-requirements/SKILL.md b/ai/skills/aidd-requirements/SKILL.md similarity index 94% rename from ai/skills/aidd-functional-requirements/SKILL.md rename to ai/skills/aidd-requirements/SKILL.md index 685d908f..7997ff8b 100644 --- a/ai/skills/aidd-functional-requirements/SKILL.md +++ b/ai/skills/aidd-requirements/SKILL.md @@ -1,5 +1,5 @@ --- -name: aidd-functional-requirements +name: aidd-requirements description: Write functional requirements for a user story. Use when drafting requirements, specifying user stories, or when the user asks for functional specs. 
--- diff --git a/ai/skills/aidd-riteway-ai/README.md b/ai/skills/aidd-riteway-ai/README.md new file mode 100644 index 00000000..b8895233 --- /dev/null +++ b/ai/skills/aidd-riteway-ai/README.md @@ -0,0 +1,21 @@ +# aidd-riteway-ai + +`/aidd-riteway-ai` teaches agents how to write correct `riteway ai` prompt evals (`.sudo` files) for multi-step agent flows that involve tool calls. + +## Usage + +``` +/aidd-riteway-ai — write riteway ai prompt evals for a multi-step tool-calling skill +``` + +## How it works + +1. Splits the eval into one `.sudo` file per step, named `step-N--test.sudo` — never collapses multiple steps into a single file +2. Adds a mock-tool preamble to unit evals so the agent uses stub return values instead of calling real APIs +3. For step 1, asserts that the agent makes the correct tool calls — never pre-supplies the answers those calls would return +4. For steps N > 1, includes the previous step's output as context so each file runs independently without replaying earlier steps live +5. Names e2e evals `-e2e.test.sudo` and omits the mock preamble so they run against live APIs with real credentials +6. Keeps fixture files under 20 lines with exactly one bug or condition per file to keep assertion outcomes unambiguous +7. Derives all assertions strictly from functional requirements using the `Given X, should Y` format, testing only distinct observable behaviors with no duplicates + +See [SKILL.md](./SKILL.md) for the full rule set and the eval authoring checklist. diff --git a/ai/skills/aidd-riteway-ai/SKILL.md b/ai/skills/aidd-riteway-ai/SKILL.md new file mode 100644 index 00000000..accd14a6 --- /dev/null +++ b/ai/skills/aidd-riteway-ai/SKILL.md @@ -0,0 +1,211 @@ +--- +name: aidd-riteway-ai +description: > + Teaches agents how to write correct riteway ai prompt evals (.sudo files) for + multi-step flows that involve tool calls. 
+ Use when writing prompt evals, creating .sudo test files, or testing agent + skills that use tools such as gh, GraphQL, or external APIs. +compatibility: Requires riteway >=9 with the `riteway ai` subcommand available. +--- + +# 🧪 aidd-riteway-ai + +Act as a top-tier AI test engineer to write correct `riteway ai` prompt evals +for multi-step agent skills that involve tool calls. + +Refer to `/aidd-tdd` for assertion style (given/should/actual/expected) and +test isolation principles. + +Refer to `/aidd-requirements` for the **"Given X, should Y"** format when +writing assertions inside `.sudo` eval files. + +--- + +## Eval File Structure + +A `.sudo` eval file has three sections: + +``` +import 'ai/skills//SKILL.md' + +userPrompt = """ + +""" + +- Given , should +- Given , should +``` + +Assertions are bullet points written after the `userPrompt` block. +Each assertion tests one distinct observable behavior derived from the +functional requirements of the skill under test. + +--- + +## Rule 1 — One eval file per step + +Given a multi-step flow under test, write **one `.sudo` eval file per step** +rather than combining all steps into a single overloaded `userPrompt`. + +Naming convention: + +``` +ai-evals//step-1--test.sudo +ai-evals//step-2--test.sudo +``` + +Do not collapse multiple steps into one file. Each file tests exactly one +discrete agent action. + +--- + +## Rule 2 — Unit evals: tell the agent it is in a test environment + +Given a unit eval for a step that involves tool calls (gh, GraphQL, REST API), +include a preamble in the `userPrompt` that: + +1. Tells the prompted agent it is operating in a test environment. +2. Provides mock tools with stub return values. +3. Instructs the agent to use the mock tools instead of calling real APIs. + +Example preamble: + +``` +You have the following mock tools available. 
Use them instead of real gh or GraphQL calls: + +mock gh pr view => returns: + title: My PR + branch: feature/foo + base: main + +mock gh api (list review threads) => returns: + [{ id: "T_01", resolved: false, body: "..." }] +``` + +--- + +## Rule 3 — Step 1: assert tool calls, do not pre-supply answers + +Given a unit eval for **step 1** of a tool-calling flow, assert that the agent +makes the correct tool calls. Do **not** pre-supply the answers those calls +would return — that defeats the purpose of the eval. + +Correct pattern for step 1: + +``` +userPrompt = """ +You have mock tools available. Use them instead of real API calls. +Run step 1 of /aidd-pr: fetch the PR details and review threads. +""" + +- Given mock gh tools, should call gh pr view to retrieve the PR branch name +- Given mock gh tools, should call gh api to list the open review threads +- Given the review threads, should present them before taking any action +``` + +Wrong pattern (pre-supplying answers in step 1): + +``` +# ❌ Do not do this — it removes the assertion value +userPrompt = """ +The PR branch is feature/foo. +The review threads are: [...] +Now generate delegation prompts. +""" +``` + +--- + +## Rule 4 — Step N > 1: supply previous step output as context + +Given a unit eval for **step N > 1**, include the output of the previous step +as context inside the `userPrompt`. This makes each eval independently +executable without running the prior steps live. + +Example for step 2: + +``` +userPrompt = """ +You have mock tools available. Use them instead of real calls. + +Triage is complete. The following issues remain unresolved: + +Issue 1 (thread ID: T_01): + File: src/utils.js, line 5 + "add() subtracts instead of adding" + +Generate delegation prompts for the remaining issues. 
+""" +``` + +--- + +## Rule 5 — E2e evals: use real tools, follow -e2e.test.sudo naming + +Given an e2e eval, use real tools (no mock preamble) and follow the +`-e2e.test.sudo` naming convention to mirror the project's existing unit/e2e +split: + +``` +ai-evals//step-1--e2e.test.sudo +``` + +E2e evals run against live APIs. Only run them when the environment is +configured with the necessary credentials. + +--- + +## Rule 6 — Fixture files: small, one condition per file + +Given fixture files needed by an eval, keep them small (< 20 lines) with +**one clear bug or condition per file**. Fixtures live in: + +``` +ai-evals//fixtures/ +``` + +Example fixture (`add.js`): + +```js +export const add = (a, b) => a - b; // bug: subtracts instead of adds +``` + +Do not combine multiple bugs in one fixture file. Each fixture must make the +assertion conditions unambiguous. + +--- + +## Rule 7 — Assertions: derived from functional requirements only + +Given assertions in a `.sudo` eval, derive them strictly from the functional +requirements of the skill under test using the `/aidd-requirements` format: + +``` +- Given , should +``` + +Include only assertions that test **distinct observable behaviors**. Do not: + +- Assert implementation details (e.g. 
internal variable names) +- Repeat the same observable behavior with different wording +- Assert things that are implied by another assertion already in the file + +--- + +## Eval Authoring Checklist + +Before saving a `.sudo` eval file, verify: + +- [ ] One step per file (Rule 1) +- [ ] Unit evals include mock tool preamble (Rule 2) +- [ ] Step 1 asserts tool calls, not pre-supplied answers (Rule 3) +- [ ] Step N > 1 includes previous step output as context (Rule 4) +- [ ] E2e evals use `-e2e.test.sudo` suffix (Rule 5) +- [ ] Fixture files are small, one condition each (Rule 6) +- [ ] Assertions derived from functional requirements, no duplicates (Rule 7) + +--- + +Commands { + 🧪 /aidd-riteway-ai - write correct riteway ai prompt evals for multi-step tool-calling flows +} diff --git a/ai/skills/aidd-riteway-ai/riteway-ai.test.js b/ai/skills/aidd-riteway-ai/riteway-ai.test.js new file mode 100644 index 00000000..803673c1 --- /dev/null +++ b/ai/skills/aidd-riteway-ai/riteway-ai.test.js @@ -0,0 +1,203 @@ +import path from "path"; +import { fileURLToPath } from "url"; +import fs from "fs-extra"; +import { assert } from "riteway/vitest"; +import { describe, test } from "vitest"; + +import { parseFrontmatter } from "../../../lib/index-generator.js"; + +const __dirname = path.dirname(fileURLToPath(import.meta.url)); + +describe("aidd-riteway-ai", () => { + describe("SKILL.md", () => { + test("file exists with valid frontmatter", async () => { + const filePath = path.join(__dirname, "./SKILL.md"); + const exists = await fs.pathExists(filePath); + + assert({ + given: "aidd-riteway-ai SKILL.md file", + should: "exist in ai/skills directory", + actual: exists, + expected: true, + }); + + const content = await fs.readFile(filePath, "utf-8"); + const frontmatter = parseFrontmatter(content); + + assert({ + given: "aidd-riteway-ai frontmatter", + should: "have name field matching directory", + actual: frontmatter?.name, + expected: "aidd-riteway-ai", + }); + + assert({ + given: 
"aidd-riteway-ai frontmatter", + should: "have description field", + actual: typeof frontmatter?.description, + expected: "string", + }); + + assert({ + given: "aidd-riteway-ai frontmatter description", + should: "include a Use when clause", + actual: frontmatter?.description?.includes("Use when"), + expected: true, + }); + }); + + test("references /aidd-tdd and /aidd-requirements", async () => { + const filePath = path.join(__dirname, "./SKILL.md"); + const content = await fs.readFile(filePath, "utf-8"); + + assert({ + given: "aidd-riteway-ai SKILL.md content", + should: "reference /aidd-tdd", + actual: content.includes("/aidd-tdd"), + expected: true, + }); + + assert({ + given: "aidd-riteway-ai SKILL.md content", + should: "reference /aidd-requirements", + actual: content.includes("/aidd-requirements"), + expected: true, + }); + }); + + test("encodes one eval file per step rule", async () => { + const filePath = path.join(__dirname, "./SKILL.md"); + const content = await fs.readFile(filePath, "utf-8"); + + assert({ + given: "aidd-riteway-ai SKILL.md content", + should: "instruct one .sudo eval file per step", + actual: content.includes(".sudo") && content.includes("per step"), + expected: true, + }); + }); + + test("encodes mock tools rule for unit evals", async () => { + const filePath = path.join(__dirname, "./SKILL.md"); + const content = await fs.readFile(filePath, "utf-8"); + + assert({ + given: "aidd-riteway-ai SKILL.md content", + should: "instruct agent to use mock tools in unit evals", + actual: content.includes("mock"), + expected: true, + }); + }); + + test("encodes assert tool calls rule for step 1", async () => { + const filePath = path.join(__dirname, "./SKILL.md"); + const content = await fs.readFile(filePath, "utf-8"); + + assert({ + given: "aidd-riteway-ai SKILL.md content for step 1", + should: + "instruct to assert correct tool calls rather than pre-supply answers", + actual: content.includes("step 1") || content.includes("Step 1"), + expected: 
true, + }); + }); + + test("encodes previous step output rule for step N", async () => { + const filePath = path.join(__dirname, "./SKILL.md"); + const content = await fs.readFile(filePath, "utf-8"); + + assert({ + given: "aidd-riteway-ai SKILL.md content for step N > 1", + should: "instruct to supply previous step output as context", + actual: + content.includes("previous step") || content.includes("prior step"), + expected: true, + }); + }); + + test("encodes e2e eval naming convention", async () => { + const filePath = path.join(__dirname, "./SKILL.md"); + const content = await fs.readFile(filePath, "utf-8"); + + assert({ + given: "aidd-riteway-ai SKILL.md content for e2e evals", + should: "specify the -e2e.test.sudo naming convention", + actual: content.includes("-e2e.test.sudo"), + expected: true, + }); + }); + + test("encodes fixture file guidance", async () => { + const filePath = path.join(__dirname, "./SKILL.md"); + const content = await fs.readFile(filePath, "utf-8"); + + assert({ + given: "aidd-riteway-ai SKILL.md content about fixtures", + should: "instruct fixtures to be small with one clear bug or condition", + actual: content.includes("fixture") || content.includes("Fixture"), + expected: true, + }); + }); + }); + + describe("aidd-riteway-ai command", () => { + test("command file exists", async () => { + const filePath = path.join( + __dirname, + "../../commands/aidd-riteway-ai.md", + ); + const exists = await fs.pathExists(filePath); + + assert({ + given: "aidd-riteway-ai.md command file", + should: "exist in ai/commands directory", + actual: exists, + expected: true, + }); + }); + + test("command file references the skill", async () => { + const filePath = path.join( + __dirname, + "../../commands/aidd-riteway-ai.md", + ); + const content = await fs.readFile(filePath, "utf-8"); + + assert({ + given: "aidd-riteway-ai.md command content", + should: "load and execute aidd-riteway-ai SKILL.md", + actual: content.includes("aidd-riteway-ai/SKILL.md"), + 
expected: true, + }); + }); + + test("command respects aidd-please constraints", async () => { + const filePath = path.join( + __dirname, + "../../commands/aidd-riteway-ai.md", + ); + const content = await fs.readFile(filePath, "utf-8"); + + assert({ + given: "aidd-riteway-ai.md command content", + should: "reference /aidd-please constraints", + actual: content.includes("/aidd-please"), + expected: true, + }); + }); + }); + + describe("aidd-please integration", () => { + test("aidd-please Commands block lists /aidd-riteway-ai", async () => { + const filePath = path.join(__dirname, "../aidd-please/SKILL.md"); + const content = await fs.readFile(filePath, "utf-8"); + + assert({ + given: "aidd-please SKILL.md Commands block", + should: "list /aidd-riteway-ai for agent discovery", + actual: content.includes("/aidd-riteway-ai"), + expected: true, + }); + }); + }); +}); diff --git a/ai/skills/aidd-task-creator/SKILL.md b/ai/skills/aidd-task-creator/SKILL.md index b0215575..51c488a1 100644 --- a/ai/skills/aidd-task-creator/SKILL.md +++ b/ai/skills/aidd-task-creator/SKILL.md @@ -30,7 +30,7 @@ State { ## Requirements Analysis -Use /aidd-functional-requirements to analyze and generate the requirements of the task. +Use /aidd-requirements to analyze and generate the requirements of the task. ## Agent Orchestration diff --git a/ai/skills/index.md b/ai/skills/index.md index b6604198..ef610b1d 100644 --- a/ai/skills/index.md +++ b/ai/skills/index.md @@ -6,7 +6,6 @@ - aidd-ecs - Enforces @adobe/data/ecs best practices. Use this whenever @adobe/data/ecs is imported, when creating or modifying Database.Plugin definitions, or when working with ECS components, resources, transactions, actions, systems, or services. - aidd-error-causes - Use the error-causes library for structured error handling in JavaScript/TypeScript. Use when throwing errors, catching errors, defining error types, or implementing error routing. 
- aidd-fix - Fix a bug or implement review feedback following the AIDD fix process. Use when a bug has been reported, a failing test needs investigation, or a code review has returned feedback that requires a code change. -- aidd-functional-requirements - Write functional requirements for a user story. Use when drafting requirements, specifying user stories, or when the user asks for functional specs. - aidd-javascript - JavaScript and TypeScript best practices and guidance. Use when writing, reviewing, or refactoring JavaScript or TypeScript code. - aidd-javascript-io-effects - Isolate network I/O and side effects using the saga pattern with call and put. Use when making network requests, invoking side effects, or implementing Redux sagas. - aidd-jwt-security - JWT security review patterns. Use when reviewing or implementing authentication code, token handling, session management, or when JWT is mentioned. @@ -15,10 +14,14 @@ - aidd-log - Document completed epics in a structured changelog with emoji categorization. Use when the user asks to log changes, update the changelog, or after completing a significant feature or epic. - aidd-namespace - Ensures types and related functions are authored and consumed in a modular, discoverable, tree-shakeable pattern. Use when creating types, refactoring type folders, defining schemas, importing types, or when the user mentions type namespaces, constants, or Schema.ToType. - aidd-observe - Enforces Observe pattern best practices from @adobe/data/observe. Use when working with Observe, observables, reactive data flow, service Observe properties, or when the user asks about Observe.withMap, Observe.withFilter, Observe.fromConstant, Observe.fromProperties, or similar. +- aidd-parallel - Generate /aidd-fix delegation prompts for a list of tasks and optionally dispatch them to sub-agents in dependency order. 
Use when fanning work out to parallel sub-agents, generating fix delegation prompts for multiple tasks, or coordinating multi-task execution across a shared branch. - aidd-please - General AI assistant for software development projects. Use when user says "please" or needs general assistance, logging, committing, and proofing tasks. +- aidd-pr - Triage PR review comments, resolve already-addressed threads, and delegate /aidd-fix prompts for remaining issues. Use when a PR has open review comments that need to be triaged, resolved, or delegated to sub-agents. - aidd-product-manager - Plan features, user stories, user journeys, and conduct product discovery. Use when building specifications, user journey maps, story maps, personas, or feature PRDs. - aidd-react - Enforces React component authoring best practices. Use when creating React components, binding components, presentations, useObservableValues, or when the user asks about React UI patterns, reactive binding, or action callbacks. +- aidd-requirements - Write functional requirements for a user story. Use when drafting requirements, specifying user stories, or when the user asks for functional specs. - aidd-review - Conduct a thorough code review focusing on code quality, best practices, security, test coverage, and adherence to project standards and functional requirements. Use when reviewing code, pull requests, or completed epics. +- aidd-riteway-ai - Teaches agents how to write correct riteway ai prompt evals (.sudo files) for multi-step flows that involve tool calls. Use when writing prompt evals, creating .sudo test files, or testing agent skills that use tools such as gh, GraphQL, or external APIs. - aidd-service - Enforces asynchronous data service authoring best practices. Use when creating front-end or back-end services, service interfaces, Observe patterns, AsyncDataService, or when the user asks about service layer, data flow, unidirectional UI, or action/observable design. 
- aidd-stack - Tech stack guidance for NextJS + React/Redux + Shadcn UI features. Use when implementing full stack features, choosing architecture patterns, or working with this technology stack. - aidd-structure - Enforces source code structuring and interdependency best practices. Use when creating folders, moving files, adding imports, or when the user asks about architecture, layering, or module dependencies. diff --git a/docs/learn-aidd-framework.md b/docs/learn-aidd-framework.md index 4b8375f8..d5adda5a 100644 --- a/docs/learn-aidd-framework.md +++ b/docs/learn-aidd-framework.md @@ -6,7 +6,7 @@ There are a few different areas you need to learn to get good at aidd Framework: - [Product Management](../ai/skills/aidd-product-manager/SKILL.md) - [User Story Mapping](../ai/skills/aidd-product-manager/SKILL.md) -- [Functional Requirements](../ai/skills/aidd-functional-requirements/SKILL.md) +- [Functional Requirements](../ai/skills/aidd-requirements/SKILL.md) - [Test Driven Development](../ai/skills/aidd-tdd/SKILL.md) - [User Testing](user-testing.md) - Deployment & CI/CD diff --git a/lib/agents-md.js b/lib/agents-md.js index cac9a560..f2f390f9 100644 --- a/lib/agents-md.js +++ b/lib/agents-md.js @@ -21,6 +21,7 @@ const requiredDirectives = [ "conflict", // Conflict resolution "generated", // Auto-generated files warning "import aidd-custom/AGENTS.md", // Import override from aidd-custom + "Task Index", // Task index for common agent operations ]; // The content for AGENTS.md @@ -68,6 +69,11 @@ If any conflicts are detected between a requested task and the vision document, Never proceed with a task that contradicts the vision without explicit user approval. +## Skills + +import @skills/index.md +import @aidd-custom/skills/index.md + ## Custom Skills and Configuration Project-specific customization lives in \`aidd-custom/\`. 
Before starting work, @@ -75,6 +81,11 @@ read \`aidd-custom/index.md\` to discover available project-specific skills, and read \`aidd-custom/config.yml\` to load configuration into context. import aidd-custom/AGENTS.md // settings from this import should override the root AGENTS.md settings + +## Task Index + +fix bug => /aidd-fix +review pull request => /aidd-pr `; /** @@ -188,6 +199,13 @@ and read \`aidd-custom/config.yml\` to load configuration into context.`, content: `import aidd-custom/AGENTS.md // settings from this import should override the root AGENTS.md settings`, keywords: ["import aidd-custom/AGENTS.md"], }, + { + content: `## Task Index + +fix bug => /aidd-fix +review pull request => /aidd-pr`, + keywords: ["Task Index"], + }, ]; /** diff --git a/lib/agents-md.test.js b/lib/agents-md.test.js index 8557bd91..c910b8e2 100644 --- a/lib/agents-md.test.js +++ b/lib/agents-md.test.js @@ -60,6 +60,7 @@ describe("agents-md", () => { CONFLICT resolution AIDD-CUSTOM folder customization IMPORT AIDD-CUSTOM/AGENTS.MD // override + TASK INDEX for common operations `; assert({ @@ -294,6 +295,11 @@ Always read the vision document first. Report conflict resolution to the user. Check aidd-custom/ for project-specific skills and configuration. 
import aidd-custom/AGENTS.md // settings from this import should override the root AGENTS.md settings + +## Task Index + +fix bug => /aidd-fix +review pull request => /aidd-pr `; await fs.writeFile(path.join(tempDir, "AGENTS.md"), customContent); @@ -372,6 +378,13 @@ import aidd-custom/AGENTS.md // settings from this import should override the ro actual: requiredDirectives.includes("import aidd-custom/AGENTS.md"), expected: true, }); + + assert({ + given: "agents need a quick-reference task index for common operations", + should: "include Task Index directive", + actual: requiredDirectives.includes("Task Index"), + expected: true, + }); }); }); @@ -395,6 +408,40 @@ import aidd-custom/AGENTS.md // settings from this import should override the ro ), expected: true, }); + + assert({ + given: "upgrading users missing the Task Index", + should: "have a Task Index section in directiveAppendSections", + actual: directiveAppendSections.some((s) => + s.keywords.includes("Task Index"), + ), + expected: true, + }); + }); + }); + + describe("agentsMdContent", () => { + test("includes the Task Index section", () => { + assert({ + given: "fresh install template", + should: "include the Task Index section", + actual: agentsMdContent.includes("## Task Index"), + expected: true, + }); + + assert({ + given: "fresh install template", + should: "include the fix bug task index entry", + actual: agentsMdContent.includes("fix bug => /aidd-fix"), + expected: true, + }); + + assert({ + given: "fresh install template", + should: "include the review pull request task index entry", + actual: agentsMdContent.includes("review pull request => /aidd-pr"), + expected: true, + }); }); }); diff --git a/lib/exports.test.js b/lib/exports.test.js index 4070101f..1748cfb1 100644 --- a/lib/exports.test.js +++ b/lib/exports.test.js @@ -1,4 +1,7 @@ // @ts-check + +import path from "path"; +import fs from "fs-extra"; import { assert } from "riteway/vitest"; import { describe, test } from "vitest"; @@ -225,3 
+228,30 @@ describe("aidd/agent-config export", () => { }); }); }); + +describe("README.md skill links", () => { + test("links /aidd-requirements to the correct renamed path", async () => { + const readme = await fs.readFile( + path.join(process.cwd(), "README.md"), + "utf-8", + ); + + assert({ + given: "README.md skills table after aidd-functional-requirements rename", + should: + "not contain broken link to old aidd-functional-requirements path", + actual: readme.includes( + "ai/skills/aidd-functional-requirements/README.md", + ), + expected: false, + }); + + assert({ + given: "README.md skills table after aidd-functional-requirements rename", + should: + "link /aidd-requirements to ai/skills/aidd-requirements/README.md", + actual: readme.includes("ai/skills/aidd-requirements/README.md"), + expected: true, + }); + }); +}); diff --git a/package.json b/package.json index 69c628c0..1cb5aa68 100644 --- a/package.json +++ b/package.json @@ -105,7 +105,7 @@ "prepare": "husky", "release": "node release.js", "test": "vitest run && echo 'Test complete.' && npm run -s lint && npm run -s typecheck", - "test:ai-eval": "riteway ai ai-evals/aidd-review/review-skill-test.sudo --runs 4 --threshold 75 --timeout 600000 --agent claude --color --save-responses", + "test:ai-eval": "riteway ai 'ai-evals/**/*-test.sudo' --runs 4 --threshold 75 --timeout 600000 --agent claude --color --save-responses", "test:e2e": "vitest run **/*-e2e.test.js && echo 'E2E tests complete.'", "test:unit": "vitest run --exclude '**/*-e2e.test.js' && echo 'Unit tests complete.' 
&& npm run -s lint && npm run -s typecheck", "toc": "doctoc README.md", diff --git a/tasks/ai-eval-ci-epic.md b/tasks/ai-eval-ci-epic.md new file mode 100644 index 00000000..0266e0e7 --- /dev/null +++ b/tasks/ai-eval-ci-epic.md @@ -0,0 +1,43 @@ +# AI Eval CI Epic + +**Status**: 📋 PLANNED +**Goal**: Wire all `.sudo` eval files into `test:ai-eval`, make the AI eval CI job non-blocking, and run it at most once daily and only when AI eval files have actually changed. + +## Overview + +The `test:ai-eval` script currently only runs `aidd-review/review-skill-test.sudo`. The `aidd-pr` and `aidd-parallel` evals are silently skipped. Additionally, the `ai-eval` CI job runs on every PR push and fails the build when the Claude quota is exhausted — a rate-limit issue unrelated to code correctness. AI evals are slow, expensive, and non-deterministic; they should not gate every commit. Instead they should run once daily on a schedule, and only when `.sudo` files have actually changed. + +--- + +## Wire all eval files into test:ai-eval + +Update `package.json` to discover and run all `.sudo` unit eval files under `ai-evals/` rather than hardcoding a single file. + +**Requirements**: +- Given multiple `.sudo` eval files exist under `ai-evals/`, should run all of them (excluding `-e2e.test.sudo` files) +- Given a new `.sudo` eval file is added to any `ai-evals/` subdirectory, should be picked up automatically without changing `package.json` +- Given an `-e2e.test.sudo` file exists, should not be run by `test:ai-eval` (only by `test:ai-eval:e2e`) +- Given `test:ai-eval` runs, should pass `--runs 4 --threshold 75 --timeout 600000 --agent claude --color --save-responses` consistent with the existing script + +--- + +## Make the AI eval CI job non-blocking + +The `ai-eval` job in `.github/workflows/test.yml` must not fail the PR build. 
+ +**Requirements**: +- Given the `ai-eval` job fails for any reason (quota, auth, flaky result), should not block PR merges — add `continue-on-error: true` to the job +- Given the job is non-blocking, should still upload eval responses as artifacts so results are visible + +--- + +## Run AI evals on a daily schedule, only when eval files changed + +Replace the per-push `ai-eval` trigger with a scheduled daily run and a path-filtered per-push check. + +**Requirements**: +- Given a push or PR that does not modify any file under `ai-evals/**`, should skip the `ai-eval` job entirely +- Given a push or PR that modifies one or more files under `ai-evals/**`, should run the `ai-eval` job so authors get fast feedback when they change evals +- Given a daily schedule trigger (e.g. `cron: '0 8 * * *'`), should always run the full `ai-eval` suite regardless of changed files +- Given the daily scheduled run, should still be non-blocking (does not gate any merge) +- Given the schedule runs at 8am UTC, should run after the Claude quota resets at 7am UTC diff --git a/tasks/aidd-parallel-skill-epic.md b/tasks/aidd-parallel-skill-epic.md new file mode 100644 index 00000000..4bcc8b88 --- /dev/null +++ b/tasks/aidd-parallel-skill-epic.md @@ -0,0 +1,80 @@ +# aidd-parallel Skill Epic + +**Status**: 📋 PLANNED +**Goal**: Extract parallel prompt generation and sub-agent dispatch into a shared `/aidd-parallel` skill, fix the constraint conflation in `/aidd-pr`, and make prompt generation independently unit-testable. + +## Overview + +The prompt generation and sub-agent dispatch logic in `/aidd-pr` is reusable across any skill that needs to fan work out to sub-agents (PR review, task execution, etc). Extracting it into `/aidd-parallel` gives us a clean unit-testable boundary, fixes the constraint conflation problem in `/aidd-pr` (orchestrator constraints mixed with sub-agent constraints), and makes `/aidd-pr` simpler. 
+ +--- + +## Create the aidd-parallel skill + +Add `ai/skills/aidd-parallel/SKILL.md` following the AgentSkills specification. + +**Requirements**: +- Given the agent needs to discover the skill, its name and description should be in the frontmatter +- Given the agent needs to discover what a skill does, the description should include a very brief description of functionality without delving into implementation details +- Given the agent needs to discover when to use a skill, the description should include a very brief "Use when..." clause +- Given a list of tasks, should generate one `/aidd-fix` delegation prompt per task +- Given a delegation prompt, should start with `/aidd-fix` +- Given a delegation prompt, should be wrapped in a markdown codeblock, with any nested codeblocks indented to prevent breaking the outer block +- Given `--branch <branch>` is supplied, should instruct each sub-agent to work directly on `<branch>` and commit and push to origin on `<branch>` +- Given `--branch` is omitted, should assume the current branch +- Given `/aidd-parallel delegate`, should first create a list of files that will need to change and a mermaid change dependency graph (for sequencing reference only; do not save or commit) +- Given `/aidd-parallel delegate`, should use the dependency graph to sequence the prompts before dispatching +- Given `/aidd-parallel delegate`, should spawn one sub-agent worker per prompt in dependency order +- Given post-dispatch callbacks are needed (e.g.
resolving PR threads), should be the caller's responsibility + +Constraints { + put the prompt in a markdown codeblock, indenting any nested codeblocks to prevent breaking the outer block + instruct the agent to work directly from the supplied branch and commit directly to the supplied branch (not from/to main, not to their own fix branch) +} + +Commands { + /aidd-parallel [--branch <branch>] - generate one /aidd-fix delegation prompt per task + /aidd-parallel delegate - build file list + mermaid dep graph, sequence, and dispatch to sub-agents +} + +--- + +## Add the aidd-parallel command + +Add `ai/commands/aidd-parallel.md` so the skill is invokable and discoverable. + +**Requirements**: +- Given the command file, should load and execute `ai/skills/aidd-parallel/SKILL.md` +- Given the command file, should respect constraints from `/aidd-please` + +--- + +## Update aidd-pr to use aidd-parallel + +Remove the prompt generation and constraint logic from `/aidd-pr` that now belongs in `/aidd-parallel`. + +**Requirements**: +- Given remaining issues after triage, `/aidd-pr` should call `/aidd-parallel` to generate delegation prompts rather than generating them inline +- Given the inner `Constraints` block in `/aidd-pr` (codeblock format, branch targeting), should be removed from `/aidd-pr`; it belongs in `/aidd-parallel` +- Given `/aidd-pr delegate`, should call `/aidd-parallel delegate` and then resolve related PR conversation threads via the GitHub GraphQL API + +--- + +## Add aidd-parallel eval + +Add `ai-evals/aidd-parallel/` with a unit eval for prompt generation.
+ +**Requirements**: +- Given a list of tasks and a branch, the eval should assert one prompt is generated per task +- Given a generated prompt, should assert it starts with `/aidd-fix` +- Given a generated prompt, should assert it references the correct branch +- Given a generated prompt, should assert it is wrapped in a markdown codeblock + +--- + +## Update aidd-please discovery + +Add `/aidd-parallel` to the Commands block in `ai/skills/aidd-please/SKILL.md`. + +**Requirements**: +- Given the aidd-please Commands block, should list `/aidd-parallel` so agents can discover it diff --git a/tasks/aidd-riteway-ai-skill-epic.md b/tasks/aidd-riteway-ai-skill-epic.md new file mode 100644 index 00000000..2ef38a63 --- /dev/null +++ b/tasks/aidd-riteway-ai-skill-epic.md @@ -0,0 +1,46 @@ +# aidd-riteway-ai Skill Epic + +**Status**: 📋 PLANNED +**Goal**: Create an `/aidd-riteway-ai` skill that teaches agents how to write correct `riteway ai` prompt evals for multi-step flows that involve tool calls. + +## Overview + +Without guidance, agents default to writing Vitest structural tests instead of `.sudo` prompt evals, collapse multi-step flows into a single overloaded `userPrompt`, pre-supply tool return values instead of testing that the agent makes the right calls, and assert implementation details rather than functional requirements. This skill codifies the lessons learned and references `/aidd-tdd` and `/aidd-requirements` for assertion style and requirement format. + +--- + +## Create the aidd-riteway-ai skill + +Add `ai/skills/aidd-riteway-ai/SKILL.md` following the AgentSkills specification. 
+ +**Requirements**: +- Given the agent needs to discover the skill, its name and description should be in the frontmatter +- Given the agent needs to discover what a skill does, the description should include a very brief description of functionality without delving into implementation details +- Given the agent needs to discover when to use a skill, the description should include a very brief "Use when..." clause +- Given the skill file, should include a role preamble and reference both `/aidd-tdd` and `/aidd-requirements` for assertion style and requirement format +- Given a multi-step flow under test, should instruct the agent to write one `.sudo` eval file per step rather than combining all steps into one `userPrompt` +- Given a unit eval for a step that involves tool calls (gh, GraphQL, API), should instruct the agent to inform the prompted agent that it is operating in a test environment and should use mock tools with stub return values instead of calling real APIs +- Given a unit eval for step 1 of a tool-calling flow, should instruct the agent to assert that the correct tool calls are made — not pre-supply the answers those calls would return +- Given a unit eval for step N > 1, should instruct the agent to supply the output of the previous step as context in the `userPrompt` +- Given an e2e eval, should instruct the agent to use real tools and follow the `-e2e.test.sudo` naming convention, mirroring the project's existing unit/e2e split +- Given fixture files needed by the eval, should be small files with one clear bug or condition per file +- Given assertions, should derive them strictly from the functional requirements of the skill under test using `/aidd-requirements` format, and include only assertions that test distinct observable behaviors + +--- + +## Add the aidd-riteway-ai command + +Add `ai/commands/aidd-riteway-ai.md` so the skill is invokable and discoverable. 
+ +**Requirements**: +- Given the command file, should load and execute `ai/skills/aidd-riteway-ai/SKILL.md` +- Given the command file, should respect constraints from `/aidd-please` + +--- + +## Update aidd-please discovery + +Add `/aidd-riteway-ai` to the Commands block in `ai/skills/aidd-please/SKILL.md`. + +**Requirements**: +- Given the aidd-please Commands block, should list `/aidd-riteway-ai` so agents can discover it diff --git a/tasks/readme-skill-link-fix-epic.md b/tasks/readme-skill-link-fix-epic.md new file mode 100644 index 00000000..2aad7d8d --- /dev/null +++ b/tasks/readme-skill-link-fix-epic.md @@ -0,0 +1,11 @@ +# README Skill Link Fix Epic + +## Summary + +The README.md skills table contained a broken link after the `aidd-functional-requirements` +skill was renamed to `aidd-requirements`. The link pointed to a path that no longer exists. + +## Requirements + +- Given README.md skills table, should link `/aidd-requirements` to `ai/skills/aidd-requirements/README.md` +- Given README.md skills table, should not reference the removed `ai/skills/aidd-functional-requirements/README.md` path
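Reviewer sketch (not part of the patch): the two README link requirements above can be read as a pair of substring checks, which is roughly what the new test in `lib/exports.test.js` asserts. The helper name `checkSkillLinks` and the sample table rows are hypothetical and exist only for illustration. The two path strings come from the diff itself.

```javascript
// Hypothetical helper, not in the PR: reports whether README text still
// links the removed skill path and whether it links the renamed one.
const OLD_PATH = "ai/skills/aidd-functional-requirements/README.md";
const NEW_PATH = "ai/skills/aidd-requirements/README.md";

const checkSkillLinks = (readme) => ({
  hasOldLink: readme.includes(OLD_PATH),
  hasNewLink: readme.includes(NEW_PATH),
});

// Sample rows are invented for illustration; the real README is not shown here.
const fixed = checkSkillLinks(
  "| /aidd-requirements | [docs](ai/skills/aidd-requirements/README.md) |",
);
const broken = checkSkillLinks(
  "| /aidd-requirements | [docs](ai/skills/aidd-functional-requirements/README.md) |",
);

console.log(fixed);
console.log(broken);
```

Neither path is a substring of the other, because the old path has `functional-` between `aidd-` and `requirements`. Both checks are therefore needed: a README that still carries the old link and lacks the new one fails each assertion independently.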