feat: add /aidd-pr skill, rename /aidd-requirements, add eval infrastructure #168
Open
ericelliott wants to merge 30 commits into main from cursor/aidd-config-json-support-24c1
Commits
ea39f4c feat(aidd-pr): add /aidd-pr skill for PR review and fix delegation (cursoragent)
dc55c7a feat(aidd-please): add /aidd-pr to Commands discovery block (cursoragent)
919f68e fix(aidd-pr): replace elaborated SudoLang workflow with verbatim prompt (cursoragent)
9120afb feat(aidd-pr): add delegation constraint to prevent direct execution (cursoragent)
1a0e5e7 fix(aidd-pr): move delegation constraint to orchestrator scope (cursoragent)
9d70a46 fix(aidd-pr): soften delegation constraint to prefer over always (cursoragent)
f4cebbe fix(aidd-pr): use original wording for delegation constraint (cursoragent)
802105a docs(aidd-pr): rewrite README as man-page, tighten skill description (cursoragent)
c9dc3bf test(aidd-pr): add Riteway unit tests for skill structure and content (cursoragent)
0d801c1 test(aidd-pr): add riteway ai prompt eval (cursoragent)
2b8b04a test(aidd-pr): remove vitest structural tests replaced by ai eval (cursoragent)
c6a248c test(aidd-pr): rewrite ai eval with fixture files and focused assertions (cursoragent)
fed13e9 test(aidd-pr): supply mock gh and GraphQL tools in eval user prompt (cursoragent)
dd93f96 test(aidd-pr): split eval into two focused step prompts (cursoragent)
f5918ff test(aidd-pr): step 1 should test tool calls, not pre-supply answers (cursoragent)
135589b plan(aidd-riteway-ai): add epic for riteway ai skill and requirements… (cursoragent)
9244de7 feat(aidd-requirements): rename aidd-functional-requirements to aidd-… (cursoragent)
0d821c5 plan(aidd-riteway-ai): update epic - drop rename task, sharpen requir… (cursoragent)
4c8e427 plan(aidd-riteway-ai): note e2e infrastructure cost, prefer unit eval… (cursoragent)
1fdf942 revert(aidd-riteway-ai): undo premature e2e note, decision not yet made (cursoragent)
b5ba856 plan(aidd-parallel): add epic for shared parallel prompt generation s… (cursoragent)
a7891d3 feat(commands): add aidd-requirements command file (cursoragent)
e29b134 feat(aidd-parallel): add /aidd-parallel skill for parallel sub-agent … (cursoragent)
71b1c4f feat(aidd-riteway-ai): add skill, command, tests, and aidd-please int… (cursoragent)
6581489 chore: merge main into branch, resolve conflicts (cursoragent)
7ab2735 plan(ci): add epic for ai eval wiring, non-blocking job, and daily sc… (cursoragent)
757ed7c fix(aidd-pr): clarify dangling branch reference in Constraints block (cursoragent)
4bd5039 docs(aidd-riteway-ai): add man-page style README (cursoragent)
9c7ba85 test: add failing test for aidd-requirements README link (cursoragent)
c3b7eda feat(agents-md): add Task Index to requiredDirectives, template, and … (cursoragent)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,52 @@ | ||
| name: AI Eval | ||
|
|
||
| on: | ||
| schedule: | ||
| - cron: '0 8 * * *' | ||
| push: | ||
| paths: | ||
| - 'ai-evals/**' | ||
| pull_request: | ||
| paths: | ||
| - 'ai-evals/**' | ||
|
|
||
| jobs: | ||
| ai-eval: | ||
| runs-on: ubuntu-latest | ||
| continue-on-error: true | ||
|
|
||
| steps: | ||
| - name: Checkout code | ||
| uses: actions/checkout@v4 | ||
|
|
||
| - name: Setup Node.js 22 | ||
| uses: actions/setup-node@v4 | ||
| with: | ||
| node-version: 22 | ||
| cache: 'npm' | ||
|
|
||
| - name: Install dependencies | ||
| run: npm install | ||
|
|
||
| - name: Install Claude Code | ||
| run: npm install -g @anthropic-ai/claude-code | ||
|
|
||
| - name: Check Claude authentication | ||
| id: claude-auth | ||
| run: node scripts/check-claude-ai-eval-gate.js | ||
| env: | ||
| CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }} | ||
|
|
||
| - name: Run AI prompt evaluations | ||
| if: steps.claude-auth.outputs.available == 'true' | ||
| env: | ||
| CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }} | ||
| run: npm run test:ai-eval | ||
|
|
||
| - name: Upload AI eval responses | ||
| if: always() && steps.claude-auth.outputs.available == 'true' | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: ai-eval-responses | ||
| path: ai-evals/*.responses.md | ||
| retention-days: 14 | ||
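
The gate script `scripts/check-claude-ai-eval-gate.js` is not shown in this diff. As a rough sketch of how such a gate could work, assuming it only checks that `CLAUDE_CODE_OAUTH_TOKEN` is non-empty and writes an `available` step output through the `GITHUB_OUTPUT` file, something like the following would let the later `if: steps.claude-auth.outputs.available == 'true'` conditions skip cleanly when no token is configured. The names `gateOutput` and `main` are hypothetical and the real script may do more, such as verifying the token against the CLI.

```javascript
// Hypothetical sketch, not the repository's actual gate script.
// Builds the "key=value" line that GitHub Actions expects for a step output.
const gateOutput = (env) => {
  const token = env.CLAUDE_CODE_OAUTH_TOKEN;
  // Treat a missing, empty, or whitespace-only token as unavailable.
  const available = Boolean(token && token.trim());
  return `available=${available}\n`;
};

const main = async (env = process.env) => {
  const line = gateOutput(env);
  // Step outputs are appended to the file named by GITHUB_OUTPUT.
  // Outside of Actions the variable is unset, so nothing is written.
  if (env.GITHUB_OUTPUT) {
    const { appendFileSync } = await import("node:fs");
    appendFileSync(env.GITHUB_OUTPUT, line);
  }
  return line;
};

main();
```

Exiting successfully in both cases, rather than failing when the token is absent, keeps forks and external pull requests from showing a red job.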
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| export const add = (a, b) => a - b; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| export const greet = (name) => `Hello, ${name}`; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| import 'ai/skills/aidd-parallel/SKILL.md' | ||
|
|
||
| userPrompt = """ | ||
| Run /aidd-parallel --branch feature/utils with the following two tasks: | ||
|
|
||
| Task 1: | ||
| File: ai-evals/aidd-parallel/fixtures/add.js, line 1 | ||
| "add() subtracts instead of adding — should use + not -" | ||
|
|
||
| Task 2: | ||
| File: ai-evals/aidd-parallel/fixtures/greet.js, line 1 | ||
| "greet() should include an exclamation mark at the end of the greeting" | ||
|
|
||
| Generate delegation prompts for both tasks. | ||
| """ | ||
|
|
||
| - Given two tasks and a branch, should generate a separate delegation prompt for each task | ||
| - Given a generated prompt, should start with /aidd-fix | ||
| - Given a generated prompt, should reference the correct branch feature/utils | ||
| - Given a generated prompt, should instruct the sub-agent to commit and push to origin/feature/utils | ||
| - Given a generated prompt, should be wrapped in a markdown codeblock |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| export const add = (a, b) => a - b; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| export const greet = (name) => `Hello, ${name}!`; | ||
|
cursor[bot] marked this conversation as resolved.
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| import 'ai/skills/aidd-pr/SKILL.md' | ||
|
|
||
| userPrompt = """ | ||
| You have the following mock tools available. Use them instead of real gh or GraphQL calls: | ||
|
|
||
| mock gh pr view => returns: | ||
| title: Fix utility functions | ||
| branch: feature/utils | ||
| base: main | ||
|
|
||
| mock gh api (list review threads) => returns: | ||
| [ | ||
| { id: "T_01", resolved: false, file: "ai-evals/aidd-pr/fixtures/add.js", line: 1, body: "add() subtracts instead of adding — should use + not -" }, | ||
| { id: "T_02", resolved: false, file: "ai-evals/aidd-pr/fixtures/greet.js", line: 1, body: "greet() should include an exclamation mark at the end of the greeting" } | ||
| ] | ||
|
|
||
| mock GraphQL resolveReviewThread => returns: { thread: { isResolved: true } } | ||
|
|
||
| Run step 1 of /aidd-pr: triage the review threads. | ||
| """ | ||
|
|
||
| - Given the mock gh api returns two threads, should list both threads before taking any action | ||
| - Given a thread whose concern is already fixed in the current source, should classify it as addressed | ||
| - Given a thread whose reported issue is still present in the current source, should classify it as remaining | ||
| - Given the addressed list is presented, should require approval before resolving |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| import 'ai/skills/aidd-pr/SKILL.md' | ||
|
|
||
| userPrompt = """ | ||
| You have the following mock tools available. Use them instead of real gh or GraphQL calls: | ||
|
|
||
| mock gh pr view => returns: | ||
| title: Fix utility functions | ||
| branch: feature/utils | ||
| base: main | ||
|
|
||
| mock GraphQL resolveReviewThread => returns: { thread: { isResolved: true } } | ||
|
|
||
| Triage is complete. The following issues remain unresolved: | ||
|
|
||
| Issue 1 (thread ID: T_01): | ||
| File: ai-evals/aidd-pr/fixtures/add.js, line 1 | ||
| "add() subtracts instead of adding — should use + not -" | ||
|
|
||
| Generate delegation prompts for the remaining issues. | ||
| """ | ||
|
|
||
| - Given one remaining issue, should generate a delegation prompt for it | ||
| - Given a delegation prompt, should start with /aidd-fix | ||
| - Given a delegation prompt, should reference the specific file from the review comment | ||
| - Given a delegation prompt, should instruct the agent to commit directly to the PR branch feature/utils and not create a new branch | ||
| - Given a delegation prompt, should be wrapped in a markdown codeblock |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| --- | ||
| description: Generate /aidd-fix delegation prompts for a list of tasks and optionally dispatch them to sub-agents in dependency order | ||
| --- | ||
| # 🔀 /aidd-parallel | ||
|
|
||
| Load and execute the skill at `ai/skills/aidd-parallel/SKILL.md`. | ||
|
|
||
| Constraints { | ||
| Before beginning, read and respect the constraints in /aidd-please. | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| --- | ||
| description: Review a PR, resolve addressed comments, and generate /aidd-fix delegation prompts for remaining issues | ||
| --- | ||
| # 🔍 /aidd-pr | ||
|
|
||
| Load and execute the skill at `ai/skills/aidd-pr/SKILL.md`. | ||
|
|
||
| Constraints { | ||
| Before beginning, read and respect the constraints in /aidd-please. | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| --- | ||
| description: Write functional requirements for a user story. Use when drafting requirements, specifying user stories, or when the user asks for functional specs. | ||
| --- | ||
| # 📋 /aidd-requirements | ||
|
|
||
| Load and execute the skill at `ai/skills/aidd-requirements/SKILL.md`. | ||
|
|
||
| Constraints { | ||
| Before beginning, read and respect the constraints in /aidd-please. | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| --- | ||
| description: Write correct riteway ai prompt evals for multi-step tool-calling flows. Use when creating .sudo eval files or testing agent skills that use tools. | ||
| --- | ||
| # 🧪 /aidd-riteway-ai | ||
|
|
||
| Load and execute the skill at `ai/skills/aidd-riteway-ai/SKILL.md`. | ||
|
|
||
| Constraints { | ||
| Before beginning, read and respect the constraints in /aidd-please. | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,64 @@ | ||
| # aidd-parallel — Parallel Sub-Agent Delegation | ||
|
|
||
| `/aidd-parallel` generates focused `/aidd-fix` delegation prompts for a list | ||
| of tasks and can dispatch them to sub-agents in dependency order. | ||
|
|
||
| ## Usage | ||
|
|
||
| ``` | ||
| /aidd-parallel [--branch <branch>] <tasks> — generate one /aidd-fix delegation prompt per task | ||
| /aidd-parallel delegate — build file list + dep graph, sequence, and dispatch | ||
| ``` | ||
|
|
||
| ## Why parallel delegation matters | ||
|
|
||
| When a PR review or task breakdown produces multiple independent issues, fixing | ||
| them sequentially in a single agent thread wastes time and dilutes attention. | ||
| `/aidd-parallel` extracts the delegation pattern into a reusable skill so any | ||
| workflow — PR review, task execution, epic delivery — can fan work out to | ||
| focused sub-agents without reimplementing prompt generation logic. | ||
|
|
||
| ## How it works | ||
|
|
||
| ### Step 1 — Resolve the branch | ||
|
|
||
| If `--branch <branch>` is supplied, use that branch. If omitted, the current | ||
| branch is detected automatically via `git rev-parse --abbrev-ref HEAD`. | ||
|
|
||
| ### Step 2 — Generate delegation prompts | ||
|
|
||
| For each task, one `/aidd-fix` delegation prompt is produced. Every prompt: | ||
|
|
||
| - Starts with `/aidd-fix` | ||
| - Contains only the context needed for that single task | ||
| - Instructs the sub-agent to work directly on the target branch and commit and | ||
| push to `origin/<branch>` — never to `main`, never to a new branch | ||
| - Is wrapped in a fenced markdown codeblock; any nested codeblocks are indented | ||
| one level to prevent them from breaking the outer fence | ||
|
|
||
| ### Step 3 (delegate only) — Build the dependency graph | ||
|
|
||
| `/aidd-parallel delegate` first builds a list of files each task will change, | ||
| then produces a Mermaid change dependency graph. The graph is used for | ||
| sequencing only — it is not saved or committed. | ||
|
|
||
| ### Step 4 (delegate only) — Dispatch in dependency order | ||
|
|
||
| Prompts are dispatched to sub-agent workers in the order determined by the | ||
| dependency graph: tasks with no dependencies first, dependents after their | ||
| prerequisites are complete. | ||
|
|
||
| Post-dispatch callbacks (e.g. resolving PR conversation threads) are the | ||
| caller's responsibility. | ||
|
|
||
| ## When to use `/aidd-parallel` | ||
|
|
||
| - A PR review has multiple independent issues that should be fixed in parallel | ||
| - A task epic has been broken into independent sub-tasks suitable for parallel execution | ||
| - Any workflow that needs to fan work out to multiple `/aidd-fix` sub-agents | ||
|
|
||
| ## Constraints | ||
|
|
||
| - Each prompt must be wrapped in a markdown codeblock | ||
| - Nested codeblocks inside a prompt must be indented to prevent breaking the outer fence | ||
| - Sub-agents are always directed to the supplied branch — never to `main` or a new branch |
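
The skill itself is a prompt rather than code, but the prompt rules described in the README above can be illustrated with a small sketch. The task shape (`file`, `line`, `comment`), the function names, and the exact instruction wording are assumptions made for illustration only. Nested fences are indented by four spaces, which under CommonMark stops them from being read as a closing fence for the outer block.

```javascript
// Hypothetical sketch of the README's prompt rules; not part of the PR.
const FENCE = "`".repeat(3);

// Indent nested fenced blocks by four spaces so their fence lines
// can no longer close the outer fence.
const indentNestedBlocks = (text) => {
  let inside = false;
  return text
    .split("\n")
    .map((line) => {
      const isFence = line.trimStart().startsWith(FENCE);
      const result = inside || isFence ? `    ${line}` : line;
      if (isFence) inside = !inside;
      return result;
    })
    .join("\n");
};

// Build one fenced delegation prompt for a single task.
const buildDelegationPrompt = ({ file, line, comment }, branch) => {
  const body = [
    "/aidd-fix",
    "",
    `File: ${file}, line ${line}`,
    indentNestedBlocks(comment),
    "",
    `Work directly on ${branch}. Commit and push to origin/${branch}.`,
    "Do not push to main and do not create a new branch.",
  ].join("\n");
  return `${FENCE}\n${body}\n${FENCE}`;
};

// One prompt per task, all targeting the same branch.
const generateDelegationPrompts = (tasks, branch) =>
  tasks.map((task) => buildDelegationPrompt(task, branch));
```

Called with the two fixture tasks from the eval above and `feature/utils`, this would produce two codeblocks, each starting with `/aidd-fix` and each naming `origin/feature/utils`, which mirrors what the eval assertions check for.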
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,60 @@ | ||
| --- | ||
| name: aidd-parallel | ||
| description: > | ||
| Generate /aidd-fix delegation prompts for a list of tasks and optionally dispatch | ||
| them to sub-agents in dependency order. | ||
| Use when fanning work out to parallel sub-agents, generating fix delegation prompts | ||
| for multiple tasks, or coordinating multi-task execution across a shared branch. | ||
| compatibility: Requires git available in the project. | ||
| --- | ||
|
|
||
| # 🔀 aidd-parallel | ||
|
|
||
| Act as a top-tier software engineering lead to generate focused `/aidd-fix` | ||
| delegation prompts and coordinate parallel sub-agent execution. | ||
|
|
||
| Competencies { | ||
| parallel task decomposition | ||
| dependency graph analysis | ||
| sub-agent delegation via /aidd-fix | ||
| branch-targeted prompt generation | ||
| } | ||
|
|
||
| Constraints { | ||
| Put each delegation prompt in a markdown codeblock, indenting any nested codeblocks to prevent breaking the outer block | ||
| Instruct each sub-agent to work directly on the supplied branch and commit and push to origin on that branch (not to main, not to their own branch) | ||
| If --branch is omitted, use the current branch (git rev-parse --abbrev-ref HEAD) | ||
| } | ||
|
|
||
| ## Command: /aidd-parallel [--branch <branch>] <tasks> | ||
|
|
||
| generateDelegationPrompts(tasks, branch) => prompts { | ||
| 1. Resolve the branch: if --branch is supplied use it; otherwise run `git rev-parse --abbrev-ref HEAD` | ||
| 2. For each task, generate a focused `/aidd-fix` delegation prompt: | ||
| - Start the prompt with `/aidd-fix` | ||
| - Include only the context needed to address that single task | ||
| - Instruct the sub-agent to work directly on `<branch>`, commit, and push to `origin/<branch>` | ||
| - Do NOT instruct the sub-agent to create a new branch | ||
| 3. Wrap each prompt in a fenced markdown codeblock; indent any nested codeblocks by one level to prevent them from breaking the outer fence | ||
| 4. Output one codeblock per task | ||
| } | ||
|
|
||
| ## Command: /aidd-parallel delegate | ||
|
|
||
| delegate(tasks, branch) { | ||
| 1. Call generateDelegationPrompts to produce one prompt per task | ||
| 2. Build a list of files that each task will need to change | ||
|
cursor[bot] marked this conversation as resolved.
|
||
| 3. Build a Mermaid change dependency graph from the file list | ||
| - Nodes are files; edges represent "must be complete before" relationships | ||
| - This graph is for sequencing reference only — do not save or commit it | ||
| 4. Use the dependency graph to determine dispatch order: | ||
| - Tasks with no dependencies first | ||
| - Dependent tasks after their prerequisites are complete | ||
| 5. Spawn one sub-agent worker per prompt in dependency order | ||
| 6. Post-dispatch callbacks (e.g. resolving PR threads) are the caller's responsibility | ||
| } | ||
|
|
||
| Commands { | ||
| /aidd-parallel [--branch <branch>] <tasks> - generate one /aidd-fix delegation prompt per task | ||
| /aidd-parallel delegate [--branch <branch>] <tasks> - build file list + mermaid dep graph, sequence, and dispatch to sub-agents | ||
| } | ||
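
The dispatch ordering in `delegate` (tasks with no dependencies first, dependents after their prerequisites) amounts to a topological sort. A minimal sketch follows, assuming each task carries a hypothetical `id` and an optional `dependsOn` list already derived from the file-level graph. The skill does not prescribe this data shape, and `planDispatchWaves` is an illustrative name.

```javascript
// Hypothetical sketch: group tasks into dispatch waves so that every task
// in a wave depends only on tasks from earlier waves.
const planDispatchWaves = (tasks) => {
  // Map each task id to the set of prerequisite ids not yet dispatched.
  const remaining = new Map(
    tasks.map((task) => [task.id, new Set(task.dependsOn ?? [])])
  );
  const waves = [];

  while (remaining.size > 0) {
    const ready = [...remaining.keys()].filter(
      (id) => remaining.get(id).size === 0
    );
    // No ready task means a cycle or a reference to an unknown task id.
    if (ready.length === 0) {
      throw new Error("Dependency cycle or unknown prerequisite detected");
    }
    waves.push(ready);
    for (const id of ready) remaining.delete(id);
    for (const deps of remaining.values()) {
      for (const id of ready) deps.delete(id);
    }
  }
  return waves;
};
```

Tasks within one wave are independent of each other, so a caller could dispatch them to sub-agents concurrently and wait for the whole wave before starting the next.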
Artifact upload glob misses subdirectory response files
Medium Severity
The artifact upload path `ai-evals/*.responses.md` only matches files directly in the `ai-evals/` directory, but all `.sudo` eval files live in subdirectories (`ai-evals/aidd-pr/`, `ai-evals/aidd-parallel/`, `ai-evals/aidd-review/`). Since `--save-responses` generates response files alongside the `.sudo` inputs, the glob never matches any responses. The path likely needs to be `ai-evals/**/*.responses.md` to capture files in subdirectories.
Reviewed by Cursor Bugbot for commit c3b7eda.
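
To make the difference between the two patterns concrete, here is a small illustrative matcher. It is a deliberately simplified glob-to-RegExp conversion that handles only `*` and `**/`, not the glob implementation used by `actions/upload-artifact`, and the response file names below are hypothetical.

```javascript
// Simplified glob matcher for illustration only: "*" matches within one
// path segment, "**/" matches zero or more whole directories.
const globToRegExp = (glob) => {
  // Escape regex metacharacters, leaving "*" for the glob handling below.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*\//g, "\u0000") // temporary placeholder for "**/"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, "(?:[^/]+/)*");
  return new RegExp(`^${pattern}$`);
};

const matches = (glob, path) => globToRegExp(glob).test(path);

// Hypothetical response file written next to a .sudo eval in a subdirectory.
const nested = "ai-evals/aidd-pr/triage.responses.md";

console.log(matches("ai-evals/*.responses.md", nested));
console.log(matches("ai-evals/**/*.responses.md", nested));
```

Under these simplified rules the single-star pattern rejects the nested path while the double-star pattern accepts it, which is the behavior the review comment describes.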