Conversation
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 8296313.
Pull request overview
Adds a new `/aidd-riteway-ai` skill to the AIDD skills catalog to guide authorship of riteway ai `.sudo` prompt evals for multi-step, tool-calling agent flows, along with command wiring, discovery updates, and unit tests to enforce the skill's contract.
Changes:
- Added the `aidd-riteway-ai` skill documentation + checklist for authoring `.sudo` evals for tool-calling flows.
- Added the `/aidd-riteway-ai` command entrypoint and updated discovery/indexes so agents can find it.
- Added Vitest + riteway contract tests to validate required sections/content and integrations.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tasks/aidd-riteway-ai-skill-epic.md | New epic capturing requirements and scope for the skill/command/discovery updates. |
| ai/skills/index.md | Adds aidd-riteway-ai to the skills index for discovery. |
| ai/skills/aidd-riteway-ai/SKILL.md | New skill content: rules + process + checklist for .sudo prompt eval authoring. |
| ai/skills/aidd-riteway-ai/README.md | Skill README with usage and rule summary. |
| ai/skills/aidd-riteway-ai/riteway-ai.test.js | Contract tests asserting presence/structure/integration of the new skill + command + discovery. |
| ai/skills/aidd-please/SKILL.md | Adds /aidd-riteway-ai to the global Commands block for agent discovery. |
| ai/commands/index.md | Adds /aidd-riteway-ai to the commands index. |
| ai/commands/aidd-riteway-ai.md | New command entrypoint that loads the skill and references /aidd-please constraints. |
janhesters left a comment
The ai-eval CI check is red (rate limit hit again, same as #191). Needs a re-run before merge.
SKILL.md:19 references /aidd-functional-requirements which is being renamed to /aidd-requirements in #190. The contract test at line 62 also asserts the old name. These PRs have a merge order dependency that needs resolving.
````
mock gh pr view => returns:
title: My PR
branch: feature/foo
base: main

mock gh api (list review threads) => returns:
[{ id: "T_01", resolved: false, body: "..." }]
```

---

## Rule 3 — Step 1: assert tool calls, do not pre-supply answers

Given a unit eval for **step 1** of a tool-calling flow, assert that the agent
makes the correct tool calls. Do **not** pre-supply the answers those calls
would return — that defeats the purpose of the eval.

Correct pattern for step 1:

```
userPrompt = """
````
Should we reconsider this rule before codifying it as best practice? Two concerns:
-
Scheming/sandbagging — Research shows that agents behave differently when they know they're being evaluated. Telling the agent "you are in a test environment" is literally the trigger for altered behavior.
-
False positive self-fulfillment — The agent under test sees both the mocks AND the assertions in the same
.sudofile. It can pattern-match the expected output without actually exercising the skill's logic.
This likely needs a RITEway framework change rather than a prompt pattern — e.g., a dedicated mocks section that's injected by the harness but stripped before the agent under test sees it, and a separate judge pass that evaluates only the agent's output. Should we open a discussion/PR on the RITEway framework before shipping this rule?
````
executable without running the prior steps live.

Example for step 2:

```
userPrompt = """
You have mock tools available. Use them instead of real calls.

Triage is complete. The following issues remain unresolved:

Issue 1 (thread ID: T_01):
File: src/utils.js, line 5
"add() subtracts instead of adding"

Generate delegation prompts for the remaining issues.
"""
```

---

## Rule 5 — E2E evals: use real tools, follow -e2e.test.sudo naming
````
Same concern as Rule 2 — hand-crafting previous step output in the userPrompt is visible to the agent under test and enables pattern-matching. Ideally the framework would handle this: run step 1, capture its actual output, then pipe it into step 2 automatically. That way each step is tested against real intermediate results, not hand-crafted stubs, and the agent under test doesn't see the test scaffolding. This might also need a RITEway framework change rather than being solvable with a prompt pattern.
````
Given an e2e eval, use real tools (no mock preamble) and follow the
`-e2e.test.sudo` naming convention to mirror the project's existing unit/e2e
split:

```
ai-evals/<skill-name>/step-1-<description>-e2e.test.sudo
```

E2E evals run against live APIs. Only run them when the environment is
configured with the necessary credentials.

---
````
Should we drop this rule? Agents use real tools by default — that's the natural behavior. And if they lack the required tools, they'll fail regardless. The only value here is the -e2e.test.sudo naming convention, which doesn't warrant a standalone rule.
````
Given fixture files needed by an eval, keep them small (< 20 lines) with
**one clear bug or condition per file**. Fixtures live in:

```
ai-evals/<skill-name>/fixtures/<filename>
```

Example fixture (`add.js`):

```js
export const add = (a, b) => a - b; // bug: subtracts instead of adds
```

Do not combine multiple bugs in one fixture file. Each fixture must make the
assertion conditions unambiguous.
````
The "< 20 lines" size constraint is overly prescriptive. For certain evals — e.g., code review skills — you want large, realistic files where the agent has to find the signal in the noise. That's the whole point of testing whether the agent can actually spot issues. "One condition per file" is a reasonable guideline, but should we drop the size limit?
Co-authored-by: Eric Elliott <support@paralleldrive.com>
… /aidd-functional-requirements
- SKILL.md: fix 2 references to nonexistent /aidd-requirements
- SKILL.md: fix /aidd-pr example reference to generic form
- SKILL.md: standardize E2e -> E2E casing (3 places)
- riteway-ai.test.js: update test to validate correct skill name
- tasks/aidd-riteway-ai-skill-epic.md: fix 3 references to /aidd-requirements
Align with the rename in #190. Updates SKILL.md, contract tests, and the epic file.
Force-pushed 20ee0f8 to 2065626

Split from PR #168. One skill per PR per project standards.

What

Adds the `/aidd-riteway-ai` skill — an AI prompt evaluation skill using RITEway methodology that teaches agents how to write correct riteway ai prompt evals (`.sudo` files) for multi-step tool-calling flows.

Files added

- `ai/commands/aidd-riteway-ai.md` — command entry point
- `ai/skills/aidd-riteway-ai/SKILL.md` — full skill with 7 rules + process section
- `ai/skills/aidd-riteway-ai/README.md` — what/why/commands reference
- `ai/skills/aidd-riteway-ai/riteway-ai.test.js` — 12 unit tests verifying skill structure and content
- `tasks/aidd-riteway-ai-skill-epic.md` — task epic with requirements

Files modified

- `ai/skills/aidd-please/SKILL.md` — added `/aidd-riteway-ai` to Commands block for agent discovery

Review fixes applied

- `/aidd-requirements` references → `/aidd-functional-requirements` (SKILL.md, test file, epic)
- `/aidd-pr` example reference → generic "your skill under test" (SKILL.md)
- `E2e` → `E2E` casing to match repo conventions (SKILL.md heading, body, checklist)

Verification