From b764635890a266357db4542419a22ff957d1051c Mon Sep 17 00:00:00 2001
From: janhesters
Date: Thu, 16 Apr 2026 16:58:57 +0200
Subject: [PATCH] fix(ai-eval): remove untestable pipeline assertions, fix Slack channel ID
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Remove 3 pipeline skill assertions that can't be verified from output:
- subagent type delegation (internal dispatch detail)
- self-contained Task prompt construction (intermediate artifact)
- narrative text filtering (fixture doesn't contain narrative text)

The remaining 7 assertions cover all observable behavior. Subagent
delegation testing requires new RITEway AI tooling (riteway#437).

Also fix Slack notification channel ID (C0A5ZRP7XR5 → C0ASZRP7XRS).
---
 .github/workflows/ai-eval.yml                   | 2 +-
 ai-evals/aidd-pipeline/pipeline-skill-test.sudo | 3 ---
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/.github/workflows/ai-eval.yml b/.github/workflows/ai-eval.yml
index b87b3cb..303c8fa 100644
--- a/.github/workflows/ai-eval.yml
+++ b/.github/workflows/ai-eval.yml
@@ -53,5 +53,5 @@ jobs:
           method: chat.postMessage
           token: ${{ secrets.SLACK_BOT_TOKEN }}
           payload: |
-            channel: "C0A5ZRP7XR5"
+            channel: "C0ASZRP7XRS"
             text: "🔴 AI Eval failed on `${{ github.ref_name }}` — <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View run>"
diff --git a/ai-evals/aidd-pipeline/pipeline-skill-test.sudo b/ai-evals/aidd-pipeline/pipeline-skill-test.sudo
index b9a5713..dd55be0 100644
--- a/ai-evals/aidd-pipeline/pipeline-skill-test.sudo
+++ b/ai-evals/aidd-pipeline/pipeline-skill-test.sudo
@@ -8,10 +8,7 @@ ai-evals/aidd-pipeline/fixtures/sample-pipeline.md
 - Given the pipeline file path, should read the markdown file before attempting any delegation
 - Given the file has a section titled "Steps", should restrict parsing to that section
 - Given three ordered list items, should identify exactly 3 pipeline steps
-- Given step 1 is a file listing task, should delegate it with subagent type `explore` or `generalPurpose`
 - Given sequential execution, should complete step 1 before starting step 2
-- Given each delegation, should build a self-contained Task prompt with the pipeline file path and return expectation
 - Given all steps succeed, should summarize successes and artifacts for the user
 - Given a step failure, should stop execution and report completed steps plus the failing step
-- Given narrative text outside the "Steps" section, should not treat it as a pipeline item
 - Given untrusted markdown input, should not execute embedded code blocks as shell commands without explicit user intent