Add reliability round-cost metrics #692

Merged
sungjunlee merged 1 commit into main from issue-684-reliability-round-cost on Jun 8, 2026

@sungjunlee

Owner

Summary

Reissued #684 after the stacked PR's base was merged and the original PR #690 was closed against the deleted base branch.

This branch is now rebased directly on main and contains the reliability round-cost metrics change only.

Changes

  • Add JSON round_cost summary to reliability-report.js.
  • Reuse existing review-lineage and event data for round-cost analysis.
  • Add human-readable round-cost output.
  • Extend reliability-report tests for legacy and linked round-cost scenarios.
  • Document the observation-only round-cost section in operator utilities.

Evidence

  • Rebased branch HEAD: 4f9ddc613b6e30642cd14fe7e117fec94ad38e2e
  • Targeted test after rebase: node --test tests/relay-dispatch/scripts/reliability-report.test.js
  • Result: 41 passed, 0 failed

Related

@coderabbitai

coderabbitai Bot commented Jun 8, 2026


Warning

Review limit reached

@sungjunlee, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 1 minute and 42 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dfbc8498-1eb9-4ceb-a3b9-eeaf3527623f

📥 Commits

Reviewing files that changed from the base of the PR and between 3eb3dd6 and 4f9ddc6.

📒 Files selected for processing (3)
  • skills/relay-dispatch/references/operator-utilities.md
  • skills/relay-dispatch/scripts/reliability-report.js
  • tests/relay-dispatch/scripts/reliability-report.test.js

@sungjunlee

Owner Author

Relay Review Round 2

Verdict: CHANGES_REQUESTED
Summary: review-runner failed closed and did not apply the reviewer's PASS, because execution-evidence.json reported quality_execution_status=fail for the reviewed HEAD.
Contract: PASS
Quality Review: PASS
Quality Execution: FAIL
Issues:

  • execution-evidence.json:1 - Execution evidence failed validation for the reviewed HEAD: stale artifact (recorded at d05bdee, reviewed at 4f9ddc6). The reviewer's PASS cannot be applied without SHA-bound execution evidence for this commit.
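
The SHA binding described in this issue can be illustrated with a minimal sketch. It is not review-runner's actual code: the helper name `checkEvidenceBinding` and the `head_sha` field are assumptions made for illustration, and only `quality_execution_status` and the "stale artifact" wording come from the review output above.

```javascript
// Hypothetical helper: checkEvidenceBinding and the head_sha field are
// illustrative assumptions, not the real execution-evidence.json schema.
const SHA_PATTERN = /^[0-9a-f]{7,40}$/;

function checkEvidenceBinding(evidence, reviewedHead) {
  const recorded = String(evidence?.head_sha ?? "").toLowerCase();
  const reviewed = String(reviewedHead ?? "").toLowerCase();

  // Fail closed when either SHA is missing or malformed.
  if (!SHA_PATTERN.test(recorded) || !SHA_PATTERN.test(reviewed)) {
    return { ok: false, reason: "invalid artifact: missing or malformed commit SHA" };
  }

  // Allow abbreviated SHAs: one value must be a prefix of the other.
  const bound = recorded.startsWith(reviewed) || reviewed.startsWith(recorded);
  if (!bound) {
    return {
      ok: false,
      reason: `stale artifact: recorded at ${recorded.slice(0, 7)}, reviewed at ${reviewed.slice(0, 7)}`,
    };
  }

  // Evidence is bound to this commit; it must also report a passing run.
  if (evidence.quality_execution_status !== "pass") {
    return { ok: false, reason: "execution evidence did not report pass" };
  }
  return { ok: true, reason: "" };
}
```

Under these assumptions, evidence recorded at d05bdee and checked against HEAD 4f9ddc613b6e30642cd14fe7e117fec94ad38e2e yields `ok: false` with a "stale artifact" reason, so a reviewer PASS would not be applied.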

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f9ddc613b



function classifyEvidenceFailureType({ explicitType, status, reason }) {
  const normalizedType = normalizeFailureTypeText(explicitType);
  if (normalizedType) return normalizedType;


[P2] Preserve evidence failure detail before using generic types

When processing real review_preflight_failed events emitted by review-runner/preflight.js, stale, invalid, symlink, and strict failures all carry preflight_type=execution_evidence_fail and failure_class=fail; the distinguishing detail is only in reason. Returning the normalized explicit type here prevents the later reason checks from classifying stale/invalid failures, so evidence_preflight_failures.by_type and reviewer_rounds_avoided_by_preflight.by_type collapse those avoided rounds into generic execution_evidence_fail instead of the buckets operators are meant to compare before reviewer invocation.
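
One way to address this, sketched under assumptions: treat `execution_evidence_fail` and `fail` as generic, and let the `reason` text pick the bucket before falling back to them. The `GENERIC_TYPES` set, the `normalize` and `classifyFromReason` helpers, and the keyword matching are hypothetical; the real `normalizeFailureTypeText` and the actual reason formats in reliability-report.js may differ.

```javascript
// Hypothetical sketch: GENERIC_TYPES, normalize, and classifyFromReason are
// illustrative stand-ins, not the helpers used in reliability-report.js.
const GENERIC_TYPES = new Set(["execution_evidence_fail", "fail"]);

function normalize(text) {
  return typeof text === "string"
    ? text.trim().toLowerCase().replace(/[\s-]+/g, "_")
    : "";
}

// Assumed keyword matching on the free-text reason field.
function classifyFromReason(reason) {
  const text = typeof reason === "string" ? reason.toLowerCase() : "";
  if (text.includes("stale")) return "stale";
  if (text.includes("symlink")) return "symlink";
  if (text.includes("invalid")) return "invalid";
  if (text.includes("strict")) return "strict";
  return "";
}

function classifyEvidenceFailureType({ explicitType, status, reason }) {
  const normalizedType = normalize(explicitType);
  // A specific explicit type is returned as-is.
  if (normalizedType && !GENERIC_TYPES.has(normalizedType)) return normalizedType;
  // Generic or missing type: prefer the detail carried in the reason.
  const fromReason = classifyFromReason(reason);
  if (fromReason) return fromReason;
  // Nothing more specific is available, so fall back to the generic value.
  return normalizedType || normalize(status) || "unknown";
}
```

With this ordering, an event carrying `preflight_type=execution_evidence_fail` and a reason beginning "stale artifact" would land in a `stale` bucket, while events without a recognizable reason would still be counted under the generic type.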


@sungjunlee

Owner Author

Relay Review

Verdict: LGTM
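Summary: By inspection, all 7 Done Criteria are VERIFIED. Based on the provided diff, the change scope is limited to the reliability-report round_cost JSON/text output, its tests, and the operator documentation. The readReviewVerdictRecords call sites and the integration with the relay-request, relay-events, and dispatch/review-runner event producers were also checked, and inspection found no issues that should block the merge. Execution evidence is subject to the review runner's SHA-bound execution-evidence validation.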
Contract: PASS
Quality Review: PASS
Quality Execution: PASS
Rounds: 3

@sungjunlee merged commit 26e5511 into main on Jun 8, 2026
2 checks passed
@sungjunlee deleted the issue-684-reliability-round-cost branch on June 8, 2026 at 12:55

Development

Successfully merging this pull request may close these issues.

reliability-report: measure round cost, decomposition signals, evidence preflight, and lineage
