Refusal analytics + benchmark refusal-rate gate in CI #4

@ronishgeorge

Description

We currently refuse on (a) low retrieval scores and (b) low verifier support. We need observability plus a CI gate, so that a regression that pushes the refusal rate above acceptable bounds blocks the release.

Plan:

  • Add a Datadog counter for refusal reasons (no_supporting_source, low_confidence_retrieval, insufficient_grounding)
  • Add refused_rate to eval.summarize (already done)
  • Set CI thresholds: composite_mean must hold, and refused_rate must stay below 15%
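
A rough sketch of what the CI gate in the last bullet could look like. It assumes the output of eval.summarize can be read as a dict with `refused_rate` and `composite_mean` keys (the real return shape may differ). It also reads "must hold" as "must not fall below a stored baseline", which is an interpretation, not something this issue specifies. `check_gate`, `GateResult`, `baseline_composite_mean`, and `tolerance` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# From the plan above: refused_rate must stay below 15%.
REFUSED_RATE_MAX = 0.15


@dataclass
class GateResult:
    passed: bool
    failures: list[str] = field(default_factory=list)


def check_gate(
    summary: dict,
    baseline_composite_mean: float,  # hypothetical: value from the last accepted run
    tolerance: float = 0.0,          # hypothetical: allowed drop in composite_mean
) -> GateResult:
    """Return a failing result if either threshold is violated."""
    failures: list[str] = []

    refused_rate = summary["refused_rate"]
    if refused_rate >= REFUSED_RATE_MAX:
        failures.append(
            f"refused_rate {refused_rate:.3f} is not below {REFUSED_RATE_MAX:.2f}"
        )

    composite_mean = summary["composite_mean"]
    floor = baseline_composite_mean - tolerance
    if composite_mean < floor:
        failures.append(
            f"composite_mean {composite_mean:.3f} fell below {floor:.3f}"
        )

    return GateResult(passed=not failures, failures=failures)
```

In a CI step, the caller would print `failures` and exit non-zero when `passed` is false, so that the job fails and the release is blocked.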
