
MAINT: Add pre-release scorer evaluation metrics #1626

Merged
adrian-gavrila merged 1 commit into microsoft:main from
adrian-gavrila:adrian-gavrila/pre-release-scorer-evaluation
Apr 17, 2026

@adrian-gavrila
Contributor

Description

Updates the scorer evaluation metrics JSONL files with results from the latest pre-release evaluation run. New metric entries are
appended for the following scorers across the existing eval datasets:

These are data-only additions intended to capture baseline scorer performance ahead of the upcoming release. No code changes are included.

Tests and Documentation

No code changes — only appended JSONL metrics records produced by the existing scorer evaluation pipeline. Existing tests remain
valid; no documentation updates required. JupyText was not run as no notebooks or code samples were modified.
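
As a rough illustration of what an append-only JSONL change means, the sketch below adds one record to existing JSONL text and checks that earlier lines are untouched. The helper names (`append_jsonl`, `is_append_only`) and the record fields (`scorer`, `accuracy`) are hypothetical and are not taken from this repository or its evaluation pipeline.

```python
import json


def append_jsonl(existing: str, record: dict) -> str:
    """Return `existing` JSONL text with `record` appended as one new line."""
    if existing and not existing.endswith("\n"):
        existing += "\n"
    return existing + json.dumps(record, sort_keys=True) + "\n"


def is_append_only(before: str, after: str) -> bool:
    """True if `after` keeps every line of `before` unchanged, in order,
    and every additional line parses as a JSON object."""
    old = before.splitlines()
    new = after.splitlines()
    if new[: len(old)] != old:
        return False
    for line in new[len(old):]:
        try:
            if not isinstance(json.loads(line), dict):
                return False
        except json.JSONDecodeError:
            return False
    return True


# Hypothetical records; real metric entries may use different fields.
before = '{"accuracy": 0.5, "scorer": "example_scorer_a"}\n'
after = append_jsonl(before, {"scorer": "example_scorer_b", "accuracy": 0.9})
print(is_append_only(before, after))
```

A check like this could, under these assumptions, help a reviewer confirm that a data-only change leaves existing metric records unmodified.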

Update scorer eval metrics JSONL files with results from the pre-release scorer evaluation run across harm categories, objective achievement, and refusal scorers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

@varunj-msft left a comment

looks great! 😄

@adrian-gavrila adrian-gavrila merged commit e268d6d into microsoft:main Apr 17, 2026
39 checks passed