
DOC: Scoring Evaluations Blog #1617

Open
jsong468 wants to merge 5 commits into microsoft:main from jsong468:scoring_blog

Conversation

@jsong468
Contributor

Description

This PR adds a blog documenting our scorer evaluation background, story, and process!

Tests and Documentation

N/A

@jsong468 jsong468 marked this pull request as ready for review April 15, 2026 19:34
Comment thread doc/blog/2026_04_14_scoring_scorers.md
Comment thread doc/blog/2026_04_14_scoring_scorers.md
Comment thread doc/blog/2026_04_14.md Outdated
Comment thread doc/blog/2026_04_14.md Outdated
Comment thread doc/blog/2026_04_14_scoring_scorers.md
Comment thread doc/blog/2026_04_14.md Outdated
Comment thread doc/blog/2026_04_14_scoring_scorers.md
Comment thread doc/blog/2026_04_14_scoring_scorers.md
Comment thread doc/blog/2026_04_14_scoring_scorers.md
Comment thread doc/blog/2026_04_14.md Outdated
Comment thread doc/blog/2026_04_14.md Outdated
Comment thread doc/blog/2026_04_14.md Outdated
## Viewing Scoring Metrics

There are a few different ways to view metrics for specific scoring configurations.

Contributor


Can you link to the docs here?

Before running, the framework checks the JSONL registry for an existing entry matching the scorer's evaluation hash. It re-runs the evaluation only if no entry exists, the dataset version changed, the harm definition version changed, or the requested number of trials exceeds what is stored. You can skip this registry check entirely with `update_registry_behavior=RegistryUpdateBehavior.NEVER_UPDATE` if you're experimenting and don't want to write to the registry. This should usually be the case, since metrics saved to the registry are managed directly by Microsoft's AI Red Team; please don't hesitate to reach out, however, if you'd like to add metrics for new scoring configurations to our registries. The snippet below shows a simple example of running an evaluation:

```python
metrics = await my_scorer.evaluate_async(num_scorer_trials=3)
```
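The registry check described in the paragraph above can be sketched in plain Python. Note that everything here (`RegistryEntry`, `needs_rerun`, and the field names) is hypothetical and only illustrates the decision logic as described; PyRIT's actual implementation may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistryEntry:
    # Hypothetical shape of a JSONL registry entry (illustration only)
    evaluation_hash: str
    dataset_version: str
    harm_definition_version: str
    num_trials: int


def needs_rerun(
    entry: Optional[RegistryEntry],
    dataset_version: str,
    harm_definition_version: str,
    requested_trials: int,
) -> bool:
    """Re-run only when no matching entry exists, a version changed,
    or more trials were requested than are already stored."""
    if entry is None:
        return True
    if entry.dataset_version != dataset_version:
        return True
    if entry.harm_definition_version != harm_definition_version:
        return True
    return requested_trials > entry.num_trials
```

With this shape, a stored entry for the same dataset and harm-definition versions is reused as long as it already covers the requested number of trials.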
Contributor


I would take this further, because I think this could be a hook for people who have never used PyRIT.

Do you want to see how accurate your judge is compared to PyRIT or anything else? We collected human responses and have a framework you can use: just adapt your judge or evaluation into a PyRIT scorer and run it.


@rlundeen2 rlundeen2 left a comment


I'd wait for one more approval here, but looks good! I have two notes :)
