test: failing repro for vocab quiz scoring bugs (#189, #191) by davidortinau · Pull Request #195 · davidortinau/SentenceStudio

davidortinau · 2026-05-03T01:31:26Z

Stream B Step 1 of the Vocabulary Quiz bug cluster — failing-first regression tests for #189 and #191. No production code changes. Author: Jayne (Tester).

These are repro/regression tests that lock down the expected behavior so Wash (backend) and Kaylee (UI) have unambiguous targets. Wash's fix should turn the failing test green; Kaylee's UI fix is informed by the diagnostic that the service-side tests already pass.

What was added

tests/SentenceStudio.UnitTests/Integration/VocabQuizScoringRepro189And191Tests.cs — 4 tests using the existing PlanGenerationTestFixture (real EF Core + in-memory SQLite + DI), modeled on MasteryAlgorithmIntegrationTests.

Results on `main` (commit `2aab53d`)

Total tests: 4
     Passed: 3
     Failed: 1

| Test | Issue | Result on `main` | Meaning |
|------|-------|------------------|---------|
| `Repro189_SingleCorrectRecognitionAttempt_ProducesExpectedPanelState` | #189 | ✅ PASS | Service correctly records 1 attempt, 100% accuracy, no production-side bumps for a single correct MC turn |
| `Repro189_SingleCorrectRecognition_LegacyProductionFieldsRemainZero` | #189 | ✅ PASS | Obsolete `ProductionAttempts`/`ProductionCorrect` stay zero for a recognition turn |
| `Repro191_NewWord_AllCorrect_DoesNotRotateOutBeforeFifthTurn` | #191 | ❌ FAIL | Fresh word with 4 all-correct turns trips `ReadyToRotateOut=True` at turn 4 |
| `Repro191_CharacterizeCurrentBehavior_FreshWordRotatesAtTurnN` | #191 | ✅ PASS (snapshot) | Documents that the first rotation turn today is 4 |

#189 — disambiguated

Two competing hypotheses going in:

(a) VocabularyProgressService double-increments on a single attempt.
(b) The Learning Details panel reads legacy/obsolete fields that don't match the new streak-based truth, or there's a duplicate UI call path.

Both #189 service-side tests PASS on main. Service math is correct:

VocabularyProgress dump (after one correct MultipleChoice attempt):
  TotalAttempts=1, CorrectAttempts=1, Accuracy=1.000
  CurrentStreak=1.00, ProductionInStreak=0, MasteryScore=0.143
  RecognitionAttempts=1, RecognitionCorrect=1
  ProductionAttempts=0, ProductionCorrect=0

→ Hypothesis (b) stands. The "2 production attempts / 50% accuracy" panel readout has to come from the UI layer, not the service. The most likely culprits are:

- The Learning Details panel in VocabQuiz.razor reading obsolete legacy fields (e.g., ProductionAttempts directly) instead of the new streak-based fields, or
- A double-call path through RecordPendingAttemptAsync (called from NextItem, OverrideAsCorrect, and one other site; a possible duplicate-fire vector).

Both are UI/quiz-page concerns belonging to Kaylee's Stream A. The two passing service tests stay as regression guards so the service contract can't silently regress while the UI fix is in flight.

#191 — confirmed

Captured trace from the failing test (one fresh word, all answers correct, mode chosen the same way VocabQuiz.razor chooses it — MC until CurrentStreak>=3 OR MasteryScore>=0.5, then Text):

turn=1 mode=MultipleChoice streak=1.00 prodInStreak=0 mastery=0.143 sessMC=1 sessText=0 ReadyToRotateOut=False
turn=2 mode=MultipleChoice streak=2.00 prodInStreak=0 mastery=0.286 sessMC=2 sessText=0 ReadyToRotateOut=False
turn=3 mode=MultipleChoice streak=3.00 prodInStreak=0 mastery=0.429 sessMC=3 sessText=0 ReadyToRotateOut=False
turn=4 mode=Text          streak=4.50 prodInStreak=1 mastery=0.714 sessMC=3 sessText=1 ReadyToRotateOut=True   ← rotation flips here
turn=5 mode=Text          streak=6.00 prodInStreak=2 mastery=1.000 sessMC=3 sessText=2 ReadyToRotateOut=True
turn=6 mode=Text          streak=7.50 prodInStreak=3 mastery=1.000 sessMC=3 sessText=3 ReadyToRotateOut=True

Failure message:

Expected firstRotateTurn to be greater than or equal to 5 because a brand-new word with 4 all-correct turns demonstrates too little mastery to rotate out — current Tier 2 logic flips this at turn 4 (3 MC + 1 Text), which is the rapid-empty behavior #191 describes, but found 4.

Root cause is in VocabularyQuizItem.ReadyToRotateOut Tier 2 (lines 33–55 of VocabularyQuizItem.cs): once MasteryScore >= 0.50 OR CurrentStreak >= 3, the only additional gates are SessionCorrectCount>=2 AND SessionTextCorrect>=1. With the quiz's mode auto-flip kicking Text in at turn 4, those gates are met immediately. That matches Captain's report of 26 fresh words mastered in 58 turns over 8 rounds (~2.2 turns/word).
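
To make the gate concrete, here is a minimal Python sketch that replays the captured trace through the Tier 2 condition as described above. It is not the code in VocabularyQuizItem.cs: `TurnState` and `tier2_ready` are hypothetical names, and the sketch assumes SessionCorrectCount equals sessMC + sessText, which holds here only because every answer in the trace is correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnState:
    """One row of the captured trace (values copied from the PR description)."""
    turn: int
    streak: float
    mastery: float
    session_mc: int
    session_text: int


def tier2_ready(s: TurnState) -> bool:
    """Tier 2 gate as described in the PR text (pre-fix behavior on main)."""
    # Trigger: mastery >= 0.50 OR streak >= 3.
    triggered = s.mastery >= 0.50 or s.streak >= 3
    # Assumption: all answers are correct, so the session-correct count
    # is simply the number of MC turns plus the number of Text turns.
    session_correct = s.session_mc + s.session_text
    return triggered and session_correct >= 2 and s.session_text >= 1


TRACE = [
    TurnState(1, 1.00, 0.143, 1, 0),
    TurnState(2, 2.00, 0.286, 2, 0),
    TurnState(3, 3.00, 0.429, 3, 0),
    TurnState(4, 4.50, 0.714, 3, 1),
    TurnState(5, 6.00, 1.000, 3, 2),
    TurnState(6, 7.50, 1.000, 3, 3),
]

first_rotate_turn = next(s.turn for s in TRACE if tier2_ready(s))
print(first_rotate_turn)  # 4
```

Turn 3 already satisfies the trigger via streak alone; only the missing Text-correct answer holds the word back, and the mode auto-flip supplies that answer on the very next turn.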

How Wash should use this

1. Read the failing trace above.
2. Tighten Tier 2 in VocabularyQuizItem.ReadyToRotateOut (and/or the mode-flip threshold). Suggested rough targets: more required Text-correct turns, a higher mastery floor, or per-word session minimums independent of the global session counters. Don't pick the curve unilaterally; discuss with Captain via decisions.md first.
3. Run dotnet test --filter VocabQuizScoringRepro189And191Tests. After the fix, both Repro191_* tests change:
   - Repro191_NewWord_AllCorrect_DoesNotRotateOutBeforeFifthTurn should pass.
   - Repro191_CharacterizeCurrentBehavior_* should be updated to reflect the new first-rotation turn (or removed once the curve is canonical).

How Kaylee should use this

1. The two Repro189_* tests prove the service is fine. Don't touch VocabularyProgressService.RecordAttemptAsync for #189 ("Accurate and total attempt don't make sense").
2. Audit the Learning Details panel in VocabQuiz.razor (lines ~395–460) for any reads of legacy obsolete fields, and replace them with streak-based equivalents.
3. Audit the call sites of RecordPendingAttemptAsync for duplicate-fire (NextItem ~1245, OverrideAsCorrect ~1394, plus the third site near 1490).

Out of scope for this PR

- Production code (no edits to VocabularyProgressService, VocabularyQuizItem, VocabQuiz.razor, etc.).
- The actual fix: that's Wash's Stream B Step 2 + Kaylee's Stream A.
- Mode-selection changes: the test's ChooseQuizModeForTurn mirrors the current VocabQuiz.razor rule verbatim; if the rule moves, the helper moves with it.

Verification

$ dotnet build tests/SentenceStudio.UnitTests/SentenceStudio.UnitTests.csproj
Build succeeded. 0 Error(s)

$ dotnet test ... --filter "FullyQualifiedName~VocabQuizScoringRepro189And191Tests"
Total tests: 4 | Passed: 3 | Failed: 1

Branch: test/vocab-quiz-scoring-repro-189-191, off main (2aab53d). No conflicts with Kaylee's fix/vocab-quiz-ui-cluster-189-194.

Stream B Step 1 (Jayne). Adds 4 integration tests that pin down the expected post-state of VocabularyProgress after well-defined quiz interactions, run against a real EF Core + in-memory SQLite stack via PlanGenerationTestFixture (same pattern as MasteryAlgorithmIntegrationTests).

#189 (attempt counting / accuracy):
- Repro189_SingleCorrectRecognitionAttempt_ProducesExpectedPanelState: PASS
- Repro189_SingleCorrectRecognition_LegacyProductionFieldsRemainZero: PASS

Both pass on main, which proves the ProgressService math is correct. Captain's '2 production attempts / 50% accuracy' panel reading therefore points at the UI panel reading legacy/wrong fields or a duplicate-call path; the fix belongs in Stream A (Kaylee), not the service. Tests stay as regression guards for the service contract.

#191 (latter rounds rapidly empty):
- Repro191_NewWord_AllCorrect_DoesNotRotateOutBeforeFifthTurn: FAIL on main
- Repro191_CharacterizeCurrentBehavior_FreshWordRotatesAtTurnN: PASS (snapshot)

Captured failure: a brand-new word receiving 4 all-correct answers (3 MC followed by 1 Text, which is the mode the quiz auto-selects once CurrentStreak >= 3) flips ReadyToRotateOut=True at turn 4. VocabularyQuizItem Tier 2 (mastery>=0.50 OR streak>=3, plus only SessionCorrectCount>=2 and SessionTextCorrect>=1) is the trigger. This is the over-aggressive rotation #191 describes. Test will pass after Wash tightens the Tier 2 gates.

No production code changes.

* test: failing repro for vocab quiz scoring bugs (#189, #191)

* squad(jayne): log Stream B Step 1 outcome (vocab quiz repro #189 #191)

* fix(vocab-quiz): tighten rotation curve for fresh words (#191)

  Closes #191. Fresh words were rotating out of quiz rounds at turn 4 with all-correct answers, yielding only ~3 effective practice repetitions before the word disappeared. Two knobs are tuned to push the earliest legal rotation to turn 5 without regressing already-known words.

  Production changes (2 lines):

  1. VocabularyProgressService.cs: EFFECTIVE_STREAK_DIVISOR 7.0f -> 12.0f
     Slows the mastery climb so MasteryScore reaches Tier 1 (>= 0.80) on turn 8+ rather than turn 6, and crosses the 0.50 promotion floor on turn 6 rather than turn 4.
  2. VocabularyQuizItem.cs: Tier 2 trigger OR -> AND, floor (2,1) -> (4,2)
     - Trigger: mastery >= 0.50 && streak >= 3 (was OR). Closes a corner case where a single Text correct on a fresh word could drop the word into Tier 2 via streak alone.
     - Floor: SessionCorrectCount >= 4 && SessionTextCorrect >= 2 (was >= 2 && >= 1). Requires demonstrably more session evidence before a mid-mastery word is allowed to rotate out.

  Simulator: tools/quiz-rotation-sim/sim.py reproduces production math exactly. Headline (fresh, all-correct):

  | Turn | Current (/7, OR/2,1) | Proposed (/12, AND/4,2) |
  |------|----------------------|-------------------------|
  | 4 | mastery 0.714 -> ROTATES (bug) | mastery 0.417, no |
  | 5 | mastery 1.000 | mastery 0.583 -> ROTATES |

  Already-known words (mastery >= 0.80, streak >= 8) still rotate at the first qualifying turn (Tier 1 unchanged). Existing user MasteryScore data cannot regress: mastery is monotonic on correct (`max(streakScore, mastery)` in RecordAttemptAsync line 154).

  Tests:
  - Jayne's Repro191_NewWord_AllCorrect_DoesNotRotateOutBeforeFifthTurn flips FAIL -> PASS (PR #195 verification harness).
  - ~10 mastery-math fixtures bumped to track the new divisor (5 MC + 2 Text -> 8 MC + 2 Text for IsKnown demonstrations; divisor literals /7.0f -> /12.0f).
  - VocabQuizFilteringTests: Tier 2 floor test renamed and a new test Tier2_TriggerRequiresBothMasteryAndStreak added for the AND change.
  - All 520 unit tests pass.

  Language-tutor SLA review approved the turn-5 floor (vs turn-6) as the right balance between learner spaced-repetition load and within-session retention demonstration.

  Follow-up (separate issue, not in this PR): decouple MasteryScore from SessionRotationReady so session pacing and long-term mastery tracking are independent levers.

  Branched off PR #195 (Jayne's repro) so the fix lands together with its verification harness.

* squad(wash): log Stream B Step 3: #191 fix shipped via PR #198

* squad(wash): note PR #198 body cross-link to #197
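
The headline numbers in the fix commit can be reproduced with a short Python sketch. This is not tools/quiz-rotation-sim/sim.py and not the production code. Two things are assumptions inferred from the captured trace rather than read from the source: a correct MC answer adds 1.0 to the streak and a correct Text answer adds 1.5, and mastery is (streak + 0.5 × prodInStreak) / divisor, capped at 1.0. The names `Curve`, `simulate`, and `first_rotation` are hypothetical, and Tier 1 is left out because it does not decide the first rotation of a fresh word in either configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Curve:
    divisor: float            # EFFECTIVE_STREAK_DIVISOR
    require_both: bool        # Tier 2 trigger: AND (True) or OR (False)
    min_session_correct: int  # Tier 2 floor on SessionCorrectCount
    min_session_text: int     # Tier 2 floor on SessionTextCorrect


CURRENT = Curve(7.0, False, 2, 1)    # behavior on main before the fix
PROPOSED = Curve(12.0, True, 4, 2)   # values from the fix commit message


def simulate(curve: Curve, turns: int = 8):
    """Fresh word, every answer correct. Returns (turn, mastery, ready) rows."""
    streak, prod_in_streak, mastery = 0.0, 0, 0.0
    session_mc = session_text = 0
    rows = []
    for turn in range(1, turns + 1):
        # Mode rule from the PR text: MC until streak >= 3 OR mastery >= 0.5.
        if streak >= 3 or mastery >= 0.5:
            streak += 1.5          # assumption inferred from the trace
            prod_in_streak += 1
            session_text += 1
        else:
            streak += 1.0          # assumption inferred from the trace
            session_mc += 1
        # Assumed mastery formula; max() keeps mastery monotonic on correct.
        effective = streak + 0.5 * prod_in_streak
        mastery = max(mastery, min(1.0, effective / curve.divisor))
        if curve.require_both:
            triggered = mastery >= 0.50 and streak >= 3
        else:
            triggered = mastery >= 0.50 or streak >= 3
        ready = (triggered
                 and session_mc + session_text >= curve.min_session_correct
                 and session_text >= curve.min_session_text)
        rows.append((turn, round(mastery, 3), ready))
    return rows


def first_rotation(curve: Curve) -> int:
    return next(turn for turn, _, ready in simulate(curve) if ready)


for name, curve in (("current", CURRENT), ("proposed", PROPOSED)):
    print(name, first_rotation(curve), simulate(curve)[3:5])
```

Under these assumptions the sketch gives a first rotation at turn 4 for the current curve and turn 5 for the proposed one, with turn-4 and turn-5 mastery of 0.714 and 1.0 versus 0.417 and 0.583, matching the table in the commit message.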

davidortinau · 2026-05-03T14:08:56Z

Superseded by PR #198 (squash-merged to main). Jayne's repro tests landed verbatim as part of #198's atomic fix-plus-tests commit. Branch test/vocab-quiz-scoring-repro-189-191 no longer needed.

- PR #196 (Stream A UI fixes): closes #189/#190/#192/#193/#194
- PR #198 (Stream B scoring fix): closes #191
- PR #195 (test-only draft): superseded, closed
- Follow-ups filed: #197 (decouple Mastery from SessionRotation), #199 (test helper DifficultyWeight bug)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

davidortinau added 2 commits May 2, 2026 20:30

squad(jayne): log Stream B Step 1 outcome (vocab quiz repro #189 #191)

277e10e

This was referenced May 3, 2026

fix(vocab-quiz): UI cluster — anti-cheat + UX (#190 #192 #193 #194) #196

Merged

fix(vocab-quiz): tighten rotation curve for fresh words (#191) #198

Merged

davidortinau closed this May 3, 2026

davidortinau deleted the test/vocab-quiz-scoring-repro-189-191 branch May 3, 2026 14:08
