
test: failing repro for vocab quiz scoring bugs (#189, #191) #195

Closed
davidortinau wants to merge 2 commits into main from test/vocab-quiz-scoring-repro-189-191


@davidortinau
Owner

Stream B Step 1 of the Vocabulary Quiz bug cluster — failing-first regression tests for #189 and #191. No production code changes. Author: Jayne (Tester).

These are repro/regression tests that lock down the expected behavior so Wash (backend) and Kaylee (UI) have unambiguous targets. Wash's fix should turn the failing test green; Kaylee's UI fix is informed by the diagnostic that the service-side tests already pass.

What was added

tests/SentenceStudio.UnitTests/Integration/VocabQuizScoringRepro189And191Tests.cs — 4 tests using the existing PlanGenerationTestFixture (real EF Core + in-memory SQLite + DI), modeled on MasteryAlgorithmIntegrationTests.

Results on main (commit 2aab53d)

Total tests: 4
     Passed: 3
     Failed: 1
| Test | Issue | Result on main | Meaning |
|---|---|---|---|
| Repro189_SingleCorrectRecognitionAttempt_ProducesExpectedPanelState | #189 | ✅ PASS | Service correctly records 1 attempt, 100% accuracy, and no production-side bumps for a single correct MC turn |
| Repro189_SingleCorrectRecognition_LegacyProductionFieldsRemainZero | #189 | ✅ PASS | Obsolete ProductionAttempts/ProductionCorrect stay zero for a recognition turn |
| Repro191_NewWord_AllCorrect_DoesNotRotateOutBeforeFifthTurn | #191 | FAIL | A fresh word with 4 all-correct turns trips ReadyToRotateOut=True at turn 4 |
| Repro191_CharacterizeCurrentBehavior_FreshWordRotatesAtTurnN | #191 | ✅ PASS (snapshot) | Documents that, today, the first rotation happens at turn 4 |

#189 — disambiguated

Two competing hypotheses going in:

  • (a) VocabularyProgressService double-increments on a single attempt.
  • (b) The Learning Details panel reads legacy/obsolete fields that don't match the new streak-based truth, or there's a duplicate UI call path.

Both #189 service-side tests PASS on main. Service math is correct:

VocabularyProgress dump (after one correct MultipleChoice attempt):
  TotalAttempts=1, CorrectAttempts=1, Accuracy=1.000
  CurrentStreak=1.00, ProductionInStreak=0, MasteryScore=0.143
  RecognitionAttempts=1, RecognitionCorrect=1
  ProductionAttempts=0, ProductionCorrect=0
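As a rough illustration of the dump above, here is a minimal Python sketch of what one correct MultipleChoice turn does to the counters. It is not the real (C#) `VocabularyProgressService.RecordAttemptAsync`. The class and function names are hypothetical. The +1.0 streak step and the `CurrentStreak + 0.5 * ProductionInStreak` effective-streak weighting are assumptions inferred from the #191 trace numbers, using the divisor of 7 that is on main.

```python
from dataclasses import dataclass

STREAK_DIVISOR = 7.0  # assumed: EFFECTIVE_STREAK_DIVISOR on main (before #198)


@dataclass
class ProgressSketch:
    """Hypothetical subset of the VocabularyProgress fields in the dump above."""
    total_attempts: int = 0
    correct_attempts: int = 0
    current_streak: float = 0.0
    production_in_streak: int = 0
    mastery_score: float = 0.0
    recognition_attempts: int = 0
    recognition_correct: int = 0
    production_attempts: int = 0  # obsolete legacy field, never written here
    production_correct: int = 0   # obsolete legacy field, never written here

    @property
    def accuracy(self) -> float:
        return self.correct_attempts / self.total_attempts if self.total_attempts else 0.0


def record_correct_recognition(p: ProgressSketch) -> None:
    """One correct MultipleChoice turn: exactly one attempt, no production bumps."""
    p.total_attempts += 1
    p.correct_attempts += 1
    p.recognition_attempts += 1
    p.recognition_correct += 1
    p.current_streak += 1.0  # assumed MC step, read off the #191 trace
    effective = p.current_streak + 0.5 * p.production_in_streak  # assumed weighting
    # Monotonic on correct answers, mirroring max(streakScore, mastery).
    p.mastery_score = max(p.mastery_score, min(1.0, effective / STREAK_DIVISOR))


p = ProgressSketch()
record_correct_recognition(p)
print(p.total_attempts, p.accuracy, round(p.mastery_score, 3), p.production_attempts)
# prints: 1 1.0 0.143 0
```

If the service behaves like this, a panel that shows "2 production attempts" cannot be reading these fields after a single turn.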

→ Hypothesis (b) stands. The "2 production attempts / 50% accuracy" panel readout has to come from the UI layer, not the service. The most likely culprits are:

  1. The Learning Details panel in VocabQuiz.razor reading obsolete legacy fields (e.g., ProductionAttempts directly) instead of the new streak-based fields, or
  2. A double-call path through RecordPendingAttemptAsync. It is called from NextItem, OverrideAsCorrect, and a third site, so if more than one of them fires for the same turn, the attempt is recorded twice.

Both are UI/quiz-page concerns belonging to Kaylee's Stream A. The two passing service tests stay as regression guards so the service contract can't silently regress while the UI fix is in flight.
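One way to make the duplicate-call path harmless, whichever handler fires, is a consume-once pending slot. The sketch below is hypothetical: it is a Python stand-in for the Razor page, and `QuizPageSketch` and its methods are invented names, not the VocabQuiz.razor API. It only shows the pattern. The pending attempt is cleared before it is recorded, so a second call for the same turn records nothing.

```python
from typing import Callable, List, Optional, Tuple

Attempt = Tuple[str, bool]  # (word, correct)


class QuizPageSketch:
    """Hypothetical stand-in for the page's pending-attempt handling."""

    def __init__(self, record: Callable[[str, bool], None]) -> None:
        self._record = record
        self._pending: Optional[Attempt] = None

    def submit_answer(self, word: str, correct: bool) -> None:
        self._pending = (word, correct)

    def record_pending_attempt(self) -> bool:
        # Take and clear the slot in one step, before recording, so any
        # later call for the same turn sees None and does nothing.
        pending, self._pending = self._pending, None
        if pending is None:
            return False
        self._record(*pending)
        return True


recorded: List[Attempt] = []
page = QuizPageSketch(lambda word, correct: recorded.append((word, correct)))
page.submit_answer("apple", True)
page.record_pending_attempt()  # e.g. fired from NextItem
page.record_pending_attempt()  # e.g. a second handler for the same turn
print(recorded)  # prints: [('apple', True)]
```

In async C# the slot would have to be cleared before the first `await`. Otherwise two handlers can both observe the same pending attempt.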

#191 — confirmed

Captured trace from the failing test (one fresh word, all answers correct, mode chosen the same way VocabQuiz.razor chooses it — MC until CurrentStreak>=3 OR MasteryScore>=0.5, then Text):

turn=1 mode=MultipleChoice streak=1.00 prodInStreak=0 mastery=0.143 sessMC=1 sessText=0 ReadyToRotateOut=False
turn=2 mode=MultipleChoice streak=2.00 prodInStreak=0 mastery=0.286 sessMC=2 sessText=0 ReadyToRotateOut=False
turn=3 mode=MultipleChoice streak=3.00 prodInStreak=0 mastery=0.429 sessMC=3 sessText=0 ReadyToRotateOut=False
turn=4 mode=Text          streak=4.50 prodInStreak=1 mastery=0.714 sessMC=3 sessText=1 ReadyToRotateOut=True   ← rotation flips here
turn=5 mode=Text          streak=6.00 prodInStreak=2 mastery=1.000 sessMC=3 sessText=2 ReadyToRotateOut=True
turn=6 mode=Text          streak=7.50 prodInStreak=3 mastery=1.000 sessMC=3 sessText=3 ReadyToRotateOut=True

Failure message:

Expected firstRotateTurn to be greater than or equal to 5 because a brand-new word with 4 all-correct turns demonstrates too little mastery to rotate out — current Tier 2 logic flips this at turn 4 (3 MC + 1 Text), which is the rapid-empty behavior #191 describes, but found 4.

Root cause is in VocabularyQuizItem.ReadyToRotateOut Tier 2 (lines 33–55 of VocabularyQuizItem.cs): once MasteryScore >= 0.50 OR CurrentStreak >= 3, the only additional gates are SessionCorrectCount>=2 AND SessionTextCorrect>=1. With the quiz's mode auto-flip kicking Text in at turn 4, those gates are met immediately. That matches Captain's report of 26 fresh words mastered in 58 turns over 8 rounds (~2.2 turns/word).
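The trace and the Tier 2 rule above can be replayed with a small simulator. This is a hedged Python sketch, not tools/quiz-rotation-sim/sim.py or the C# code. Three things are assumptions read off the trace:

- the per-turn streak steps (+1.0 for MC, +1.5 for Text);
- the `streak + 0.5 * prodInStreak` effective-streak weighting;
- treating SessionCorrectCount as sessMC + sessText for a single word.

Tier 1 is left out of the sketch.

```python
from dataclasses import dataclass

DIVISOR = 7.0  # EFFECTIVE_STREAK_DIVISOR on main (before #198)


@dataclass
class WordState:
    streak: float = 0.0      # CurrentStreak
    prod_in_streak: int = 0  # ProductionInStreak
    mastery: float = 0.0     # MasteryScore
    sess_mc: int = 0         # sessMC in the trace
    sess_text: int = 0       # sessText in the trace


def choose_mode(w: WordState) -> str:
    # Mode rule quoted above: MC until CurrentStreak >= 3 OR MasteryScore >= 0.5.
    return "Text" if w.streak >= 3 or w.mastery >= 0.5 else "MultipleChoice"


def answer_correctly(w: WordState, mode: str) -> None:
    if mode == "Text":
        w.streak += 1.5  # assumed step, read off the trace
        w.prod_in_streak += 1
        w.sess_text += 1
    else:
        w.streak += 1.0  # assumed step, read off the trace
        w.sess_mc += 1
    effective = w.streak + 0.5 * w.prod_in_streak  # assumed weighting
    w.mastery = max(w.mastery, min(1.0, effective / DIVISOR))


def tier2_ready(w: WordState) -> bool:
    # Tier 2 on main: (mastery >= 0.50 OR streak >= 3) AND correct >= 2 AND text >= 1.
    trigger = w.mastery >= 0.50 or w.streak >= 3
    return trigger and (w.sess_mc + w.sess_text) >= 2 and w.sess_text >= 1


def simulate(turns: int = 6) -> list:
    w, rows = WordState(), []
    for turn in range(1, turns + 1):
        mode = choose_mode(w)
        answer_correctly(w, mode)
        rows.append((turn, mode, w.streak, round(w.mastery, 3), tier2_ready(w)))
    return rows


rows = simulate()
for row in rows:
    print(row)
first_rotate_turn = next(turn for turn, *_, ready in rows if ready)
print("first rotation turn:", first_rotate_turn)  # prints: first rotation turn: 4
```

At turn 3 the streak trigger is already met but SessionTextCorrect is still 0. The first Text turn therefore satisfies every remaining gate at once.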

How Wash should use this

  1. Read the failing trace above.
  2. Tighten Tier 2 in VocabularyQuizItem.ReadyToRotateOut (and/or the mode-flip threshold). Suggested rough targets: more required Text-correct turns, higher mastery floor, or per-word session minimums independent of the global session counters. Don't pick the curve unilaterally — discuss with Captain via decisions.md first.
  3. Run dotnet test --filter VocabQuizScoringRepro189And191Tests. After the fix, the Repro191_* tests change as follows:
    • Repro191_NewWord_AllCorrect_DoesNotRotateOutBeforeFifthTurn should pass.
    • Repro191_CharacterizeCurrentBehavior_* should be updated to reflect the new first-rotation turn (or removed once the curve is canonical).

How Kaylee should use this

  1. The two Repro189_* tests prove the service is fine. Don't touch VocabularyProgressService.RecordAttemptAsync for #189 ("Accurate and total attempt don't make sense").
  2. Audit the Learning Details panel in VocabQuiz.razor (lines ~395–460) for any reads of legacy obsolete fields — replace with streak-based equivalents.
  3. Audit the call sites of RecordPendingAttemptAsync for duplicate-fire (NextItem ~1245, OverrideAsCorrect ~1394, plus the third site near 1490).

Out of scope for this PR

  • Production code (no edits to VocabularyProgressService, VocabularyQuizItem, VocabQuiz.razor, etc.).
  • The actual fix — that's Wash's Stream B Step 2 + Kaylee's Stream A.
  • Mode-selection changes — the test's ChooseQuizModeForTurn mirrors the current VocabQuiz.razor rule verbatim; if the rule moves, the helper moves with it.

Verification

$ dotnet build tests/SentenceStudio.UnitTests/SentenceStudio.UnitTests.csproj
Build succeeded. 0 Error(s)

$ dotnet test ... --filter "FullyQualifiedName~VocabQuizScoringRepro189And191Tests"
Total tests: 4 | Passed: 3 | Failed: 1

Branch: test/vocab-quiz-scoring-repro-189-191, off main (2aab53d). No conflicts with Kaylee's fix/vocab-quiz-ui-cluster-189-194.

Stream B Step 1 (Jayne). Adds 4 integration tests that pin down the
expected post-state of VocabularyProgress after well-defined quiz
interactions, run against a real EF Core + in-memory SQLite stack via
PlanGenerationTestFixture (same pattern as MasteryAlgorithmIntegrationTests).

#189 — Attempt counting / accuracy:
  Repro189_SingleCorrectRecognitionAttempt_ProducesExpectedPanelState — PASS
  Repro189_SingleCorrectRecognition_LegacyProductionFieldsRemainZero  — PASS
  Both pass on main, which proves the ProgressService math is correct.
  Captain's '2 production attempts / 50% accuracy' panel reading
  therefore points at the UI panel reading legacy/wrong fields or a
  duplicate-call path — fix belongs in Stream A (Kaylee), not the
  service. Tests stay as regression guards for the service contract.

#191 — Latter rounds rapidly empty:
  Repro191_NewWord_AllCorrect_DoesNotRotateOutBeforeFifthTurn — FAIL on main
  Repro191_CharacterizeCurrentBehavior_FreshWordRotatesAtTurnN — PASS (snapshot)
  Captured failure: a brand-new word receiving 4 all-correct answers
  (3 MC followed by 1 Text — which is the mode the quiz auto-selects
  once CurrentStreak >= 3) flips ReadyToRotateOut=True at turn 4.
  VocabularyQuizItem Tier 2 (mastery>=0.50 OR streak>=3, plus only
  SessionCorrectCount>=2 and SessionTextCorrect>=1) is the trigger.
  This is the over-aggressive rotation #191 describes. Test will pass
  after Wash tightens the Tier 2 gates.

No production code changes.
davidortinau added a commit that referenced this pull request May 3, 2026
* test: failing repro for vocab quiz scoring bugs (#189, #191)

* squad(jayne): log Stream B Step 1 outcome (vocab quiz repro #189 #191)

* fix(vocab-quiz): tighten rotation curve for fresh words (#191)

Closes #191.

Fresh words were rotating out of quiz rounds at turn 4 with all-correct
answers, yielding only ~3 effective practice repetitions before the word
disappeared. Two knobs are tuned to push the earliest legal rotation to
turn 5 without regressing already-known words.

Production changes (2 lines):

1. VocabularyProgressService.cs: EFFECTIVE_STREAK_DIVISOR 7.0f -> 12.0f
   Slows the mastery climb so MasteryScore reaches Tier 1 (>= 0.80) on
   turn 8+ rather than turn 6, and crosses the 0.50 promotion floor on
   turn 5 rather than turn 4.

2. VocabularyQuizItem.cs: Tier 2 trigger OR -> AND, floor (2,1) -> (4,2)
   - Trigger: mastery >= 0.50 && streak >= 3 (was OR). Closes a corner
     case where a single Text correct on a fresh word could drop the
     word into Tier 2 via streak alone.
   - Floor: SessionCorrectCount >= 4 && SessionTextCorrect >= 2 (was
     >= 2 && >= 1). Requires demonstrably more session evidence before
     a mid-mastery word is allowed to rotate out.
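The trigger and floor changes can be written as one predicate with an old/new switch. This is a minimal Python sketch with a hypothetical function name. The states below are a fresh all-correct word after turns 4 and 5 under the new divisor. They assume the `streak + 0.5 * prodInStreak` weighting inferred from the #191 trace. The turn-4 state qualifies on streak alone under OR but not under AND.

```python
def tier2_ready(mastery: float, streak: float, session_correct: int,
                session_text_correct: int, proposed: bool) -> bool:
    """Tier 2 gate before (OR, floor 2/1) and after (AND, floor 4/2) the change."""
    if proposed:
        trigger = mastery >= 0.50 and streak >= 3
        floor_met = session_correct >= 4 and session_text_correct >= 2
    else:
        trigger = mastery >= 0.50 or streak >= 3
        floor_met = session_correct >= 2 and session_text_correct >= 1
    return trigger and floor_met


# Fresh word after turn 4 (3 MC + 1 Text), /12 divisor.
turn4 = dict(mastery=5 / 12, streak=4.5, session_correct=4, session_text_correct=1)
# Same word after turn 5 (3 MC + 2 Text).
turn5 = dict(mastery=7 / 12, streak=6.0, session_correct=5, session_text_correct=2)

print(tier2_ready(**turn4, proposed=False))  # prints: True  (streak alone triggers)
print(tier2_ready(**turn4, proposed=True))   # prints: False (mastery 0.417 < 0.50)
print(tier2_ready(**turn5, proposed=True))   # prints: True
```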

Simulator: tools/quiz-rotation-sim/sim.py reproduces production math
exactly. Headline (fresh, all-correct):

| Turn | Current (/7, OR/2,1) | Proposed (/12, AND/4,2) |
|------|---------------------|--------------------------|
|  4   | mastery 0.714 -> ROTATES (bug) | mastery 0.417, no  |
|  5   | mastery 1.000        | mastery 0.583 -> ROTATES |
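The mastery column in the table follows from dividing an effective streak by the divisor $D$. Assume $m$ is MasteryScore, $s$ is CurrentStreak, and $p$ is ProductionInStreak. Assume further that the effective streak is $s + 0.5\,p$; this is inferred from the #191 trace, not confirmed against the C# source. The turn 4 and turn 5 values then work out as:

```math
\begin{aligned}
m &= \min\!\left(1,\ \frac{s + 0.5\,p}{D}\right)\\
\text{turn 4 } (s=4.5,\ p=1):\quad m &= \tfrac{5}{7} \approx 0.714\ (D=7), \qquad \tfrac{5}{12} \approx 0.417\ (D=12)\\
\text{turn 5 } (s=6.0,\ p=2):\quad m &= \tfrac{7}{7} = 1.000\ (D=7), \qquad \tfrac{7}{12} \approx 0.583\ (D=12)
\end{aligned}
```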

Already-known words (mastery >= 0.80, streak >= 8) still rotate at the
first qualifying turn (Tier 1 unchanged). Existing user MasteryScore
data cannot regress: mastery is monotonic on correct
(`max(streakScore, mastery)` in RecordAttemptAsync line 154).
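A short Python sketch of why the divisor change cannot lower stored scores. On a correct answer the new value is `max(streakScore, mastery)`, so a score earned under /7 survives even when the same streak scores lower under /12. The function name is hypothetical; only the `max` rule comes from the commit message. The effective streak of 7 is an illustrative assumption.

```python
def mastery_after_correct(stored: float, streak_score: float) -> float:
    # Rule quoted in the commit: max(streakScore, mastery) on a correct answer.
    return max(streak_score, stored)


earned_under_old_divisor = min(1.0, 7.0 / 7.0)  # effective streak 7, divisor 7
same_streak_new_divisor = min(1.0, 7.0 / 12.0)  # same streak, divisor 12

print(mastery_after_correct(earned_under_old_divisor, same_streak_new_divisor))  # prints: 1.0
```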

Tests:
- Jayne's Repro191_NewWord_AllCorrect_DoesNotRotateOutBeforeFifthTurn
  flips FAIL -> PASS (PR #195 verification harness).
- ~10 mastery-math fixtures bumped to track the new divisor (5 MC +
  2 Text -> 8 MC + 2 Text for IsKnown demonstrations; divisor literals
  /7.0f -> /12.0f).
- VocabQuizFilteringTests: Tier 2 floor test renamed and a new test
  Tier2_TriggerRequiresBothMasteryAndStreak added for the AND change.
- All 520 unit tests pass.

Language-tutor SLA review approved the turn-5 floor (vs turn-6) as the
right balance between learner spaced-repetition load and within-session
retention demonstration.

Follow-up (separate issue, not in this PR): decouple MasteryScore from
SessionRotationReady so session pacing and long-term mastery tracking
are independent levers.

Branched off PR #195 (Jayne's repro) so the fix lands together with its
verification harness.

* squad(wash): log Stream B Step 3 — #191 fix shipped via PR #198

* squad(wash): note PR #198 body cross-link to #197
@davidortinau
Owner Author

Superseded by PR #198 (squash-merged to main). Jayne's repro tests landed verbatim as part of #198's atomic fix-plus-tests commit. Branch test/vocab-quiz-scoring-repro-189-191 no longer needed.

@davidortinau davidortinau deleted the test/vocab-quiz-scoring-repro-189-191 branch May 3, 2026 14:08
davidortinau added a commit that referenced this pull request May 3, 2026
- PR #196 (Stream A UI fixes): closes #189/#190/#192/#193/#194
- PR #198 (Stream B scoring fix): closes #191
- PR #195 (test-only draft): superseded, closed
- Follow-ups filed: #197 (decouple Mastery from SessionRotation),
  #199 (test helper DifficultyWeight bug)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>