Skip to content

test: add multi-language coverage for eval/hygiene.py (59% -> 84%)#166

Closed
lukeinglis wants to merge 2 commits into
akashgit:mainfrom
lukeinglis:tests/eval-hygiene-coverage
Closed

test: add multi-language coverage for eval/hygiene.py (59% -> 84%)#166
lukeinglis wants to merge 2 commits into
akashgit:mainfrom
lukeinglis:tests/eval-hygiene-coverage

Conversation

@lukeinglis

Copy link
Copy Markdown
Collaborator

Summary

  • Adds 18 tests covering previously untested code paths in factory/eval/hygiene.py
  • Coverage: 59% -> 84% (47 lines remaining, down from 116)
  • All tests mock _run_cmd() to avoid requiring Node.js/Rust/Go toolchains

What's covered

Area Tests Added
_run_cmd() error handling 3 (timeout, command not found, generic exception)
eval_tests() Node.js/Rust/Go 5 (pass/fail for each language + mixed aggregation)
eval_tests() scoring 2 (all-fail, partial-pass)
eval_lint() Node.js/Rust 4 (clean + errors for each)
eval_lint() scoring 2 (partial credit, floor at zero)
eval_type_check() Node.js 2 (clean + errors)

Remaining uncovered lines (47)

The remaining gaps are Python-specific branches that require real subprocess execution (pytest coverage parsing, mypy type-check parsing) rather than mocked outputs. These run against the actual project in CI and are covered by the existing python_project fixture tests.

Test plan

  • All 36 hygiene tests pass (uv run pytest tests/eval/test_hygiene.py -v)
  • Full suite passes: 1338 passed (uv run pytest tests/ -x -q)
  • No new dependencies or fixtures required

Add 18 tests covering previously untested code paths in the hygiene
eval module:

- _run_cmd() error handling: timeout, command not found, generic errors
- eval_tests() for Node.js, Rust, and Go projects with mocked outputs
- eval_tests() scoring: all-fail and partial-pass edge cases
- eval_lint() for Node.js (eslint) and Rust (clippy) with clean/error
- eval_lint() scoring: partial credit calculation and floor at zero
- eval_type_check() for Node.js (tsc) with clean/error outputs
- Mixed project aggregation across Python + Node.js

All tests mock _run_cmd() to avoid requiring Node/Rust/Go toolchains.

Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
@codecov

codecov Bot commented May 1, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.29%. Comparing base (d556bbb) to head (acab2a5).
⚠️ Report is 184 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #166      +/-   ##
==========================================
+ Coverage   86.23%   88.29%   +2.06%     
==========================================
  Files          45       60      +15     
  Lines        6307     9170    +2863     
==========================================
+ Hits         5439     8097    +2658     
- Misses        868     1073     +205     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@akashgit akashgit left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Coverage claim verified accurate (116 missed lines → 47). All 1338 tests pass, lint is clean. Only tests/eval/test_hygiene.py is modified — the 14-file diff shown by gh pr diff is a branch divergence artifact (branch is behind PRs #136 and #163). Good contribution overall, but assertion quality needs tightening before merge.

Finding 1: Weak assertion in test_go_project_passing — ISSUE

assert result["score"] > 0.0

With 2 "ok" lines, score should be 2/2 = 1.0. The Node and Rust passing tests both assert == 1.0 — this should too. > 0.0 would pass even if the scoring was completely wrong.

Fix: assert result["score"] == 1.0

Finding 2: Missing score assertion in test_go_project_failing — ISSUE

Checks passed is False and "go" in details but never checks the score. Should be 0.0 (0 passed, 1 failed).

Fix: Add assert result["score"] == 0.0

Finding 3: Missing score assertion in test_node_typecheck_errors — ISSUE

With 2 TS errors at 0.05 penalty each, score should be 0.9. Never checked.

Fix: Add assert result["score"] == 0.9

Finding 4: Missing score assertion in test_node_lint_errors — ISSUE

With 3 errors at 0.1 penalty each, score should be 0.7. Never checked.

Fix: Add assert result["score"] == 0.7

Finding 5: Missing score assertion in test_rust_lint_errors — ISSUE

With 2 errors at 0.1 penalty each, score should be 0.8. Never checked.

Fix: Add assert result["score"] == 0.8

Finding 6: Fragile detail string assertions — ISSUE

assert "2" in result["details"]  # matches "12 errors", "rs2", paths containing "2"

Used in multiple tests. These are too loose — "2" matches any string containing the digit 2.

Fix: Use "2 errors" in result["details"] or more specific patterns.

Finding 7: test_mixed_project_aggregates doesn't verify passed field — ISSUE

The mock has Node returning 1 failure, so passed should be False. Never checked — a bug that left passed: True with sub-project failures would be invisible.

Fix: Add assert result["passed"] is False

Branch hygiene

Branch needs rebase on current main (behind PRs #136 and #163). No conflicts on test merge, but rebase will clean up the diff.


Verdict: Do not merge yet. The tests are structurally sound but under-assert — they verify shape but not values. All 7 issues are straightforward fixes (add score assertions, tighten string matches). After fixes + rebase, this is a solid contribution.

Address all 7 review findings:
- Go passing test: assert exact score 1.0 instead of > 0.0
- Go failing test: add score == 0.0 assertion
- Node lint errors: add score == 0.7, use "3 errors" not "3"
- Rust lint errors: add score == 0.8, use "2 errors" not "2"
- Node typecheck errors: add score == 0.9, use "2 errors" not "2"
- Mixed project: add passed == False assertion

Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>
@akashgit akashgit added the testing Test coverage and test infrastructure label May 30, 2026
@osilkin98

Copy link
Copy Markdown
Collaborator

Closing as stale: these tests target the current hygiene.py if/elif structure, which #514 rewrites into registry dispatch. #514 carries its own characterization tests pinning behavior. Thanks for the coverage work — if gaps remain after #514 lands, a fresh PR against the new structure would be welcome.

@osilkin98 osilkin98 closed this Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Test coverage and test infrastructure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants