test: add multi-language coverage for eval/hygiene.py (59% -> 84%) by lukeinglis · Pull Request #166 · akashgit/remote-factory

lukeinglis · 2026-05-01T18:25:31Z

Summary

Adds 18 tests covering previously untested code paths in factory/eval/hygiene.py
Coverage: 59% -> 84% (47 lines remaining, down from 116)
All tests mock _run_cmd() to avoid requiring Node.js/Rust/Go toolchains

What's covered

Area	Tests Added
`_run_cmd()` error handling	3 (timeout, command not found, generic exception)
`eval_tests()` Node.js/Rust/Go	5 (pass/fail for each language + mixed aggregation)
`eval_tests()` scoring	2 (all-fail, partial-pass)
`eval_lint()` Node.js/Rust	4 (clean + errors for each)
`eval_lint()` scoring	2 (partial credit, floor at zero)
`eval_type_check()` Node.js	2 (clean + errors)

Remaining uncovered lines (47)

The remaining gaps are Python-specific branches that require real subprocess execution (pytest coverage parsing, mypy type-check parsing) rather than mocked outputs. These run against the actual project in CI and are covered by the existing python_project fixture tests.

Test plan

All 36 hygiene tests pass (uv run pytest tests/eval/test_hygiene.py -v)
Full suite passes: 1338 passed (uv run pytest tests/ -x -q)
No new dependencies or fixtures required

Add 18 tests covering previously untested code paths in the hygiene eval module: - _run_cmd() error handling: timeout, command not found, generic errors - eval_tests() for Node.js, Rust, and Go projects with mocked outputs - eval_tests() scoring: all-fail and partial-pass edge cases - eval_lint() for Node.js (eslint) and Rust (clippy) with clean/error - eval_lint() scoring: partial credit calculation and floor at zero - eval_type_check() for Node.js (tsc) with clean/error outputs - Mixed project aggregation across Python + Node.js All tests mock _run_cmd() to avoid requiring Node/Rust/Go toolchains. Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>

codecov · 2026-05-01T18:27:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.29%. Comparing base (d556bbb) to head (acab2a5).
⚠️ Report is 184 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #166      +/-   ##
==========================================
+ Coverage   86.23%   88.29%   +2.06%     
==========================================
  Files          45       60      +15     
  Lines        6307     9170    +2863     
==========================================
+ Hits         5439     8097    +2658     
- Misses        868     1073     +205

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

akashgit

Code Review

Coverage claim verified accurate (116 missed lines → 47). All 1338 tests pass, lint is clean. Only tests/eval/test_hygiene.py is modified — the 14-file diff shown by gh pr diff is a branch divergence artifact (branch is behind PRs #136 and #163). Good contribution overall, but assertion quality needs tightening before merge.

Finding 1: Weak assertion in `test_go_project_passing` — ISSUE

assert result["score"] > 0.0

With 2 "ok" lines, score should be 2/2 = 1.0. The Node and Rust passing tests both assert == 1.0 — this should too. > 0.0 would pass even if the scoring was completely wrong.

Fix: assert result["score"] == 1.0

Finding 2: Missing score assertion in `test_go_project_failing` — ISSUE

Checks passed is False and "go" in details but never checks the score. Should be 0.0 (0 passed, 1 failed).

Fix: Add assert result["score"] == 0.0

Finding 3: Missing score assertion in `test_node_typecheck_errors` — ISSUE

With 2 TS errors at 0.05 penalty each, score should be 0.9. Never checked.

Fix: Add assert result["score"] == 0.9

Finding 4: Missing score assertion in `test_node_lint_errors` — ISSUE

With 3 errors at 0.1 penalty each, score should be 0.7. Never checked.

Fix: Add assert result["score"] == 0.7

Finding 5: Missing score assertion in `test_rust_lint_errors` — ISSUE

With 2 errors at 0.1 penalty each, score should be 0.8. Never checked.

Fix: Add assert result["score"] == 0.8

Finding 6: Fragile detail string assertions — ISSUE

assert "2" in result["details"]  # matches "12 errors", "rs2", paths containing "2"

Used in multiple tests. These are too loose — "2" matches any string containing the digit 2.

Fix: Use "2 errors" in result["details"] or more specific patterns.

Finding 7: `test_mixed_project_aggregates` doesn't verify `passed` field — ISSUE

The mock has Node returning 1 failure, so passed should be False. Never checked — a bug that left passed: True with sub-project failures would be invisible.

Fix: Add assert result["passed"] is False

Branch hygiene

Branch needs rebase on current main (behind PRs #136 and #163). No conflicts on test merge, but rebase will clean up the diff.

Verdict: Do not merge yet. The tests are structurally sound but under-assert — they verify shape but not values. All 7 issues are straightforward fixes (add score assertions, tighten string matches). After fixes + rebase, this is a solid contribution.

Address all 7 review findings: - Go passing test: assert exact score 1.0 instead of > 0.0 - Go failing test: add score == 0.0 assertion - Node lint errors: add score == 0.7, use "3 errors" not "3" - Rust lint errors: add score == 0.8, use "2 errors" not "2" - Node typecheck errors: add score == 0.9, use "2 errors" not "2" - Mixed project: add passed == False assertion Signed-off-by: Luke Inglis <lukeinglis21@yahoo.com>

osilkin98 · 2026-06-11T02:58:23Z

Closing as stale: these tests target the current hygiene.py if/elif structure, which #514 rewrites into registry dispatch. #514 carries its own characterization tests pinning behavior. Thanks for the coverage work — if gaps remain after #514 lands, a fresh PR against the new structure would be welcome.

akashgit requested changes May 2, 2026

View reviewed changes

akashgit added the testing Test coverage and test infrastructure label May 30, 2026

osilkin98 closed this Jun 11, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: add multi-language coverage for eval/hygiene.py (59% -> 84%)#166

test: add multi-language coverage for eval/hygiene.py (59% -> 84%)#166
lukeinglis wants to merge 2 commits into
akashgit:mainfrom
lukeinglis:tests/eval-hygiene-coverage

lukeinglis commented May 1, 2026

Uh oh!

codecov Bot commented May 1, 2026 •

edited

Loading

Uh oh!

akashgit left a comment

Uh oh!

osilkin98 commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

lukeinglis commented May 1, 2026

Summary

What's covered

Remaining uncovered lines (47)

Test plan

Uh oh!

codecov Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

akashgit left a comment

Choose a reason for hiding this comment

Code Review

Finding 1: Weak assertion in test_go_project_passing — ISSUE

Finding 2: Missing score assertion in test_go_project_failing — ISSUE

Finding 3: Missing score assertion in test_node_typecheck_errors — ISSUE

Finding 4: Missing score assertion in test_node_lint_errors — ISSUE

Finding 5: Missing score assertion in test_rust_lint_errors — ISSUE

Finding 6: Fragile detail string assertions — ISSUE

Finding 7: test_mixed_project_aggregates doesn't verify passed field — ISSUE

Branch hygiene

Uh oh!

osilkin98 commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented May 1, 2026 •

edited

Loading

Finding 1: Weak assertion in `test_go_project_passing` — ISSUE

Finding 2: Missing score assertion in `test_go_project_failing` — ISSUE

Finding 3: Missing score assertion in `test_node_typecheck_errors` — ISSUE

Finding 4: Missing score assertion in `test_node_lint_errors` — ISSUE

Finding 5: Missing score assertion in `test_rust_lint_errors` — ISSUE

Finding 7: `test_mixed_project_aggregates` doesn't verify `passed` field — ISSUE