feat: -minSpectraPerThread flag, MSGFLogger, and run-manifest sidecar (Q9/Q11/Q4) #18

Merged
ypriverol merged 3 commits into dev from feat/logger-and-run-manifest on Apr 17, 2026
@ypriverol
Summary

Three items from the landscape review's quick-wins list (.claude/investigations/msgfplus_research_report.md), bundled together because each is small, and the later items consume the earlier ones.

Q9 — -minSpectraPerThread flag (issue MSGFPlus#52)

Replaces the hardcoded min(availableCores, specSize / 250) thread cap in MSGFPlus.runMSGFPlus / MSGFDB.runMSGFDB with an overridable CLI flag (default 250, minimum 1). On many-core hosts running small inputs (e.g. 20 cores, 1k spectra) users can lower the divisor to raise parallelism; everyone else gets identical behaviour.
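
The cap described above can be sketched as follows. This is a minimal illustration, not the code in MSGFPlus.runMSGFPlus: the class and method names are hypothetical, and the clamp to at least one thread is an assumption that the description does not state.

```java
final class ThreadCapSketch {
    /** Default divisor named in this PR. */
    static final int DEFAULT_MIN_SPECTRA_PER_THREAD = 250;

    /**
     * Caps the thread count at min(availableCores, specSize / minSpectraPerThread).
     * Hypothetical helper; the real logic lives in MSGFPlus.runMSGFPlus.
     */
    static int cappedThreads(int availableCores, int specSize, int minSpectraPerThread) {
        if (minSpectraPerThread < 1) {
            // The flag has a minimum of 1, so zero and negatives are rejected.
            throw new IllegalArgumentException("minSpectraPerThread must be >= 1");
        }
        int bySpectra = specSize / minSpectraPerThread; // integer division
        // Assumption: never return fewer than one thread for tiny inputs.
        return Math.max(1, Math.min(availableCores, bySpectra));
    }

    public static void main(String[] args) {
        // 20 cores, 1,000 spectra, default divisor: 1000 / 250 = 4
        System.out.println(cappedThreads(20, 1000, DEFAULT_MIN_SPECTRA_PER_THREAD)); // 4
        // Lowering the divisor to 50 lets all 20 cores participate: 1000 / 50 = 20
        System.out.println(cappedThreads(20, 1000, 50)); // 20
    }
}
```

With the default of 250 the result matches the previous hardcoded behaviour, which is why only users who pass the flag see a change.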

Q11 — MSGFLogger infra

Lightweight static logger at edu.ucsd.msjava.misc.MSGFLogger with info / debug / warn / error levels. debug is gated on the existing -verbose 0/1 flag; warn / error go to stderr with [Warning] / [Error] prefixes. No external dependencies (no slf4j/log4j).

  • Migrates the top-level MSGFPlus.main + runMSGFPlus(ParamManager) loop to use the logger for error paths, the "Processing N spectra" summary, per-file banners, the completion footer, and decoy-ratio mismatch errors. The ~260 System.out.println call sites in DBScanner / ConcurrentMSGFPlus / BuildSA are unchanged — scope is deliberately narrow.
  • Default behaviour (-verbose 0) is unchanged for all non-debug messages.
  • New test: TestMSGFLogger (7 cases).
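
For reviewers who want the shape without opening the file, a dependency-free static logger along these lines might look like the sketch below. It is not the merged MSGFLogger source: the class name LoggerSketch and the injectable streams are hypothetical, and routing info / debug to stdout is an assumption (the description only fixes warn / error to stderr).

```java
import java.io.PrintStream;

final class LoggerSketch {
    private static volatile boolean verbose = false;
    // Injectable streams are an assumption, added so tests can capture output.
    private static PrintStream out = System.out;
    private static PrintStream err = System.err;

    private LoggerSketch() {}

    /** Called once after CLI parsing, e.g. setVerbose(verboseFlag == 1). */
    static void setVerbose(boolean v) { verbose = v; }
    static boolean isVerbose() { return verbose; }
    static void setStreams(PrintStream o, PrintStream e) { out = o; err = e; }

    /** Always printed. */
    static void info(String fmt, Object... args) { out.println(format(fmt, args)); }

    /** Printed only when verbose is on. */
    static void debug(String fmt, Object... args) {
        if (verbose) out.println(format(fmt, args));
    }

    /** Always printed to stderr with a fixed prefix. */
    static void warn(String fmt, Object... args) { err.println("[Warning] " + format(fmt, args)); }
    static void error(String fmt, Object... args) { err.println("[Error] " + format(fmt, args)); }

    // Skip String.format when there are no arguments so that a literal '%'
    // in a plain message does not throw.
    private static String format(String fmt, Object... args) {
        return args.length == 0 ? fmt : String.format(fmt, args);
    }

    public static void main(String[] args) {
        info("Processing %d spectra", 1000);   // shown at -verbose 0 and 1
        debug("Writing results to %s", "out.mzid"); // hidden until setVerbose(true)
        warn("Manifest could not be written");
    }
}
```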

Q4 — RunManifestWriter sidecar

Writes <output.mzid>.manifest.json next to every mzIdentML output, capturing the run context so downstream pipelines (quantms, Galaxy-P, custom reanalysis) can reproduce or verify a search without re-parsing logs. Content:

  • MS-GF+ version, UTC timestamp
  • Java version / vendor, OS name / version / arch
  • -Xmx in MB, available processors, requested threads, task count, -minSpectraPerThread
  • Spec file absolute path + size + format; FASTA absolute path + size; output absolute path
  • Enzyme, activation method, instrument, protocol
  • Precursor tolerance (left / right), isotope error bounds, peptide length / charge bounds, missed-cleavage cap, matches per spec, MS-level range
  • Original CLI argv verbatim

JSON is hand-rolled (stable key order, UTF-8, 2-space indent) — no new dep in the shaded jar. Failures to write are MSGFLogger.warn-logged and never abort the search.

  • Touches: new RunManifestWriter.java + one call site in MSGFPlus.runMSGFPlus after each successful per-file search.
  • New test: TestRunManifestWriter (5 cases — required fields, echoed params, argv preservation, null-argv tolerance, disk write shape).
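
A hand-rolled writer with stable key order and a non-aborting write can be sketched as below. This is an illustration under stated assumptions, not RunManifestWriter itself: the class name, method names, and JSON key names are hypothetical, and the warning is printed directly to stderr where the real code uses MSGFLogger.warn.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ManifestSketch {
    /** Quotes a string and escapes the characters JSON requires. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Renders null, numbers, booleans, lists, and strings (finite numbers only). */
    static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        if (v instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            String sep = "";
            for (Object item : (List<?>) v) { sb.append(sep).append(value(item)); sep = ", "; }
            return sb.append("]").toString();
        }
        return escape(v.toString());
    }

    /** Iteration order of the map is the key order in the output, so pass a LinkedHashMap. */
    static String toJson(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{\n");
        String sep = "";
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            sb.append(sep).append("  ").append(escape(e.getKey()))
              .append(": ").append(value(e.getValue()));
            sep = ",\n";
        }
        return sb.append("\n}\n").toString();
    }

    /** Writes <output>.manifest.json; returns false instead of throwing on failure. */
    static boolean writeSidecar(Path mzid, Map<String, Object> fields) {
        try {
            Path sidecar = mzid.resolveSibling(mzid.getFileName() + ".manifest.json");
            Files.write(sidecar, toJson(fields).getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (IOException | RuntimeException ex) {
            // The manifest is advisory, so the search result is never affected.
            System.err.println("[Warning] Could not write manifest for " + mzid + ": " + ex.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("javaVersion", System.getProperty("java.version"));       // hypothetical key names
        m.put("maxHeapMB", Runtime.getRuntime().maxMemory() / (1024 * 1024));
        m.put("availableProcessors", Runtime.getRuntime().availableProcessors());
        m.put("argv", Arrays.asList(args));
        writeSidecar(Paths.get("output.mzid"), m);
    }
}
```

Insertion-ordered maps keep the output diff-friendly between runs, which is the point of the stable key order.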

Test plan

  • mvn -B verify: 157 tests pass (was 141 on dev; +4 Q9, +7 Q11, +5 Q4; 57 skipped for external data).
  • Default -verbose 0 user sees unchanged console output.
  • -verbose 1 exposes the per-file enumeration and output/ignore details via MSGFLogger.debug.
  • Manifest written on a real search next to output.mzid as output.mzid.manifest.json.

Not in this PR

  • Q10 (-Xmx vs FASTA size pre-flight warning) — follow-up PR that consumes the MSGFLogger.warn introduced here.
  • Q8 (StaxMzMLParser error-prolog wrap) — follow-up PR; isolated to keep the 912-line file's review scope small.
  • Q6 (Pr(G|P) / Pr(G|O) columns) — own PR, touches scoring output.
  • Q7 (DirectPinWriter + -outputFormat pin) — own PR, 3–5 days.

🤖 Generated with Claude Code

ypriverol and others added 3 commits April 17, 2026 18:41
The thread-count cap in MSGFPlus.runMSGFPlus and MSGFDB.runMSGFDB
previously hardcoded "minimum 250 spectra per thread" (ui/MSGFPlus.java,
ui/MSGFDB.java). On many-core hosts running small inputs (e.g. 20 cores,
~1,000 spectra) this capped the search at ~4 threads, surprising users.

Rather than guess a new default, expose the divisor as -minSpectraPerThread
(default 250, min 1). Power users can lower it to raise parallelism on
small inputs; everyone else gets identical behaviour to before.

Wired in both MSGFPlus and the deprecated MSGFDB entry points so behaviour
stays consistent. Addresses issue MSGFPlus#52.

Tests: TestMinSpectraPerThread covers default, override, zero-rejection,
and MSGFDB registration. mvn -B verify: 145/145 tests pass, 57 skipped.

Docs: Troubleshooting.md and MSGFPlus.md now show the flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a lightweight static logger at edu.ucsd.msjava.misc.MSGFLogger with
info/debug/warn/error levels. Debug is gated on the existing -verbose 0/1
flag; warn/error go to stderr with [Warning]/[Error] prefixes. No external
dependencies (no slf4j/log4j) to keep the jar small.

Wires MSGFPlus.main() to call MSGFLogger.setVerbose(...) once after
parseParams, so the whole run inherits the CLI setting. Migrates the
top-level main() and the runMSGFPlus(ParamManager) dispatch loop:

  - Error paths: System.err.println("[Error] ...") -> MSGFLogger.error(...)
  - "Processing N spectra" (summary)          -> info
  - Per-file enumeration                      -> debug
  - Per-file "Processing"/"Ignoring" banner   -> info
  - Per-file "Writing results to"/"Output... exists" detail -> debug
  - "MS-GF+ complete" footer                  -> info
  - Decoy-ratio mismatch errors               -> MSGFLogger.error

Default behaviour (-verbose 0) is unchanged for all non-debug messages.
Running with -verbose 1 now exposes the per-file enumeration and the
per-file output/ignore details.

Intentionally narrow scope: the other ~260 System.out.println call sites
across DBScanner, ConcurrentMSGFPlus, BuildSA, etc. are unchanged. This
PR establishes the logger and wiring; case-by-case migration of those
sites can follow as they are touched.

Tests: TestMSGFLogger (7 tests) covers info-always, debug-gating, warn/
error stderr routing, format interpolation, and the isVerbose getter.
mvn -B verify: 152/152 tests pass, 57 skipped (same as before).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Writes <output.mzid>.manifest.json next to each mzIdentML output
capturing the run context: MS-GF+ version and timestamp; Java version,
vendor, OS; max heap and thread count; enzyme / instrument /
activation / protocol; precursor tolerance, isotope-error range,
length and charge bounds, missed-cleavage cap; spec-file and
FASTA-file absolute paths with byte sizes; and the original CLI argv
verbatim. Downstream pipelines (quantms, Galaxy-P, custom
reanalysis scripts) can then verify or reproduce a search without
re-parsing logs.

Called from MSGFPlus.runMSGFPlus after each successful per-file
search. Failures to write are MSGFLogger.warn()-logged and never
abort the search — manifests are advisory metadata, not output.

JSON is hand-rolled (stable key order, UTF-8, 2-space indent) so no
new dependency is pulled into the shaded jar.

Tests: TestRunManifestWriter covers required identity fields, echoed
SearchParams values, argv preservation, null-argv tolerance, and
end-to-end sidecar write/read.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ypriverol ypriverol merged commit 1bd9ff2 into dev Apr 17, 2026
2 checks passed