Conversation
The thread-count cap in MSGFPlus.runMSGFPlus and MSGFDB.runMSGFDB previously hardcoded "minimum 250 spectra per thread" (ui/MSGFPlus.java, ui/MSGFDB.java). On many-core hosts running small inputs (e.g. 20 cores, ~1,000 spectra) this capped the search at ~4 threads, surprising users. Rather than guess a new default, expose the divisor as -minSpectraPerThread (default 250, min 1). Power users can lower it to raise parallelism on small inputs; everyone else gets identical behaviour to before. Wired in both MSGFPlus and the deprecated MSGFDB entry points so behaviour stays consistent. Addresses issue MSGFPlus#52. Tests: TestMinSpectraPerThread covers default, override, zero-rejection, and MSGFDB registration. mvn -B verify: 145/145 tests pass, 57 skipped. Docs: Troubleshooting.md and MSGFPlus.md now show the flag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a lightweight static logger at edu.ucsd.msjava.misc.MSGFLogger with
info/debug/warn/error levels. Debug is gated on the existing -verbose 0/1
flag; warn/error go to stderr with [Warning]/[Error] prefixes. No external
dependencies (no slf4j/log4j) to keep the jar small.
Wires MSGFPlus.main() to call MSGFLogger.setVerbose(...) once after
parseParams, so the whole run inherits the CLI setting. Migrates the
top-level main() and the runMSGFPlus(ParamManager) dispatch loop:
- Error paths: System.err.println("[Error] ...") -> MSGFLogger.error(...)
- "Processing N spectra" (summary) -> info
- Per-file enumeration -> debug
- Per-file "Processing"/"Ignoring" banner -> info
- Per-file "Writing results to"/"Output... exists" detail -> debug
- "MS-GF+ complete" footer -> info
- Decoy-ratio mismatch errors -> MSGFLogger.error
Default behaviour (-verbose 0) is unchanged for all non-debug messages.
Running with -verbose 1 now exposes the per-file enumeration and the
per-file output/ignore details.
Intentionally narrow scope: the other ~260 System.out.println call sites
across DBScanner, ConcurrentMSGFPlus, BuildSA, etc. are unchanged. This
PR establishes the logger and wiring; case-by-case migration of those
sites can follow as they are touched.
Tests: TestMSGFLogger (7 tests) covers info-always, debug-gating, warn/
error stderr routing, format interpolation, and the isVerbose getter.
mvn -B verify: 152/152 tests pass, 57 skipped (same as before).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Writes <output.mzid>.manifest.json next to each mzIdentML output capturing the run context: MS-GF+ version and timestamp; Java version, vendor, OS; max heap and thread count; enzyme / instrument / activation / protocol; precursor tolerance, isotope-error range, length and charge bounds, missed-cleavage cap; spec-file and FASTA-file absolute paths with byte sizes; and the original CLI argv verbatim. Downstream pipelines (quantms, Galaxy-P, custom reanalysis scripts) can then verify or reproduce a search without re-parsing logs. Called from MSGFPlus.runMSGFPlus after each successful per-file search. Failures to write are MSGFLogger.warn()-logged and never abort the search — manifests are advisory metadata, not output. JSON is hand-rolled (stable key order, UTF-8, 2-space indent) so no new dependency is pulled into the shaded jar. Tests: TestRunManifestWriter covers required identity fields, echoed SearchParams values, argv preservation, null-argv tolerance, and end-to-end sidecar write/read. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This was referenced Apr 17, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three items from the landscape review's quick-wins list (
.claude/investigations/msgfplus_research_report.md), bundled together because each is small, and the later items consume the earlier ones.Q9 —
-minSpectraPerThreadflag (issue MSGFPlus#52)Replaces the hardcoded
min(availableCores, specSize / 250)thread cap inMSGFPlus.runMSGFPlus/MSGFDB.runMSGFDBwith an overridable CLI flag (default 250, minimum 1). On many-core hosts running small inputs (e.g. 20 cores, 1k spectra) users can lower the divisor to raise parallelism; everyone else gets identical behaviour.SearchParams.java,ParamManager.java,MSGFPlus.java,MSGFDB.java, plusdocs/MSGFPlus.md+docs/Troubleshooting.md.TestMinSpectraPerThread(5 cases).Q11 —
MSGFLoggerinfraLightweight static logger at
edu.ucsd.msjava.misc.MSGFLoggerwithinfo/debug/warn/errorlevels.debugis gated on the existing-verbose 0/1flag;warn/errorgo tostderrwith[Warning]/[Error]prefixes. No external dependencies (no slf4j/log4j).MSGFPlus.main+runMSGFPlus(ParamManager)loop to use the logger for error paths, the "Processing N spectra" summary, per-file banners, the completion footer, and decoy-ratio mismatch errors. The ~260System.out.printlncall sites inDBScanner/ConcurrentMSGFPlus/BuildSAare unchanged — scope is deliberately narrow.-verbose 0) is unchanged for all non-debug messages.TestMSGFLogger(7 cases).Q4 —
RunManifestWritersidecarWrites
<output.mzid>.manifest.jsonnext to every mzIdentML output, capturing the run context so downstream pipelines (quantms, Galaxy-P, custom reanalysis) can reproduce or verify a search without re-parsing logs. Content:-Xmxin MB, available processors, requested threads, task count,-minSpectraPerThreadJSON is hand-rolled (stable key order, UTF-8, 2-space indent) — no new dep in the shaded jar. Failures to write are
MSGFLogger.warn-logged and never abort the search.RunManifestWriter.java+ one call site inMSGFPlus.runMSGFPlusafter each successful per-file search.TestRunManifestWriter(5 cases — required fields, echoed params, argv preservation, null-argv tolerance, disk write shape).Test plan
mvn -B verify— 157 tests pass (was 141 ondev; +5 Q9, +7 Q11, +5 Q4; 57 skipped for external data).-verbose 0user sees unchanged console output.-verbose 1exposes the per-file enumeration and output/ignore details viaMSGFLogger.debug.output.mzidasoutput.mzid.manifest.json.Not in this PR
-Xmxvs FASTA size pre-flight warning) — follow-up PR that consumes theMSGFLogger.warnintroduced here.StaxMzMLParsererror-prolog wrap) — follow-up PR; isolated to keep the 912-line file's review scope small.Pr(G|P)/Pr(G|O)columns) — own PR, touches scoring output.DirectPinWriter+-outputFormat pin) — own PR, 3–5 days.🤖 Generated with Claude Code