
Use KEGG 118 HMM libraries in getKEGGModelForOrganism #642

edkerk merged 2 commits into develop3 from kegg118-support on Jun 16, 2026

edkerk (Member) commented on Jun 16, 2026

Summary

Switches getKEGGModelForOrganism to the KEGG 118 pre-trained HMM sets published in the raven-toolbox v0.3.0 release, and unifies the naming of those sets.

KEGG 118 support

  • The auto-download fetches the kegg118 HMM libraries from …/releases/download/v0.3.0/ (previously kegg116 from v0.1.0).
  • Only the two KEGG 118 sets (eukaryotes and prokaryotes) are supported; earlier KEGG releases are no longer offered for automatic download, and no backwards-compatibility shims are provided.
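As a rough illustration of the download rule above, here is a sketch in Python (the real logic is MATLAB code inside getKEGGModelForOrganism). `SUPPORTED_HMM_SETS`, `hmm_download_url` and its `release_base_url` parameter are hypothetical names; the `.hmm.gz` extension is taken from the first commit message, and the base URL, which is elided above, is left to the caller rather than filled in.

```python
# Hypothetical sketch; the real download logic is MATLAB code inside
# getKEGGModelForOrganism. Names here are illustrative only.

# The only two pre-trained sets offered for automatic download.
SUPPORTED_HMM_SETS = ("kegg118_eukaryotes", "kegg118_prokaryotes")


def hmm_download_url(hmm_set: str, release_base_url: str) -> str:
    """Build the download URL for one of the supported KEGG 118 HMM sets.

    release_base_url stands in for the ".../releases/download/v0.3.0/"
    location of the raven-toolbox release and is supplied by the caller.
    """
    if hmm_set not in SUPPORTED_HMM_SETS:
        # No fallback: names from earlier KEGG releases are rejected.
        raise ValueError(
            f"unsupported HMM set {hmm_set!r}; "
            f"expected one of {', '.join(SUPPORTED_HMM_SETS)}"
        )
    # Assumed asset naming: <set>.hmm.gz, per the first commit message.
    return f"{release_base_url.rstrip('/')}/{hmm_set}.hmm.gz"


if __name__ == "__main__":
    # Placeholder host; the real base URL is elided in the PR description.
    base = "https://example.org/releases/download/v0.3.0/"
    print(hmm_download_url("kegg118_prokaryotes", base))
    # https://example.org/releases/download/v0.3.0/kegg118_prokaryotes.hmm.gz
```

The sketch raises on any other name, in the spirit of the no-shims decision; how the MATLAB code reports an unrecognised name is not described in the PR.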

Naming clean-up

Previously, one artefact went by three names: the recognised dataDir suffix (euk90_kegg118), an internal index-aligned array used only to build the URL (eukaryotes), and the published asset itself (kegg118_eukaryotes).

  • dataDir is now named kegg118_eukaryotes / kegg118_prokaryotes, identical to the published asset, so the local directory, the local .hmm library, and the downloaded file all share one name.
  • The hmmDomains / hmmIndex parallel array is removed; the matched dataDir suffix doubles as the download filename.
  • Docstring and the tutorial5 example updated (the latter also moved off the long-stale euk90_kegg105).

This matches raven-toolbox, which already names the libraries kegg<version>_<domain>.
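A minimal sketch of the single-name scheme, again in Python rather than the toolbox's MATLAB and with hypothetical function names: the dataDir suffix is matched against the two supported sets, and the matched name is reused for the directory, the local `.hmm` library and the `.hmm.gz` download, so no index-aligned lookup array is needed. Matching by string suffix follows the wording above; ignoring trailing path separators, and the `.hmm.gz` extension, are assumptions.

```python
from typing import Optional

# Hypothetical sketch; the real matching happens in the MATLAB function
# getKEGGModelForOrganism.
SUPPORTED_HMM_SETS = ("kegg118_eukaryotes", "kegg118_prokaryotes")


def match_hmm_set(data_dir: str) -> Optional[str]:
    """Return the supported HMM set that data_dir ends with, or None."""
    # Assumption: trailing path separators are ignored before matching.
    trimmed = data_dir.rstrip("/\\")
    for name in SUPPORTED_HMM_SETS:
        if trimmed.endswith(name):
            return name
    # Old-style suffixes such as euk90_kegg118 are no longer recognised.
    return None


def artefact_names(hmm_set: str) -> dict:
    """Derive every artefact name from the one matched set name.

    Previously the download filename came from a second array
    (eukaryotes / prokaryotes) indexed by the position of the match;
    here the matched name is simply reused.
    """
    return {
        "directory": hmm_set,             # local dataDir suffix
        "library": f"{hmm_set}.hmm",      # local HMM library
        "download": f"{hmm_set}.hmm.gz",  # release asset (assumed extension)
    }


if __name__ == "__main__":
    name = match_hmm_set("/home/user/models/kegg118_eukaryotes/")
    print(name)  # kegg118_eukaryotes
    print(artefact_names(name))
    print(match_hmm_set("euk90_kegg118"))  # None
```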

Notes

  • The download URLs resolve only once the kegg118 artefacts are attached to the raven-toolbox v0.3.0 release (companion PR: raven-toolbox#41, "Prepare kegg118 KEGG release (v0.3.0)").
  • The generated doc/ HTML still shows the old strings; it is produced by installation/updateDocumentation.m (m2html) and should be regenerated by a maintainer with MATLAB rather than hand-edited.

Testing

  • The repo's KEGG function tests are gated behind assumeFail (require a local KEGG dump and external aligners), so they were not exercised here. Changes are string/URL/control-flow updates verified by inspection.

Point getKEGGModelForOrganism at the kegg118 pre-trained HMM sets published in
the raven-toolbox v0.3.0 release. The recognised dataDir suffixes become
euk90_kegg118 / prok90_kegg118, and the auto-download URL fetches
kegg118_<domain>.hmm.gz from the v0.3.0 release (previously kegg116 from v0.1.0).

Only the two kegg118 HMM sets (eukaryotes, prokaryotes) are supported; earlier
KEGG releases are no longer offered for download. Update the tutorial5 example to
use euk90_kegg118 accordingly.

github-actions (bot) commented on Jun 16, 2026

Function test results

  • 202 tests in 22 suites across 1 file, run time 33s ⏱️
  • 178 passed ✅, 24 skipped 💤, 0 failed ❌

Results for commit db21ef4.


getKEGGModelForOrganism recognised dataDir suffixes euk90_kegg118 / prok90_kegg118
but built the download URL from a second, index-aligned array (eukaryotes /
prokaryotes), and the published asset is named differently again
(kegg118_eukaryotes). Three names for one artefact.

Standardise on the published asset name: dataDir is now kegg118_eukaryotes /
kegg118_prokaryotes, so the local directory, the local .hmm library, and the
downloaded asset all share one name. The hmmDomains/hmmIndex parallel array is
removed -- the matched dataDir suffix doubles as the download filename. Docstring
and the tutorial5 example updated accordingly.

edkerk merged commit aa5e2c6 into develop3 on Jun 16, 2026
2 checks passed
edkerk deleted the kegg118-support branch on Jun 16, 2026 at 09:32