Use KEGG 118 HMM libraries in getKEGGModelForOrganism #642
Merged
Point getKEGGModelForOrganism at the kegg118 pre-trained HMM sets published in the raven-toolbox v0.3.0 release. The recognised dataDir suffixes become euk90_kegg118 / prok90_kegg118, and the auto-download URL fetches kegg118_<domain>.hmm.gz from the v0.3.0 release (previously kegg116 from v0.1.0). Only the two kegg118 HMM sets (eukaryotes, prokaryotes) are supported; earlier KEGG releases are no longer offered for download. Update the tutorial5 example to use euk90_kegg118 accordingly.
Function test results: 202 tests, 178 ✅, 33s ⏱️. Results for commit db21ef4.
getKEGGModelForOrganism recognised dataDir suffixes euk90_kegg118 / prok90_kegg118 but built the download URL from a second, index-aligned array (eukaryotes / prokaryotes), and the published asset is named differently again (kegg118_eukaryotes). Three names for one artefact. Standardise on the published asset name: dataDir is now kegg118_eukaryotes / kegg118_prokaryotes, so the local directory, the local .hmm library, and the downloaded asset all share one name. The hmmDomains/hmmIndex parallel array is removed -- the matched dataDir suffix doubles as the download filename. Docstring and the tutorial5 example updated accordingly.
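The single-name scheme described in this commit can be illustrated with a small language-neutral sketch. It is written in Python for brevity and is not the MATLAB code in getKEGGModelForOrganism. The helper names (`match_hmm_set`, `download_url`) and the `release_base_url` parameter are hypothetical; the caller supplies the release URL, because the full path is not spelled out in this PR. The `.hmm.gz` extension follows the asset pattern kegg118_<domain>.hmm.gz mentioned in the first commit.

```python
from pathlib import PurePosixPath

# The two library names from this PR. Each one serves as the dataDir suffix,
# the local .hmm library name, and the published asset name.
SUPPORTED_HMM_SETS = ("kegg118_eukaryotes", "kegg118_prokaryotes")


def match_hmm_set(data_dir):
    """Return the supported HMM set name that data_dir ends with, or None."""
    # Normalise Windows separators so the last path component is found reliably.
    name = PurePosixPath(data_dir.replace("\\", "/")).name
    for hmm_set in SUPPORTED_HMM_SETS:
        if name.endswith(hmm_set):
            return hmm_set
    return None


def download_url(data_dir, release_base_url):
    """Build the asset URL from the matched suffix (no parallel lookup array).

    release_base_url is an assumption of this sketch: the caller passes the
    release download location, which is elided in the PR text.
    """
    hmm_set = match_hmm_set(data_dir)
    if hmm_set is None:
        raise ValueError(
            "dataDir must end with one of: " + ", ".join(SUPPORTED_HMM_SETS)
        )
    # The matched suffix doubles as the download filename.
    return f"{release_base_url.rstrip('/')}/{hmm_set}.hmm.gz"
```

Because the suffix is the only lookup key, adding a future library would mean adding one string instead of keeping two arrays index-aligned.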
Summary
Switches getKEGGModelForOrganism to the KEGG 118 pre-trained HMM sets published in the raven-toolbox v0.3.0 release, and unifies the naming of those sets.

KEGG 118 support
- The auto-download URL now fetches the kegg118 sets from …/releases/download/v0.3.0/ (previously kegg116 from v0.1.0).

Naming clean-up
Previously there were three names for one artefact: the recognised dataDir suffix (euk90_kegg118), an internal index-aligned array used only to build the URL (eukaryotes), and the published asset itself (kegg118_eukaryotes).
- dataDir is now styled kegg118_eukaryotes / kegg118_prokaryotes, identical to the published asset, so the local directory, the local .hmm library, and the downloaded file all share one name.
- The hmmDomains/hmmIndex parallel array is removed; the matched dataDir suffix doubles as the download filename.
- Docstring and the tutorial5 example updated (the latter also moved off the long-stale euk90_kegg105).
This matches raven-toolbox, which already names the libraries kegg<version>_<domain>.

Notes
- The HMM sets are published in the raven-toolbox v0.3.0 release (companion PR: Prepare kegg118 KEGG release (v0.3.0) raven-toolbox#41).
- The doc/ HTML still shows the old strings; it is produced by installation/updateDocumentation.m (m2html) and should be regenerated by a maintainer with MATLAB rather than hand-edited.

Testing
- The affected tests assumeFail (they require a local KEGG dump and external aligners), so they were not exercised here.
- Changes are string/URL/control-flow updates verified by inspection.