Adjust converter to enable dataset processing even if errors occur #69
Open
yuliadub wants to merge 3 commits into
Collaborator
Per our discussion, I think we need to clean the MRD header too, so the file produced by the converter is internally consistent 😄
When slice count differs:
When channel count differs:
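For illustration only, and not the reviewer's actual suggestion: a minimal sketch of bringing the header in line with the data for both cases, assuming the header is ISMRMRD XML with the slice limit under `encoding/encodingLimits/slice/maximum` and the channel count under `acquisitionSystemInformation/receiverChannels`. The function name `sync_header_counts` is hypothetical and is not part of the converter.

```python
import xml.etree.ElementTree as ET

NS = "http://www.ismrm.org/ISMRMRD"  # ISMRMRD header namespace
ET.register_namespace("", NS)  # keep it as the default namespace on output


def sync_header_counts(xml_text: str, n_slices: int, n_channels: int) -> str:
    """Return header XML whose slice limit and channel count match the data.

    Hypothetical helper: n_slices and n_channels are taken from the dataset.
    """
    root = ET.fromstring(xml_text)
    ns = {"m": NS}

    slice_max = root.find("m:encoding/m:encodingLimits/m:slice/m:maximum", ns)
    if slice_max is not None:
        # Assumption: encoding limits are zero-based indices, so max = count - 1.
        slice_max.text = str(n_slices - 1)

    channels = root.find("m:acquisitionSystemInformation/m:receiverChannels", ns)
    if channels is not None:
        channels.text = str(n_channels)

    # The slice <center> value is left untouched in this sketch.
    return ET.tostring(root, encoding="unicode")
```

Elements missing from the header are skipped rather than created, so a sparse header passes through unchanged.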
Collaborator
Also, the conda build failed because of an upstream issue with the I've already fixed it in #66, and John approved, so I'll merge it now.
Tested this on fastMRI knee and brain data, as well as Mosaic (new data source) data. The brain and Mosaic data were not converting correctly. Below is a summary of why; both issues appear to be artifacts of how the data we acquired was built.
Slice count (header 32 vs dataset 16, your brain file)
The fastMRI public brain release intentionally ships only a subset of the acquired slices (typically the diagnostically central ones) to control file size. The XML header is preserved verbatim from the original Siemens acquisition, which had the full 32. The fastMRI documentation/papers note this — the slice subsetting is a release-time decision, not a corruption. Trusting the dataset's first axis is the right call.
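A minimal sketch of the "trust the dataset's first axis" rule, assuming the k-space array is laid out with slices on axis 0. The name `effective_slice_count` is hypothetical and only illustrates the decision; it is not the converter's code.

```python
import warnings


def effective_slice_count(header_slices: int, kspace_shape: tuple) -> int:
    """Pick the slice count to convert, preferring the stored data.

    Assumption: axis 0 of the k-space array indexes slices.
    """
    stored_slices = kspace_shape[0]
    if stored_slices != header_slices:
        # The header describes the original acquisition; the file may hold
        # only a subset, so report the mismatch and keep going.
        warnings.warn(
            f"header declares {header_slices} slices, "
            f"dataset stores {stored_slices}; using the dataset value"
        )
    return stored_slices
```

With the numbers from the brain file, `effective_slice_count(32, (16, 20, 640, 320))` returns 16 and emits a warning instead of aborting the conversion. The coil and matrix dimensions in that shape are illustrative.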
Axis order swap (Siemens twix file) — non-conforming layout.
The data is internally consistent (640 still matches matrix_size.x). What's broken is the storage convention: whoever repackaged the twix dump into the fastMRI HDF5 layout wrote phase-encode-first instead of readout-first. The samples themselves are intact, just transposed. So reconstruction will be correct as long as the transpose happens on read, which is what the swapped branch does. Worth flagging upstream as a metadata/layout bug in the repackager, but the pixel data is fine.
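As a rough sketch of how the swapped case could be detected and undone, assuming the last two axes of the k-space array are expected to be (readout, phase-encode) and that the readout length equals `matrix_size.x`. Both function names are hypothetical, and plain lists are used so the example has no dependencies; with NumPy arrays the same transpose would be `np.swapaxes(kspace, -1, -2)`.

```python
def readout_axis_swapped(kspace_shape: tuple, matrix_size_x: int) -> bool:
    """Return True when the readout dimension sits on the last axis.

    Assumption: a conforming file stores (..., readout, phase_encode).
    """
    second_last, last = kspace_shape[-2], kspace_shape[-1]
    if second_last == matrix_size_x:
        # Conforming layout. A square matrix also lands here, so the swap
        # cannot be detected from the shape alone in that case.
        return False
    if last == matrix_size_x:
        return True
    raise ValueError(
        f"neither trailing axis {second_last, last} matches "
        f"matrix_size.x={matrix_size_x}"
    )


def transpose_plane(plane: list) -> list:
    """Swap rows and columns of one 2D k-space plane given as nested lists."""
    return [list(column) for column in zip(*plane)]
```

For a file stored as `(16, 20, 320, 640)` with `matrix_size.x == 640`, `readout_axis_swapped` returns `True`, and each plane would be transposed on read. The shape here is illustrative.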