Skip to content

[megatron] compat megatron 0.17#9354

Merged
Jintao-Huang merged 2 commits into
modelscope:mainfrom
Jintao-Huang:compat_megatron_017
May 15, 2026
Merged

[megatron] compat megatron 0.17#9354
Jintao-Huang merged 2 commits into
modelscope:mainfrom
Jintao-Huang:compat_megatron_017

Conversation

@Jintao-Huang
Copy link
Copy Markdown
Collaborator

No description provided.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces compatibility for Megatron Core version 0.17.0rc0 and above by conditionally selecting checkpointing strategies. It also refactors imports, moving parallel_state to the top level and utilizing local imports for version-specific strategy functions. Feedback suggests making the version check more robust by using getattr to avoid potential AttributeError and refactoring the duplicated strategy selection logic into a helper function to improve maintainability.

@@ -34,11 +35,11 @@

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Accessing megatron.core.__version__ directly might raise an AttributeError if the attribute is missing in certain environments or older versions of Megatron Core. It is safer to use getattr with a default value to ensure the version check is robust.

Suggested change
mcore_017 = version.parse(getattr(megatron.core, '__version__', '0.0.0')) >= version.parse('0.17.0rc0')

Comment on lines 247 to +252
optim_sd_kwargs={'metadata': sharded_sd_metadata},
)
_filter_adapter_state_dict(state_dict, peft_format)

save_strategy = get_default_save_sharded_strategy()
if mcore_017:
save_strategy = TorchDistSaveShardedStrategy()
else:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The conditional logic for selecting the checkpointing strategy is duplicated in both save_mcore_checkpoint and load_mcore_checkpoint. Additionally, performing local imports inside these functions every time they are called is slightly inefficient, although acceptable for infrequent operations like checkpointing. Consider refactoring this into a helper function or using top-level imports with a try-except block to adhere better to PEP 8 and improve maintainability.

@Jintao-Huang Jintao-Huang merged commit 5b912f7 into modelscope:main May 15, 2026
1 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant