[megatron] compat megatron 0.17#9354
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces compatibility for Megatron Core version 0.17.0rc0 and above by conditionally selecting checkpointing strategies. It also refactors imports, moving parallel_state to the top level and utilizing local imports for version-specific strategy functions. Feedback suggests making the version check more robust by using getattr to avoid potential AttributeError and refactoring the duplicated strategy selection logic into a helper function to improve maintainability.
| @@ -34,11 +35,11 @@ | |||
|
|
|||
There was a problem hiding this comment.
Accessing megatron.core.__version__ directly might raise an AttributeError if the attribute is missing in certain environments or older versions of Megatron Core. It is safer to use getattr with a default value to ensure the version check is robust.
| mcore_017 = version.parse(getattr(megatron.core, '__version__', '0.0.0')) >= version.parse('0.17.0rc0') |
| optim_sd_kwargs={'metadata': sharded_sd_metadata}, | ||
| ) | ||
| _filter_adapter_state_dict(state_dict, peft_format) | ||
|
|
||
| save_strategy = get_default_save_sharded_strategy() | ||
| if mcore_017: | ||
| save_strategy = TorchDistSaveShardedStrategy() | ||
| else: |
There was a problem hiding this comment.
The conditional logic for selecting the checkpointing strategy is duplicated in both save_mcore_checkpoint and load_mcore_checkpoint. Additionally, performing local imports inside these functions every time they are called is slightly inefficient, although acceptable for infrequent operations like checkpointing. Consider refactoring this into a helper function or using top-level imports with a try-except block to adhere better to PEP 8 and improve maintainability.
No description provided.