mps-cli-py: complete binary (.mpb) persistency implementation#54

Merged
Prithvi686 merged 33 commits into main from
feature/E3AARCHAI-23018_enhance_mps_cli_py_with_support_for_binary_persistency
Apr 15, 2026
@Prithvi686 Prithvi686 commented Feb 12, 2026

Added full binary (.mpb) model persistency support

MPS stores models in three formats: XML (.mps), file-per-root (.model directories), and binary (.mpb). The first two were already supported. This PR adds complete support for the binary format.

Newly Added:
SModelBuilderBinaryPersistency.py: Top-level .mpb parser that reads the header, registry, model properties, and node tree.

binary/registry.py: Registry section parser that populates 'index_2_concept', 'index_2_property', 'index_2_reference_role', 'index_2_child_role_in_parent', and 'concept_id_2_concept'.

binary/nodes.py: Node tree parser containing the methods 'read_children', 'read_node', '_read_reference', and '_read_node_id'.

binary/node_id_utils.py: NodeIdEncodingUtils class that converts MPS node IDs between raw long values and Base64-variant strings.
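To illustrate what such a long-to-string conversion involves, here is a minimal sketch rather than the code from node_id_utils.py. The function names are hypothetical, and the 64-character alphabet (digits, lowercase, uppercase, '$', '_') and the stripping of leading zero digits are assumptions about the Base64 variant, so check them against the real implementation before relying on the output.

```python
# Assumed alphabet for the Base64 variant: 6 bits per character.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
MASK_64 = (1 << 64) - 1


def encode_node_id(value: int) -> str:
    """Encode a signed 64-bit node id, most significant digit first."""
    value &= MASK_64  # view negative Java longs as unsigned 64-bit values
    if value == 0:
        return ALPHABET[0]
    digits = []
    while value:
        digits.append(ALPHABET[value & 0x3F])  # take the lowest 6 bits
        value >>= 6
    return "".join(reversed(digits))


def decode_node_id(text: str) -> int:
    """Decode a string produced by encode_node_id back to a signed 64-bit int."""
    value = 0
    for ch in text:
        value = (value << 6) | INDEX[ch]
    if value >= 1 << 63:  # reinterpret the top bit as the sign, like a Java long
        value -= 1 << 64
    return value
```

Under these assumptions, 'decode_node_id(encode_node_id(n))' returns 'n' for any value that fits in a signed 64-bit long.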

Modified:
SSolutionBuilder.py: 'build_all()' now processes '.mpb' files in parallel via 'ProcessPoolExecutor'. Also added the 'USE_CACHE', 'CACHE_LOAD_FN', and 'CACHE_SAVE_FN' hooks.
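The parallel step can be pictured with the following minimal sketch, which is not the actual 'build_all()' code. 'fake_parse' and 'parse_batch' are hypothetical names, and treating "fewer than 'MPB_PARALLEL_THRESHOLD' files" as the serial case is an assumption about how the guard compares.

```python
import os
from concurrent.futures import ProcessPoolExecutor

MPB_PARALLEL_THRESHOLD = 4  # value quoted in the PR description


def fake_parse(path: str) -> str:
    """Hypothetical stand-in for the real .mpb parser; returns the file name."""
    return os.path.basename(path)


def parse_batch(paths, threshold=MPB_PARALLEL_THRESHOLD):
    """Parse small batches serially and larger ones in worker processes."""
    paths = list(paths)
    if len(paths) < threshold:
        # Small batch: process start-up would cost more than it saves.
        return [fake_parse(p) for p in paths]
    with ProcessPoolExecutor() as pool:
        # pool.map returns results in the same order as the input paths.
        return list(pool.map(fake_parse, paths))
```

The worker has to be a module-level function so that it can be pickled and sent to the worker processes. On platforms that start processes with "spawn", the call to 'parse_batch' should sit under an 'if __name__ == "__main__":' guard.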

SSolutionsRepositoryBuilder.py: Four performance optimisations:

  • Before a JAR is extracted, its zip central directory is peeked to check for any '.msd' file.
  • The remaining JARs (those with .msd) are extracted and parsed concurrently via a 'ThreadPoolExecutor'. JAR extraction is I/O-bound, so threads are the appropriate primitive here and avoid per-process spawn overhead.
  • Within each solution, .mpb files are parsed by worker processes through a 'ProcessPoolExecutor' rather than threads, because parsing is CPU-bound (binary decoding, string table lookups). The 'MPB_PARALLEL_THRESHOLD = 4' guard skips the pool for small batches and parses them serially. Solutions in a single JAR share one pool, so the pool-creation cost is paid at most once per JAR file.
  • After the first parse, each 'SModel' is saved to the '~/.mps_cli_cache/' directory, keyed by 'md5(path, time, size)'. On subsequent runs, files whose path, modification time, and size are unchanged are loaded from this directory. A changed file produces a different key, so the cache is invalidated automatically and no manual invalidation is needed. The tests always set 'USE_CACHE = False' to ensure fresh parses.
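The caching scheme in the last bullet can be sketched as follows. This is an illustration under stated assumptions, not the PR's ModelCache code: 'cache_key' and 'load_or_parse' are hypothetical names, and the use of 'pickle', the '.pkl' suffix, and the exact way path, modification time, and size are combined before hashing are all assumptions.

```python
import hashlib
import os
import pickle
from pathlib import Path

CACHE_DIR = Path.home() / ".mps_cli_cache"  # directory named in the PR description


def cache_key(path) -> str:
    """Hash path, modification time and size; any change yields a new key."""
    st = os.stat(path)
    raw = f"{os.path.abspath(path)}|{st.st_mtime_ns}|{st.st_size}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def load_or_parse(path, parse_fn, cache_dir=CACHE_DIR):
    """Return the cached result for an unchanged file, else parse and store it."""
    cache_dir = Path(cache_dir)
    entry = cache_dir / (cache_key(path) + ".pkl")  # assumed on-disk format
    if entry.exists():
        with entry.open("rb") as fh:
            return pickle.load(fh)
    model = parse_fn(path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    with entry.open("wb") as fh:
        pickle.dump(model, fh)
    return model
```

Because the key is derived from the file's metadata, a modified file simply misses the cache. In this sketch stale entries are never deleted, which a real implementation may want to handle.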

demo.py: parses a plugins directory or test project, prints a structured summary of all solutions/models/nodes, runs a verification pass, and writes output to a timestamped log file

What is parsed and stored in SModel:
Every .mpb file becomes one 'SModel' containing:

  • 'uuid': a Java-style UUID string 'r:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx', matching 'UUID.toString()' in Java
  • 'name': the fully qualified model name (e.g. 'jetbrains.mps.vcs.diff')
  • 'root_nodes': a list of 'SNode' objects, each carrying:
    • 'uuid': Base64-variant encoded node ID
    • 'concept': 'SConcept' with the fully qualified concept name resolved from the registry
    • 'role_in_parent': the containment link name this node fills in its parent
    • 'parent': back-reference to the containing 'SNode' ('None' for root nodes)
    • 'properties': 'dict[str, str]' of all string-valued property name and value pairs
    • 'references': 'dict[str, SNodeRef]' mapping reference role name to 'SNodeRef', which carries 'model_uuid', 'node_uuid', and 'resolve_info'
    • 'children': ordered list of child 'SNode' objects (recursive)
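As a rough picture of the structure listed above, the following sketch models the fields with dataclasses. It mirrors the field names from this description only; the real 'SModel', 'SNode', 'SConcept', and 'SNodeRef' classes in mps-cli-py may be defined differently, and 'add_child' and 'count_nodes' are hypothetical helpers added for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SNodeRef:
    model_uuid: str
    node_uuid: str
    resolve_info: Optional[str] = None


@dataclass
class SConcept:
    name: str  # fully qualified concept name


@dataclass
class SNode:
    uuid: str
    concept: SConcept
    role_in_parent: Optional[str] = None
    # Excluded from repr/compare so the parent<->child cycle cannot recurse.
    parent: Optional["SNode"] = field(default=None, repr=False, compare=False)
    properties: dict[str, str] = field(default_factory=dict)
    references: dict[str, SNodeRef] = field(default_factory=dict)
    children: list["SNode"] = field(default_factory=list)

    def add_child(self, child: "SNode", role: str) -> "SNode":
        """Attach a child under the given containment role (hypothetical helper)."""
        child.parent = self
        child.role_in_parent = role
        self.children.append(child)
        return child


@dataclass
class SModel:
    uuid: str
    name: str
    root_nodes: list[SNode] = field(default_factory=list)


def count_nodes(node: SNode) -> int:
    """Count a node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children)
```

A root node keeps 'parent' as 'None', and every child points back to its container, which matches the back-reference described in the list.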

Also extended SSolutionsRepositoryBuilder to load SLanguage aspect models from .mpl files in JAR files, and added a new demo_language_extraction.py that prints each language's aspect information to a Markdown file to verify that the aspects are correctly populated.

9 new test files (~75 new test methods) have been added to verify all the parsing scenarios.

ratiud and others added 11 commits December 23, 2025 11:10
- Refactored binary persistency implementation to
  separate constants, low-level reader utilities.
- Fixed model header parsing to correctly handle
  model-reference kind vs model-id kind according
  to MPS binary persistency format.
- Correctly reconstruct model UUID with 'r:' prefix.
- Updated low-level test expectations to reflect fully-
  qualified model names.
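The UUID reconstruction mentioned in these commits can be sketched as below. This is a sketch under assumptions, not the parser's code: it assumes the model id is stored as two big-endian 64-bit values (most significant half first), as Java's 'UUID(long, long)' constructor expects, and 'read_model_uuid' is a hypothetical name.

```python
import struct
import uuid


def read_model_uuid(buf: bytes, offset: int = 0) -> str:
    """Rebuild an 'r:'-prefixed model UUID from 16 bytes at the given offset."""
    # Assumed layout: two unsigned big-endian 64-bit halves, high half first.
    msb, lsb = struct.unpack_from(">QQ", buf, offset)
    # str(uuid.UUID) yields the lowercase 8-4-4-4-12 form, like UUID.toString().
    return "r:" + str(uuid.UUID(int=(msb << 64) | lsb))
```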
… persistency and added low-level tests covering imports
… read_reference

- Integrated node loading into
  SModelBuilderBinaryPersistency
- Added root_nodes structure
- Extended tests to validate full model tree parsing
…rchitecture

- Removed registry dict usage
- Integrated index_2_* maps from base builder
- Construct real SConcept, SProperty, SNode instances
- Unified binary builder structure with XML persistency
- Updated tests to validate object-based model structure
- Implement full binary (.mpb) model parsing
- Load model header, registry, used languages and imports
- Build concept/property/reference/containment index maps
- Parse node tree including containment roles and properties
- Added support for reference kind validation and resolve_info
- Applied node id encoding during parsing
- Added repository-level completeness and resolution tests
@Prithvi686 Prithvi686 marked this pull request as draft February 12, 2026 08:49
…ode ids and also corrected existing test case failure
1) Fixed a wrong field order in nodes.py so that node info in .mpb files is parsed correctly.

2) Corrected the parser to handle all structural variants encountered across real plugin .mpb files with the V3 stream format (0x00000500), as well as .mpb files that use a DEPENDENCY_V1 byte.

3) Implemented complete binary persistency for real plugin .mpb files; extensive Python tests are still pending.

4) Corrected a few issues with parsing the model UUID, and implemented logic to build models in parallel instead of one by one. Separate processes (via the concurrent.futures.ProcessPoolExecutor API) are used to sidestep Python's global interpreter lock.

5) Improved parser performance by first peeking into JAR files to determine whether .msd files are present, and only then extracting the JARs to parse .mpb files. This reduced the parse execution time from ~248 seconds to ~9 seconds.
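The peek in item 5 can be done with the standard 'zipfile' module, which reads only the archive's central directory when listing entries. The sketch below is illustrative: 'jar_contains' is a hypothetical name, and returning 'False' for unreadable archives is an assumption about the desired behaviour.

```python
import zipfile


def jar_contains(jar_path, suffixes=(".msd",)) -> bool:
    """Return True if any entry name in the JAR ends with one of the suffixes."""
    try:
        with zipfile.ZipFile(jar_path) as jar:
            # namelist() comes from the central directory; nothing is extracted.
            return any(name.endswith(tuple(suffixes)) for name in jar.namelist())
    except zipfile.BadZipFile:
        # Assumption: a corrupt or non-zip file is simply treated as irrelevant.
        return False
```

Only JARs for which this returns 'True' would then be extracted. Passing a different 'suffixes' tuple lets the same check cover other file types.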
@Prithvi686 Prithvi686 force-pushed the feature/E3AARCHAI-23018_enhance_mps_cli_py_with_support_for_binary_persistency branch from 04d648d to 9efb3d5 Compare March 19, 2026 17:25
…rs, imports, node trees, libraries, references, registry entries, language extraction, registry parsing performance and fixed two failing tests and a few mini cleanups
…lderBinaryPersistency to correctly format uuid's
…iles in parallel is already handled by SSolutionBuilder
@Prithvi686 Prithvi686 marked this pull request as ready for review March 25, 2026 18:50
Prithvi686 and others added 4 commits March 30, 2026 13:35
…onality is already contained within node_id_utils.py and few minor clean ups
…_for_binary_persistency' of https://github.com/mbeddr/mps-cli into feature/E3AARCHAI-23018_enhance_mps_cli_py_with_support_for_binary_persistency
Prithvi686 and others added 5 commits April 8, 2026 15:35
…t models from .mpl files in jar files

- Added _jar_is_relevant() to accept JARs containing either .msd or .mpl (language JARs have no .msd, so they were previously filtered out silently)
- SLanguageBuilder.load_from_mpl() reads language namespace, uuid, and languageVersion from .mpl and populates the existing SLanguage registry entry
- SLanguageBuilder._load_aspect_models() parses all .mpb aspect files found in the models directory inside the jar and attaches them to the SLanguage
- SLanguage now has two new fields: language_version (int) and models (list)
- demo.py has now been updated to show language aspect model counts and sample concept names
- 1 new test file added (test_binary_language_from_mpl.py) using jetbrains.mps.build.tips-src.jar as test data under mps_cli_binary_persistency_language folder
…_for_binary_persistency' of https://github.com/mbeddr/mps-cli into feature/E3AARCHAI-23018_enhance_mps_cli_py_with_support_for_binary_persistency

@danielratiu danielratiu left a comment

I have left some comments which we need to address

Comment thread mps-cli-py/src/mpscli/model/builder/binary/constants.py Outdated
Comment thread mps-cli-py/src/mpscli/model/builder/NodeIdEncodingUtils.py Outdated
Comment thread mps-cli-py/src/mpscli/model/builder/binary/model_input_stream.py
Comment thread mps-cli-py/src/mpscli/model/builder/binary/model_input_stream.py
Comment thread mps-cli-py/src/mpscli/model/builder/binary/nodes.py
Comment thread mps-cli-py/tests/test_nodes.py
Comment thread mps-cli-py/tests/test_binary_references.py Outdated
Comment thread mps-cli-py/tests/test_binary_registry.py Outdated
Comment thread mps-cli-py/tests/binary/test_binary_repository.py
Comment thread mps-cli-py/tests/test_binary_repository_completeness.py Outdated
- Separated concerns: extracted ModelCache, MpbBatchParser
- Extended parametrized tests to cover .mpb format, removed weak assertions and merged completeness test into parametrized suite
- Added black formatter and pytest config to VS Code settings.json, extensions.json, and pyproject.toml files and updated .gitignore as well
…_for_binary_persistency' of https://github.com/mbeddr/mps-cli into feature/E3AARCHAI-23018_enhance_mps_cli_py_with_support_for_binary_persistency
@Prithvi686
I have left some comments which we need to address

Hi @danielratiu ,

Addressed all the comments, thank you.

@danielratiu danielratiu left a comment

Please also move the binary-persistency-specific tests into a subfolder under tests: "tests/binary".

Comment thread mps-cli-py/src/mpscli/model/builder/SSolutionsRepositoryBuilder.py
Comment thread mps-cli-py/.vscode/settings.json

@danielratiu danielratiu left a comment

thank you for this important extension

lgtm now

- moved jar_is_relevant logic to a separate utility
- moved all binary persistency related tests to a separate folder and also added a new readme explaining the same
@Prithvi686 Prithvi686 merged commit 1fab8db into main Apr 15, 2026
4 of 5 checks passed
@Prithvi686 Prithvi686 deleted the feature/E3AARCHAI-23018_enhance_mps_cli_py_with_support_for_binary_persistency branch April 15, 2026 09:40
3 participants