Fix issue in mul_mat_id for OpenVINO backend #163
Merged
zhaixuejun1993 merged 1 commit (5f58d5d) into ravi9:dev_backend_openvino on May 15, 2026
3 of 12 checks passed
This pull request refines how the OpenVINO backend handles the shapes of expert weights and activations for the `MUL_MAT_ID` operation. Multi-expert weight tensors now retain their full dimensionality, and input/output tensor shapes are rebuilt from runtime shape information, which improves compatibility with dynamic or reshaped input graphs.

Key changes include:

Shape Handling and Tensor Materialization:
- In `ggml-decoder.cpp`, when materializing non-quantized expert weights, the code now preserves the full reversed 4D shape so that the expert dimension is not collapsed. Previously, later operations (such as `Gather`/`MatMul`) could end up seeing only a single expert slice.

Dynamic Shape Reconstruction and Reshaping:
- In `mul_mat_id.cpp`, the logic that squeezed singleton axes from weights, activations, and ids is replaced with explicit dynamic shape reconstruction using `ShapeOf` and `Reshape`. This keeps the code correct for input tensors that have gone through reshape or view operations, so each operand has the expected logical rank regardless of how its input shape was permuted.
- The `MatMul` result is now explicitly constructed from dynamic shape information and the expected output rank, replacing the previous fixed unsqueeze/squeeze logic. The output tensor therefore always has the required 4D shape, with checks for static rank and row dimension.

General Improvements:
- `concat.hpp` is now included to support the new shape concatenation logic.
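To make the shape argument above concrete, here is a minimal reference sketch in plain C++. It does not use the OpenVINO or ggml APIs and is not the code from this pull request. The helper names `reversed_4d_shape` and `mul_mat_id_ref` are hypothetical, and the layout is an assumption: weights are stored as `{1, n_expert, rows, cols}` (the reversed form of a ggml-style `ne = {cols, rows, n_expert, 1}`), and every token/slot pair has its own activation row. The point it illustrates is that the per-id gather can only reach experts other than the first if the expert axis survives materialization.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ggml-style dims list the fastest-varying axis first (ne[0]);
// the row-major 4D shape is the same list reversed.
std::array<int64_t, 4> reversed_4d_shape(const std::array<int64_t, 4>& ne) {
    return {ne[3], ne[2], ne[1], ne[0]};
}

// Reference MUL_MAT_ID on flat row-major buffers (hypothetical helper).
//   weights: shape {1, n_expert, rows, cols}
//   acts:    shape {1, n_tokens, n_used, cols}
//   ids:     shape {n_tokens, n_used}; each entry selects one expert
// Writes the 4D result shape {1, n_tokens, n_used, rows} to out_shape.
std::vector<float> mul_mat_id_ref(const std::vector<float>& weights,
                                  const std::array<int64_t, 4>& w_shape,
                                  const std::vector<float>& acts,
                                  const std::vector<int32_t>& ids,
                                  int64_t n_tokens, int64_t n_used,
                                  std::array<int64_t, 4>& out_shape) {
    const int64_t rows = w_shape[2];
    const int64_t cols = w_shape[3];
    out_shape = {1, n_tokens, n_used, rows};
    std::vector<float> out(static_cast<std::size_t>(n_tokens * n_used * rows), 0.0f);

    for (int64_t t = 0; t < n_tokens; ++t) {
        for (int64_t s = 0; s < n_used; ++s) {
            const int64_t slot = t * n_used + s;
            // Gather step: pick the weight matrix of expert e.
            // The offset is only meaningful if the expert axis was kept.
            const int64_t e = ids[static_cast<std::size_t>(slot)];
            const float* w = weights.data() + e * rows * cols;
            const float* x = acts.data() + slot * cols;
            float* y = out.data() + slot * rows;
            // MatMul step: y = W_e * x for this token/slot pair.
            for (int64_t r = 0; r < rows; ++r) {
                for (int64_t c = 0; c < cols; ++c) {
                    y[r] += w[r * cols + c] * x[c];
                }
            }
        }
    }
    return out;
}
```

If the weights were flattened to a single `{rows, cols}` matrix instead, any id other than 0 would index past the one available slice, which is the failure mode the description attributes to the collapsed expert dimension.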