Arm backend: Split MLSDK setup flows and handle shaderFloat64 workaround by perheld · Pull Request #18942 · pytorch/executorch

perheld · 2026-04-16T12:31:26Z

Keep examples/arm/setup.sh focused on the pip-installed MLSDK flow and move source builds into the dedicated setup-mlsdk-from-source.sh script. Retain the old MLSDK source-build-related setup.sh options as deprecated compatibility shims that guide users to the dedicated source-build flow.

When the Vulkan ICD reports shaderFloat64 = false, the pip-installed emulation layer is no longer accepted for that setup because it cannot be configured with the required workaround. In that case, setup.sh now points users to the source-build script, and the source-build script builds the emulation layer with VMEL_USE_FLOAT_AS_DOUBLE=ON.

Also simplify pip-based emulation-layer environment setup by deriving paths from the installed package layout instead of scraping shell export output from the emulation_layer helper.

Update the Arm README to document the split between the default pip workflow and the advanced source-build workflow.

Change-Id: I4bebe171ee3b3728fb126c9d00d73881cb332253

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

pytorch-bot · 2026-04-16T12:31:31Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18942

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

[CI[B200] Smoke test encounters CUDA Unknown error for dgxb200-03 and dgxb200-04

❌ 18 New Failures, 3 Unrelated Failures

As of commit 27991ae with merge base a43675c ():

NEW FAILURES - The following jobs have failed:

pull / test-multimodal-linux (gemma3-4b) / linux-job (gh)
RuntimeError: Command docker exec -t b6a22d979edbd5ae5260cf488e926848754a4639903fc90b5eebe1fb87128ac7 /exec failed with exit code 139
pull / unittest / macos / macos-job (gh)
export/tests/test_target_recipes.py::TestTargetRecipes::test_mv3_model
pull / unittest-editable / macos / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-llama-runner-mac (fp32, coreml) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-llama-runner-mac (fp32, mps) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-llama-runner-mac (fp32, xnnpack+custom+quantize_kv) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-models-macos-cpu (emformer_join, xnnpack-quantization-delegation) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-models-macos-cpu (ic4, xnnpack-quantization-delegation) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-models-macos-cpu (llama, portable) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-models-macos-cpu (mobilebert, xnnpack-quantization-delegation) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-models-macos-cpu (resnet50, xnnpack-quantization-delegation) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-models-macos-cpu (vit, xnnpack-quantization-delegation) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-models-macos-cpu (w2l, xnnpack-quantization-delegation) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-models-macos-mps / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-selective-build-macos (cmake) / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
trunk / test-torchao-huggingface-checkpoints (lfm2_5_1_2b, linux.arm64.2xlarge, executorch-ubuntu-22.04-g... / linux-job (gh)
RuntimeError: Command docker exec -t eeac65ace150f4de0bbb91b85345ce78021922869042b88b6e20c0689d4ebeae /exec failed with exit code 1
trunk / test-torchao-huggingface-checkpoints (phi_4_mini, linux.arm64.2xlarge, executorch-ubuntu-22.04-gc... / linux-job (gh)
RuntimeError: Command docker exec -t b2a30ac361b226737ee09a9a531f44678d938c4470b6d6dfa2ec985c1db24d87 /exec failed with exit code 1
trunk / test-torchao-huggingface-checkpoints (qwen3_4b, linux.arm64.2xlarge, executorch-ubuntu-22.04-gcc1... / linux-job (gh)
RuntimeError: Command docker exec -t e272d20a008ff972870cfa01db5d1419af887fc6c79551d1fd72ab0d9e122693 /exec failed with exit code 1

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest-editable / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.
trunk / unittest-release / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Keep examples/arm/setup.sh focused on the pip-installed MLSDK flow and move source builds into the dedicated setup-mlsdk-from-source.sh script. Retain the old MLSDK source-build-related setup.sh options as deprecated compatibility shims that guide users to the dedicated source-build flow. When the Vulkan ICD reports shaderFloat64 = false, the pip-installed emulation layer is no longer accepted for that setup because it cannot be configured with the required workaround. In that case, setup.sh now points users to the source-build script, and the source-build script builds the emulation layer with VMEL_USE_FLOAT_AS_DOUBLE=ON. Also simplify pip-based emulation-layer environment setup by deriving paths from the installed package layout instead of scraping shell export output from the emulation_layer helper. Update the Arm README to document the split between the default pip workflow and the advanced source-build workflow. Change-Id: I4bebe171ee3b3728fb126c9d00d73881cb332253 Signed-off-by: Per Held <per.held@arm.com>

Copilot

Pull request overview

This PR restructures the Arm backend setup flows by making examples/arm/setup.sh focus on the default pip-installed MLSDK path while moving MLSDK source builds (including the shaderFloat64 workaround build flag) into a new dedicated setup-mlsdk-from-source.sh script.

Changes:

Split MLSDK setup into a default pip workflow (examples/arm/setup.sh) and a dedicated source-build workflow (backends/arm/scripts/setup-mlsdk-from-source.sh), with deprecated shims in setup.sh.
Add a shaderFloat64 compatibility probe for the pip flow and enable VMEL_USE_FLOAT_AS_DOUBLE when building the emulation layer from source.
Simplify pip emulation-layer env setup by deriving paths from the installed Python package layout; update Arm README to document the split.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
examples/arm/setup.sh	Removes source-build flow, adds compatibility checks and deprecated option handling for MLSDK setup via pip.
backends/arm/scripts/vulkan_utils.sh	Removes unused MLSDK manifest dir variable from Vulkan utilities.
backends/arm/scripts/setup-mlsdk-from-source.sh	New script to build MLSDK components from source and apply the shaderFloat64 workaround via CMake flag.
backends/arm/scripts/mlsdk_utils.sh	Drops source-build helpers and adds vulkaninfo probing + pip package layout–based emulation layer environment setup.
backends/arm/README.md	Documents the default pip workflow and the advanced source-build workflow entrypoint.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-16T12:36:14Z

+    setup_vulkan_sdk
+fi
+
+setup_mlsdk_source "${mlsdk_manifest_dir}"


setup_mlsdk_source is invoked unconditionally, so the script will sync the MLSDK manifest even when the user only requests --enable-vulkan-sdk (or disables all MLSDK components). Consider guarding this call so the manifest sync/build only runs when at least one of model-converter/vgf-lib/emulation-layer is enabled.

Suggested change

setup_mlsdk_source "${mlsdk_manifest_dir}"

if [[ "${enable_model_converter}" -eq 1 || "${enable_vgf_lib}" -eq 1 || "${enable_emulation_layer}" -eq 1 ]]; then

setup_mlsdk_source "${mlsdk_manifest_dir}"

fi

Copilot · 2026-04-16T12:36:14Z

+    if [[ ! -f repo ]]; then
+        log_step "mlsdk" "Fetching repo tool"
+        curl https://storage.googleapis.com/git-repo-downloads/repo > repo
+        chmod u+x repo
+    fi
+
+    ./repo init \
+        --depth=1 \
+        --no-repo-verify \
+        --manifest-url "${mlsdk_manifest_url}" \
+        --manifest-branch "${mlsdk_manifest_tag}" \
+        -g model-converter,emulation-layer,vgf-library
+
+    ./repo sync --force-sync -j"${parallel_jobs}"


The script downloads the repo tool over the network without integrity verification and then runs repo init with --no-repo-verify, which disables signature checks. This is a supply-chain risk for anyone running the setup script; consider pinning and verifying a checksum/signature for the downloaded repo binary and avoiding --no-repo-verify unless there’s a documented, unavoidable reason.

Copilot · 2026-04-16T12:36:15Z

+    if [[ "${cached_tag}" == "refs/tags/v2025.10.0" && \
+          "${mlsdk_manifest_tag}" == "refs/tags/v2025.12.0" ]]; then
+        pushd "${manifest_dir}/.." >/dev/null || return 1
+        log_step "mlsdk" "Deleting ${mlsdk_manifest_dir} and starting fresh"


In the v2025.10.0 -> v2025.12.0 migration workaround, the log message says it is deleting ${mlsdk_manifest_dir}, but the code actually deletes $(basename "${manifest_dir}") under $(dirname "${manifest_dir}"). If the user passed a custom --manifest-dir, this message can be inaccurate; consider logging the actual manifest_dir (or its basename) being removed.

Suggested change

log_step "mlsdk" "Deleting ${mlsdk_manifest_dir} and starting fresh"

log_step "mlsdk" "Deleting $(basename "${manifest_dir}") and starting fresh"

Copilot · 2026-04-16T12:36:15Z

+            --mlsdk-manifest-url)
+                if [[ $# -lt 2 ]]; then
+                    print_usage "$@"
+                    exit 1
+                fi
+                log_step "mlsdk" \
+                    "Deprecated option '--mlsdk-manifest-url' selected; use it with ./backends/arm/scripts/setup-mlsdk-from-source.sh instead"
+                shift 2
+                ;;
+            --mlsdk-manifest-tag)
+                if [[ $# -lt 2 ]]; then
+                    print_usage "$@"
+                    exit 1
+                fi
+                log_step "mlsdk" \
+                    "Deprecated option '--mlsdk-manifest-tag' selected; use it with ./backends/arm/scripts/setup-mlsdk-from-source.sh instead"
+                shift 2


The deprecated flags --mlsdk-manifest-url / --mlsdk-manifest-tag are accepted but then ignored (only a log + shift 2). This can silently fall back to the pip flow even though the user explicitly tried to influence the source-build manifest. Consider exiting non-zero (similar to --install-mlsdk-deps-from-src) after printing the guidance, so callers don’t think the flag took effect.

Copilot · 2026-04-16T12:36:15Z

+    echo "[mlsdk_utils] Failed to query emulation_layer environment; skipping"
 }


The fallback error message is now misleading: this code no longer queries the emulation_layer CLI output/environment, it locates the Python package directory and checks for a deploy/ folder. Update the message (and ideally write it to stderr) so failures are actionable (e.g., package not installed / no deploy dir found).

perheld requested a review from digantdesai as a code owner April 16, 2026 12:31

Copilot AI review requested due to automatic review settings April 16, 2026 12:31

perheld added partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm ciflow/trunk release notes: arm Changes to the ARM backend delegate labels Apr 16, 2026

meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 16, 2026

perheld force-pushed the ph-split-mlsdk branch from 0f46ba5 to 27991ae Compare April 16, 2026 12:31

Copilot started reviewing on behalf of perheld April 16, 2026 12:32 View session

Copilot AI reviewed Apr 16, 2026

View reviewed changes

zingo approved these changes Apr 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Arm backend: Split MLSDK setup flows and handle shaderFloat64 workaround#18942

Arm backend: Split MLSDK setup flows and handle shaderFloat64 workaround#18942
perheld wants to merge 1 commit intopytorch:mainfrom
perheld:ph-split-mlsdk

perheld commented Apr 16, 2026 •

edited by pytorch-bot bot

Loading

Uh oh!

pytorch-bot bot commented Apr 16, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 16, 2026

Uh oh!

Copilot AI Apr 16, 2026

Uh oh!

Copilot AI Apr 16, 2026

Uh oh!

Copilot AI Apr 16, 2026

Uh oh!

Copilot AI Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	log_step "mlsdk" "Deleting ${mlsdk_manifest_dir} and starting fresh"
	log_step "mlsdk" "Deleting $(basename "${manifest_dir}") and starting fresh"

		echo "[mlsdk_utils] Failed to query emulation_layer environment; skipping"
		}

Conversation

perheld commented Apr 16, 2026 • edited by pytorch-bot bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18942

❗ 1 Active SEVs

❌ 18 New Failures, 3 Unrelated Failures

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

perheld commented Apr 16, 2026 •

edited by pytorch-bot bot

Loading

pytorch-bot bot commented Apr 16, 2026 •

edited

Loading