
[NVVM] Support - Followup enhancements #1218

Open
abhilash1910 wants to merge 45 commits into NVIDIA:main from abhilash1910:nvvm_enhance

Conversation

@abhilash1910
Contributor

Description

Issue Link - #981

Changes to be addressed in this WIP PR:

  • LTO IR testing
  • Is there a way to add multiple modules?
    (If and when it is possible to add multiple modules, a test with code that uses something from libdevice is probably a good idea.
    It is also useful to be able to add a module lazily.)
  • apply bitcode pattern input for libnvvm

cc @leofang

@copy-pr-bot
Contributor

copy-pr-bot bot commented Nov 5, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@abhilash1910 abhilash1910 marked this pull request as draft November 5, 2025 02:17
@leofang leofang added this to the cuda.core beta 9 milestone Nov 10, 2025
@leofang leofang added the enhancement (Any code-related improvements), P1 (Medium priority - Should do), and cuda.core (Everything related to the cuda.core module) labels Nov 10, 2025
@leofang
Member

leofang commented Nov 17, 2025

Thanks, @abhilash1910! Any ETA to wrap this up?

@abhilash1910
Contributor Author

pre-commit.ci autofix

@abhilash1910
Contributor Author

pre-commit.ci autofix

@leofang leofang linked an issue Nov 25, 2025 that may be closed by this pull request
Member

@leofang leofang left a comment

Thanks, Abhilash! Leaving some early feedback.

Comment on lines 538 to 546
bitcode_path = os.environ.get("BITCODE_NVVM_PATH")
if not bitcode_path:
    pytest.skip("BITCODE_NVVM_PATH environment variable is not set. Disabling the test.")
bitcode_file = Path(bitcode_path)
if not bitcode_file.exists():
    pytest.skip(f"Bitcode file not found: {bitcode_path}")

if bitcode_file.suffix != ".bc":
    pytest.skip(f"Expected .bc file, got: {bitcode_file.suffix}")
Member

Would it be possible for us to avoid having a file locally? We have bitcode in this repo already:

MINIMAL_NVVMIR_TXT_TEMPLATE = b"""\
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-i128:128:128-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
define void @kernel() {
entry:
ret void
}
!nvvm.annotations = !{!0}
!0 = !{void ()* @kernel, !"kernel", i32 1}
!nvvmir.version = !{!1}
!1 = !{i32 %d, i32 0, i32 %d, i32 0}
""" # noqa: E501
MINIMAL_NVVMIR_BITCODE_STATIC = {
(1, 3): # (major, debug_major)
"4243c0de3514000005000000620c30244a59be669dfbb4bf0b51804c01000000210c00007f010000"
"0b02210002000000160000000781239141c80449061032399201840c250508191e048b62800c4502"
"42920b42641032143808184b0a3232884870c421234412878c1041920264c808b1142043468820c9"
"01323284182a282a90317cb05c9120c3c8000000892000000b0000003222c80820624600212b2498"
"0c212524980c19270c85a4906032645c20246382a01801300128030173046000132677b00778a007"
"7cb0033a680377b0877420877408873618877a208770d8e012e5d006f0a0077640077a600774a007"
"7640076d900e71a00778a00778d006e980077a80077a80076d900e7160077a100776a0077160076d"
"900e7320077a300772a0077320076d900e7640077a600774a0077640076d900e71200778a0077120"
"0778a00771200778d006e6300772a0077320077a300772d006e6600774a0077640077a600774d006"
"f6100776a0077160077a100776d006f6300772a0077320077a300772d006f6600774a0077640077a"
"600774d006f610077280077a10077280077a10077280076de00e7160077a300772a0077640071a21"
"4c0e11de9c2e4fbbcfbe211560040000000000000000000000000620b141a0e86000004016080000"
"06000000321e980c19114c908c092647c6044362098c009401000000b1180000ac0000003308801c"
"c4e11c6614013d88433884c38c4280077978077398710ce6000fed100ef4800e330c421ec2c11dce"
"a11c6630053d88433884831bcc033dc8433d8c033dcc788c7470077b08077948877070077a700376"
"788770208719cc110eec900ee1300f6e300fe3f00ef0500e3310c41dde211cd8211dc2611e663089"
"3bbc833bd04339b4033cbc833c84033bccf0147660077b6807376887726807378087709087706007"
"76280776f8057678877780875f08877118877298877998812ceef00eeee00ef5c00eec300362c8a1"
"1ce4a11ccca11ce4a11cdc611cca211cc4811dca6106d6904339c84339984339c84339b8c3389443"
"3888033b94c32fbc833cfc823bd4033bb0c30cc7698770588772708374680778608774188774a087"
"19ce530fee000ff2500ee4900ee3400fe1200eec500e3320281ddcc11ec2411ed2211cdc811edce0"
"1ce4e11dea011e66185138b0433a9c833bcc50247660077b68073760877778077898514cf4900ff0"
"500e331e6a1eca611ce8211ddec11d7e011ee4a11ccc211df0610654858338ccc33bb0433dd04339"
"fcc23ce4433b88c33bb0c38cc50a877998877718877408077a28077298815ce3100eecc00ee5500e"
"f33023c1d2411ee4e117d8e11dde011e6648193bb0833db4831b84c3388c4339ccc33cb8c139c8c3"
"3bd4033ccc48b471080776600771088771588719dbc60eec600fede006f0200fe5300fe5200ff650"
"0e6e100ee3300ee5300ff3e006e9e00ee4500ef83023e2ec611cc2811dd8e117ec211de6211dc421"
"1dd8211de8211f66209d3bbc433db80339948339cc58bc7070077778077a08077a488777708719cb"
"e70eef300fe1e00ee9400fe9a00fe530c3010373a8077718875f988770708774a08774d087729881"
"844139e0c338b0433d904339cc40c4a01dcaa11de0411edec11c662463300ee1c00eec300fe9400f"
"e5000000792000001d000000721e482043880c19097232482023818c9191d144a01028643c313242"
"8e9021a318100a00060000006b65726e656c0000230802308240042308843082400c330c4230cc40"
"0c4441c84860821272b3b36b730973737ba30ba34b7b739b1b2528d271b3b36b4b9373b12b939b4b"
"7b731b2530000000a9180000250000000b0a7228877780077a587098433db8c338b04339d0c382e6"
"1cc6a10de8411ec2c11de6211de8211ddec11d1634e3600ee7500fe1200fe4400fe1200fe7500ef4"
"b08081077928877060077678877108077a28077258709cc338b4013ba4833d94c3026b1cd8211cdc"
"e11cdc201ce4611cdc201ce8811ec2611cd0a11cc8611cc2811dd861c1010ff4200fe1500ff4800e"
"00000000d11000000600000007cc3ca4833b9c033b94033da0833c94433890c30100000061200000"
"06000000130481860301000002000000075010cd14610000000000007120000003000000320e1022"
"8400fb020000000000000000650c00001f000000120394f000000000030000000600000006000000"
"4c000000010000005800000000000000580000000100000070000000000000000c00000013000000"
"1f000000080000000600000000000000700000000000000000000000010000000000000000000000"
"060000000000000006000000ffffffff00240000000000005d0c00000d0000001203946700000000"
"6b65726e656c31352e302e376e7670747836342d6e76696469612d637564613c737472696e673e00"
"00000000",
(2, 3): # (major, debug_major)
"4243c0de3514000005000000620c30244a59be669dfbb4bf0b51804c01000000210c000080010000"
"0b02210002000000160000000781239141c80449061032399201840c250508191e048b62800c4502"
"42920b42641032143808184b0a3232884870c421234412878c1041920264c808b1142043468820c9"
"01323284182a282a90317cb05c9120c3c8000000892000000b0000003222c80820624600212b2498"
"0c212524980c19270c85a4906032645c20246382a01801300128030173046000132677b00778a007"
"7cb0033a680377b0877420877408873618877a208770d8e012e5d006f0a0077640077a600774a007"
"7640076d900e71a00778a00778d006e980077a80077a80076d900e7160077a100776a0077160076d"
"900e7320077a300772a0077320076d900e7640077a600774a0077640076d900e71200778a0077120"
"0778a00771200778d006e6300772a0077320077a300772d006e6600774a0077640077a600774d006"
"f6100776a0077160077a100776d006f6300772a0077320077a300772d006f6600774a0077640077a"
"600774d006f610077280077a10077280077a10077280076de00e7160077a300772a0077640071a21"
"4c0e11de9c2e4fbbcfbe211560040000000000000000000000000620b141a0286100004016080000"
"06000000321e980c19114c908c092647c60443620914c10840190000b1180000ac0000003308801c"
"c4e11c6614013d88433884c38c4280077978077398710ce6000fed100ef4800e330c421ec2c11dce"
"a11c6630053d88433884831bcc033dc8433d8c033dcc788c7470077b08077948877070077a700376"
"788770208719cc110eec900ee1300f6e300fe3f00ef0500e3310c41dde211cd8211dc2611e663089"
"3bbc833bd04339b4033cbc833c84033bccf0147660077b6807376887726807378087709087706007"
"76280776f8057678877780875f08877118877298877998812ceef00eeee00ef5c00eec300362c8a1"
"1ce4a11ccca11ce4a11cdc611cca211cc4811dca6106d6904339c84339984339c84339b8c3389443"
"3888033b94c32fbc833cfc823bd4033bb0c30cc7698770588772708374680778608774188774a087"
"19ce530fee000ff2500ee4900ee3400fe1200eec500e3320281ddcc11ec2411ed2211cdc811edce0"
"1ce4e11dea011e66185138b0433a9c833bcc50247660077b68073760877778077898514cf4900ff0"
"500e331e6a1eca611ce8211ddec11d7e011ee4a11ccc211df0610654858338ccc33bb0433dd04339"
"fcc23ce4433b88c33bb0c38cc50a877998877718877408077a28077298815ce3100eecc00ee5500e"
"f33023c1d2411ee4e117d8e11dde011e6648193bb0833db4831b84c3388c4339ccc33cb8c139c8c3"
"3bd4033ccc48b471080776600771088771588719dbc60eec600fede006f0200fe5300fe5200ff650"
"0e6e100ee3300ee5300ff3e006e9e00ee4500ef83023e2ec611cc2811dd8e117ec211de6211dc421"
"1dd8211de8211f66209d3bbc433db80339948339cc58bc7070077778077a08077a488777708719cb"
"e70eef300fe1e00ee9400fe9a00fe530c3010373a8077718875f988770708774a08774d087729881"
"844139e0c338b0433d904339cc40c4a01dcaa11de0411edec11c662463300ee1c00eec300fe9400f"
"e5000000792000001e000000721e482043880c19097232482023818c9191d144a01028643c313242"
"8e9021a318100a00060000006b65726e656c0000230802308240042308843082400c23080431c320"
"04c30c045118858c04262821373bbb36973037b737ba30bab437b7b95102231d373bbbb6343917bb"
"32b9b9b437b7518203000000a9180000250000000b0a7228877780077a587098433db8c338b04339"
"d0c382e61cc6a10de8411ec2c11de6211de8211ddec11d1634e3600ee7500fe1200fe4400fe1200f"
"e7500ef4b08081077928877060077678877108077a28077258709cc338b4013ba4833d94c3026b1c"
"d8211cdce11cdc201ce4611cdc201ce8811ec2611cd0a11cc8611cc2811dd861c1010ff4200fe150"
"0ff4800e00000000d11000000600000007cc3ca4833b9c033b94033da0833c94433890c301000000"
"6120000006000000130481860301000002000000075010cd14610000000000007120000003000000"
"320e10228400fc020000000000000000650c00001f000000120394f0000000000300000006000000"
"060000004c000000010000005800000000000000580000000100000070000000000000000c000000"
"130000001f0000000800000006000000000000007000000000000000000000000100000000000000"
"00000000060000000000000006000000ffffffff00240000000000005d0c00000d00000012039467"
"000000006b65726e656c31352e302e376e7670747836342d6e76696469612d637564613c73747269"
"6e673e0000000000",
}
@pytest.fixture(params=("txt", "bitcode_static"))
def minimal_nvvmir(request):
    major, minor, debug_major, debug_minor = nvvm.ir_version()
    if request.param == "txt":
        return MINIMAL_NVVMIR_TXT_TEMPLATE % (major, debug_major)
    bitcode_static_binascii = MINIMAL_NVVMIR_BITCODE_STATIC.get((major, debug_major))
    if bitcode_static_binascii:
        return binascii.unhexlify(bitcode_static_binascii)
    raise RuntimeError(
        "Static bitcode for NVVM IR version "
        f"{major}.{debug_major} is not available in this test.\n"
        "Maintainers: Please run the helper script to generate it and add the "
        "output to the MINIMAL_NVVMIR_BITCODE_STATIC dict:\n"
        " ../../toolshed/build_static_bitcode_input.py"
    )

so I suggest that we move it to the common place, say a new file under cuda_python_test_helpers:
https://github.com/NVIDIA/cuda-python/tree/main/cuda_python_test_helpers/cuda_python_test_helpers
and have it imported in both cuda.bindings/core tests.
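
A minimal sketch of what such a shared helper might look like, assuming a new module under cuda_python_test_helpers. The module name, the function name `select_minimal_nvvmir`, and the placeholder table entry are illustrative assumptions; the real hex dumps would be moved over from the existing test, and the caller would pass in the values returned by `nvvm.ir_version()`.

```python
# Hypothetical shared helper (for example cuda_python_test_helpers/nvvmir_inputs.py).
# All names and the tiny placeholder payload are illustrative assumptions.
import binascii

TXT_TEMPLATE = b"!nvvmir.version = !{!1}\n!1 = !{i32 %d, i32 0, i32 %d, i32 0}\n"

# Keyed by (major, debug_major); real entries would hold the full hex dumps.
BITCODE_STATIC = {
    (2, 3): "4243c0de",  # placeholder: only the LLVM bitcode magic number
}


def select_minimal_nvvmir(kind, major, debug_major):
    """Return minimal NVVM IR as bytes for the given IR version."""
    if kind == "txt":
        return TXT_TEMPLATE % (major, debug_major)
    if kind == "bitcode_static":
        hex_blob = BITCODE_STATIC.get((major, debug_major))
        if hex_blob is None:
            raise RuntimeError(
                f"No static bitcode for NVVM IR version {major}.{debug_major}"
            )
        return binascii.unhexlify(hex_blob)
    raise ValueError(f"Unknown kind: {kind!r}")
```

Both the cuda.bindings and cuda.core test suites could then wrap this function in their own `minimal_nvvmir` fixture instead of each carrying a copy of the table.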

Contributor Author

Agree on this.

Contributor Author

This is only partially addressed for now: I have yet to remove the existing invocations from the cuda_bindings tests.

@abhilash1910
Contributor Author

pre-commit.ci autofix

@brandon-b-miller
Contributor

Hi @abhilash1910, is there any way I can help with this PR? The code that finds libdevice can be useful for numba-cuda, so I'm happy to assist in getting this over the line if you're comfortable with that. I could start with some tests for the pathfinding logic if that makes sense.

@abhilash1910
Contributor Author

Yes, thanks @brandon-b-miller. I am in the process of rebasing this PR as it is a bit outdated, and I would definitely need more reviews and tests for this.

@brandon-b-miller
Contributor

hi @abhilash1910, @rwgk, what do you think of this patch as a basis for some testing? This uses a temporary dir and creates the expected directory structure for each discovery method within it. The wheel test patches out the base site-packages dir that's expected, and the conda and CUDA_HOME methods control what's visible through either the $CONDA_PREFIX or $CUDA_HOME env vars respectively.

Details
diff --git a/cuda_pathfinder/tests/test_find_libdevice.py b/cuda_pathfinder/tests/test_find_libdevice.py
new file mode 100644
index 000000000..2d24f397f
--- /dev/null
+++ b/cuda_pathfinder/tests/test_find_libdevice.py
@@ -0,0 +1,94 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+import os
+
+import pytest
+
+from cuda.pathfinder import find_libdevice
+from cuda.pathfinder._dynamic_libs import find_libdevice as find_libdevice_module
+
+FILENAME = "libdevice.10.bc"
+
+SITE_PACKAGES_REL_DIR_CUDA12 = "nvidia/cuda_nvcc/nvvm/libdevice"
+SITE_PACKAGES_REL_DIR_CUDA13 = "nvidia/cuda_nvvm/nvvm/libdevice"
+
+
+@pytest.fixture
+def clear_find_libdevice_cache():
+    find_libdevice.cache_clear()
+    yield
+    find_libdevice.cache_clear()
+
+
+def _make_libdevice_file(dir_path: str) -> str:
+    os.makedirs(dir_path, exist_ok=True)
+    file_path = os.path.join(dir_path, FILENAME)
+    with open(file_path, "wb"):
+        pass
+    return file_path
+
+
+@pytest.mark.parametrize("rel_dir", [SITE_PACKAGES_REL_DIR_CUDA12, SITE_PACKAGES_REL_DIR_CUDA13])
+@pytest.mark.usefixtures("clear_find_libdevice_cache")
+def test_find_libdevice_via_site_packages(monkeypatch, mocker, tmp_path, rel_dir):
+    libdevice_dir = tmp_path.joinpath(*rel_dir.split("/"))
+    expected_path = str(_make_libdevice_file(str(libdevice_dir)))
+
+    mocker.patch.object(
+        find_libdevice_module,
+        "find_sub_dirs_all_sitepackages",
+        return_value=[str(libdevice_dir)],
+    )
+    monkeypatch.delenv("CONDA_PREFIX", raising=False)
+    monkeypatch.delenv("CUDA_HOME", raising=False)
+    monkeypatch.delenv("CUDA_PATH", raising=False)
+
+    result = find_libdevice()
+
+    assert result == expected_path
+    assert os.path.isfile(result)
+
+
+# same for cu12/cu13
+@pytest.mark.usefixtures("clear_find_libdevice_cache")
+def test_find_libdevice_via_conda(monkeypatch, mocker, tmp_path):
+    rel_path = os.path.join("nvvm", "libdevice")
+    libdevice_dir = tmp_path / rel_path
+    expected_path = str(_make_libdevice_file(str(libdevice_dir)))
+
+    mocker.patch.object(find_libdevice_module, "IS_WINDOWS", False)
+    mocker.patch.object(
+        find_libdevice_module,
+        "find_sub_dirs_all_sitepackages",
+        return_value=[],
+    )
+    monkeypatch.setenv("CONDA_PREFIX", str(tmp_path))
+    monkeypatch.delenv("CUDA_HOME", raising=False)
+    monkeypatch.delenv("CUDA_PATH", raising=False)
+
+    result = find_libdevice()
+
+    assert result == expected_path
+    assert os.path.isfile(result)
+
+
+@pytest.mark.usefixtures("clear_find_libdevice_cache")
+def test_find_libdevice_via_cuda_home(monkeypatch, mocker, tmp_path):
+    rel_path = os.path.join("nvvm", "libdevice")
+    libdevice_dir = tmp_path / rel_path
+    expected_path = str(_make_libdevice_file(str(libdevice_dir)))
+
+    mocker.patch.object(
+        find_libdevice_module,
+        "find_sub_dirs_all_sitepackages",
+        return_value=[],
+    )
+    monkeypatch.delenv("CONDA_PREFIX", raising=False)
+    monkeypatch.setenv("CUDA_HOME", str(tmp_path))
+    monkeypatch.delenv("CUDA_PATH", raising=False)
+
+    result = find_libdevice()
+
+    assert result == expected_path
+    assert os.path.isfile(result)

@rwgk
Collaborator

rwgk commented Feb 10, 2026

> hi @abhilash1910, @rwgk, what do you think of this patch as a basis for some testing? This uses a temporary dir and creates the expected directory structure for each discovery method within it. The wheel test patches out the base site-packages dir that's expected, and the conda and CUDA_HOME methods control what's visible through either the $CONDA_PREFIX or $CUDA_HOME env vars respectively.

This looks good to me. I convinced myself that the suggested code is pytest-xdist compatible (see below).

@abhilash1910 I believe @brandon-b-miller cannot push to this PR unless you give him push permission to this branch in your fork, but we could clone the commits here to a new branch/PR and develop the find_libdevice code & tests there. What do you think will be best?


Cursor analysis:

  • monkeypatch is function‑scoped and restores os.environ at teardown, so within a worker process there’s no leakage between tests.
  • xdist runs tests in separate worker processes, so env mutations are isolated per worker.
  • The only real shared state is the @functools.cache on find_libdevice, and the fixture clear_find_libdevice_cache already clears it before/after each test, which prevents order dependence.

So yes, those env edits are xdist‑compatible as written.
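
The point about the cache being the only shared state can be seen in a small standalone sketch. `find_thing` and the `DEMO_HOME` variable are made-up stand-ins, not the real `find_libdevice` or any variable it reads; the sketch only shows why a `functools.cache`-decorated lookup needs `cache_clear()` between tests that change the environment.

```python
import functools
import os


@functools.cache
def find_thing():
    # Stand-in for a cached path lookup driven by an environment variable.
    return os.environ.get("DEMO_HOME")


os.environ["DEMO_HOME"] = "/first"
first = find_thing()

os.environ["DEMO_HOME"] = "/second"
stale = find_thing()      # still the cached value from the first call

find_thing.cache_clear()  # what the fixture does before and after each test
fresh = find_thing()

del os.environ["DEMO_HOME"]  # leave the environment as we found it
```

Without the `cache_clear()` call, the second lookup keeps returning the value computed under the earlier environment, which is the order dependence the fixture guards against.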

@abhilash1910
Contributor Author

Yes @brandon-b-miller let me know if you received the invite. The change looks good to me as well.

@brandon-b-miller
Contributor

Thanks @abhilash1910! I hope you don't mind that I've pushed a few commits here, hopefully addressing some of the remaining review comments. Thanks again for this implementation!

@rwgk could you take another look when you get a chance?

@abhilash1910
Contributor Author

Thanks @brandon-b-miller for the help :)

@abhilash1910 abhilash1910 marked this pull request as ready for review February 11, 2026 17:42
@abhilash1910
Contributor Author

pre-commit.ci autofix

@abhilash1910
Contributor Author

@rwgk @leofang @brandon-b-miller @kkraus14 requesting review. Thanks

@abhilash1910
Contributor Author

pre-commit.ci autofix

Member

@leofang leofang left a comment

No issues with the cuda.core part. We need to give the pathfinder part more thought, though.

Comment on lines +500 to +528
@nvvm_available
@pytest.mark.parametrize(
    "options",
    [
        ProgramOptions(name="ltoir_test1", arch="sm_90", device_code_optimize=False),
        ProgramOptions(name="ltoir_test2", arch="sm_100", link_time_optimization=True),
        ProgramOptions(
            name="ltoir_test3",
            arch="sm_90",
            ftz=True,
            prec_sqrt=False,
            prec_div=False,
            fma=True,
            device_code_optimize=True,
            link_time_optimization=True,
        ),
    ],
)
def test_nvvm_program_options_ltoir(init_cuda, nvvm_ir, options):
    """Test NVVM programs for LTOIR with different options"""
    program = Program(nvvm_ir, "nvvm", options)
    assert program.backend == "NVVM"

    ltoir_code = program.compile("ltoir")
    assert isinstance(ltoir_code, ObjectCode)
    assert ltoir_code.name == options.name
    code_content = ltoir_code.code
    assert len(code_content) > 0
    program.close()
Member

Q: Can this test be combined with the one above (test_nvvm_program_options) and parametrized over target=("ptx", "ltoir")?

Comment on lines +583 to +611
# Add extra modules if provided
if options.extra_sources is not None:
    if not is_sequence(options.extra_sources):
        raise TypeError(
            "extra_sources must be a sequence of 2-tuples: ((name1, source1), (name2, source2), ...)"
        )
    for i, module in enumerate(options.extra_sources):
        if not isinstance(module, tuple) or len(module) != 2:
            raise TypeError(
                f"Each extra module must be a 2-tuple (name, source)"
                f", got {type(module).__name__} at index {i}"
            )

        module_name, module_source = module

        if not isinstance(module_name, str):
            raise TypeError(f"Module name at index {i} must be a string, got {type(module_name).__name__}")

        if isinstance(module_source, str):
            # Textual LLVM IR - encode to UTF-8 bytes
            module_source = module_source.encode("utf-8")
        elif not isinstance(module_source, (bytes, bytearray)):
            raise TypeError(
                f"Module source at index {i} must be str (textual LLVM IR), bytes (textual LLVM IR or bitcode), "
                f"or bytearray, got {type(module_source).__name__}"
            )

        if len(module_source) == 0:
            raise ValueError(f"Module source for '{module_name}' (index {i}) cannot be empty")
Member

Not a blocker; we can handle it in the next PR. The option validation should be moved under ProgramOptions.
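
One possible shape for that follow-up, sketched with a dataclass whose `__post_init__` performs the same checks as the hunk above. `ProgramOptionsSketch` is a hypothetical stand-in that models only `extra_sources`; it is not the real `ProgramOptions`, and the explicit rejection of a bare `str` is an assumption, since the behavior of `is_sequence` is not shown here.

```python
from collections.abc import Sequence
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgramOptionsSketch:
    """Hypothetical stand-in for ProgramOptions; only extra_sources is modeled."""

    extra_sources: Optional[Sequence] = None

    def __post_init__(self):
        if self.extra_sources is None:
            return
        # A bare str/bytes is technically a sequence but never a valid module list.
        if isinstance(self.extra_sources, (str, bytes, bytearray)) or not isinstance(
            self.extra_sources, Sequence
        ):
            raise TypeError("extra_sources must be a sequence of 2-tuples (name, source)")
        normalized = []
        for i, module in enumerate(self.extra_sources):
            if not isinstance(module, tuple) or len(module) != 2:
                raise TypeError(
                    f"Entry {i} must be a 2-tuple (name, source), got {type(module).__name__}"
                )
            name, source = module
            if not isinstance(name, str):
                raise TypeError(f"Module name at index {i} must be str, got {type(name).__name__}")
            if isinstance(source, str):
                source = source.encode("utf-8")  # textual LLVM IR
            elif not isinstance(source, (bytes, bytearray)):
                raise TypeError(f"Module source at index {i} must be str, bytes, or bytearray")
            if len(source) == 0:
                raise ValueError(f"Module source for {name!r} (index {i}) cannot be empty")
            normalized.append((name, bytes(source)))
        # Store the normalized form so the compile path does not need to re-check it.
        self.extra_sources = tuple(normalized)
```

With validation done at construction time, bad input fails where the options object is created rather than later inside the compile call.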


# Site-package paths for libdevice (following SITE_PACKAGES_LIBDIRS pattern)
SITE_PACKAGES_LIBDEVICE_DIRS = (
    "nvidia/cuda_nvvm/nvvm/libdevice",  # CTK 13+
Member

There is a subdir called cuda_nvvm? Really?

Collaborator

This is what I see (using the find command under git-bash):

Cu12TestVenv/Lib/site-packages/nvidia/cuda_nvcc/nvvm/libdevice/libdevice.10.bc
Cu13TestVenv/Lib/site-packages/nvidia/cu13/nvvm/libdevice/libdevice.10.bc

I.e. this line should be "nvidia/cu13/nvvm/libdevice", but the line below is correct. (I looked very carefully.)
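
A small sketch of the lookup with the corrected relative directory, using the two layouts from the `find` output above. The function name `find_in_site_packages` is hypothetical, and the assumption is that the second tuple entry in the PR is the `cuda_nvcc` path; the fake wheel layout is built in a temporary directory so nothing depends on an installed toolkit.

```python
import os
import tempfile

# Relative dirs matching the wheel layouts quoted above.
SITE_PACKAGES_LIBDEVICE_DIRS = (
    "nvidia/cu13/nvvm/libdevice",       # CTK 13 wheels
    "nvidia/cuda_nvcc/nvvm/libdevice",  # CTK 12 wheels
)
FILENAME = "libdevice.10.bc"


def find_in_site_packages(site_packages_roots):
    """Return the first existing libdevice path under the given roots, else None."""
    for root in site_packages_roots:
        for rel_dir in SITE_PACKAGES_LIBDEVICE_DIRS:
            candidate = os.path.join(root, *rel_dir.split("/"), FILENAME)
            if os.path.isfile(candidate):
                return candidate
    return None


# Usage: build a fake CTK 13 wheel layout and look it up.
with tempfile.TemporaryDirectory() as root:
    libdevice_dir = os.path.join(root, "nvidia", "cu13", "nvvm", "libdevice")
    os.makedirs(libdevice_dir)
    open(os.path.join(libdevice_dir, FILENAME), "wb").close()
    found = find_in_site_packages([root])
    found_rel = os.path.relpath(found, root).replace(os.sep, "/")
```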

Comment on lines +22 to +27
from cuda.pathfinder._static_libs.find_libdevice import (
    find_libdevice as find_libdevice,
)
from cuda.pathfinder._static_libs.find_libdevice import (
    get_libdevice_path as get_libdevice_path,
)
Member

I feel libdevice is not special enough to have its own functions. How about we consolidate these with find_nvidia_binary_utility? @rwgk WDYT?

Collaborator

Hm, .bc (bitcode library) seems conceptually very different than e.g. nvcc (executable).

I think cuda/pathfinder/_static_libs is a better fit, compared to cuda/pathfinder/_binaries.

For the API, to mirror what we have for headers, how about:

  • locate_bitcode_lib("device") to get something similar to LocatedHeaderDir
  • find_bitcode_lib("device") to get just the abs_path

For other static libs there could be locate_static_lib("cudart").

So we'd be lumping the locate_bitcode_lib and locate_static_lib implementations under _static_libs, but that'd be a hidden implementation detail.
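
To make the proposed pair concrete, here is a rough sketch under stated assumptions. `LocatedBitcodeLib`, its fields, and the restriction to `$CUDA_HOME`/`$CUDA_PATH` are all hypothetical; the actual pathfinder would also search site-packages and conda, and its result type may look different.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: name -> (relative dir, filename).
_BITCODE_LIBS = {"device": ("nvvm/libdevice", "libdevice.10.bc")}


@dataclass(frozen=True)
class LocatedBitcodeLib:
    """Hypothetical result type, loosely mirroring LocatedHeaderDir."""

    name: str
    abs_path: str
    found_via: str


def locate_bitcode_lib(name: str) -> Optional[LocatedBitcodeLib]:
    """Search $CUDA_HOME, then $CUDA_PATH (sketch only; other methods omitted)."""
    if name not in _BITCODE_LIBS:
        raise ValueError(f"Unknown bitcode lib: {name!r}")
    rel_dir, filename = _BITCODE_LIBS[name]
    for env_var in ("CUDA_HOME", "CUDA_PATH"):
        base = os.environ.get(env_var)
        if not base:
            continue
        candidate = os.path.join(base, *rel_dir.split("/"), filename)
        if os.path.isfile(candidate):
            return LocatedBitcodeLib(name, os.path.abspath(candidate), env_var)
    return None


def find_bitcode_lib(name: str) -> Optional[str]:
    """Convenience wrapper returning only the absolute path, or None."""
    located = locate_bitcode_lib(name)
    return None if located is None else located.abs_path
```

Keeping `find_bitcode_lib` as a thin wrapper over `locate_bitcode_lib` gives a single search implementation with two return shapes, which would also avoid the `get_libdevice_path`/`find_libdevice` overlap.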

Comment on lines +18 to +22
FILENAME = "libdevice.10.bc"
if IS_WINDOWS:
    bases = [r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA", r"C:\CUDA"]
else:
    bases = ["/usr/local/cuda", "/opt/cuda"]
Member

This is nerve-racking...

return abs_path


def get_libdevice_path() -> str | None:
Member

I agree, having both get_libdevice_path and find_libdevice is confusing

Labels

cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements), P1 (Medium priority - Should do)

Development

Successfully merging this pull request may close these issues.

NVVM support - follow-up

4 participants