Fix CodeQL SM02986: char* to wchar_t* cast warning in PythonParam by monamaki · Pull Request #86 · microsoft/sql-server-language-extensions

monamaki · 2026-04-17T23:53:47Z

Summary

Addresses CodeQL alert [SM02986] (Cast from char* to wchar_t*) in PythonStringParam<wchar_t>::RetrieveValueAndStrLenInd by adding a suppression comment with justification and a compile-time safety guard.

Problem

CodeQL flags the reinterpret_cast<CharType *>(utf16str) in PythonParam.cpp as potentially unsafe — casting a byte string (char*) to a wide-character string (wchar_t*) can lead to alignment issues, incorrect termination, or buffer overruns.

Analysis

This cast is safe for the following reasons:

Valid content — PyUnicode_AsUTF16String produces valid UTF-16 code units in native byte order. The data is inherently wchar_t-compatible. (CPython docs)
Bounded length — The read length is determined by PyUnicode_GET_LENGTH (character count), not null termination, so there is no risk of buffer overrun.

Changes

Alignment: Added static_assert(sizeof(CharType) == sizeof(wchar_t), ...) to enforce at compile time that the template is only instantiated with a type matching wchar_t size.
Added an inline CodeQL suppression comment (// CodeQL [SM02986]: ...) with a justification.
Added a documentation comment block explaining why the buffer content is valid UTF-16, with a reference to the CPython docs.

Files Changed

language-extensions/python/src/PythonParam.cpp: comments + static_assert only; no functional change)

Testing

No functional code change: only comments and a compile-time assertion added.
Consumed the artifact of this code in the engine dev branch and tests passed. There are parameter data types test coverage that cover nvarchar for Python.

Risk

None — no runtime behavior change.

Copilot

Pull request overview

Updates PythonStringParam<wchar_t>::RetrieveValueAndStrLenInd to address a CodeQL warning around reinterpreting UTF-16 bytes as a wide-character pointer by adding explanatory comments, a CodeQL suppression, and a compile-time assertion.

Changes:

Added explanatory comments describing PyUnicode_AsUTF16String UTF-16/BOM behavior.
Added a static_assert intended as a compile-time safety guard.
Added an inline CodeQL suppression comment for SM02986 with justification.

Comments suppressed due to low confidence (1)

language-extensions/python/src/PythonParam.cpp:357

PyUnicode_GET_LENGTH returns the number of Unicode code points, not the number of UTF-16 code units. For non-BMP characters (surrogate pairs), size will be smaller than the number of 16-bit units in the PyUnicode_AsUTF16String buffer, so vector<CharType>(wData + 1, wData + 1 + size) will truncate/produce invalid UTF-16. If the intent is to copy the exact UTF-16 payload, compute the element count from the returned bytes length (minus BOM and the trailing UTF-16 NUL) rather than using PyUnicode_GET_LENGTH.

					CharType *wData = reinterpret_cast<CharType *>(utf16str); // CodeQL [SM02986]: The buffer is properly aligned (divisible by 2), already contains real UTF-16 data (SQL NVARCHAR), and we know its exact length (not relying on null termination); so treating it as wchar_t* is safe.

					// Ignore 2 byte BOM at front of wData that was added by PyUnicode_AsUTF16String
					//
					m_value = vector<CharType>(wData + 1, wData + 1 + size);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-18T23:50:57Z

+					// We treat it as an array of 16-bit code units (CharType expected to be the size of wchar_t).
 					//
-					CharType *wData = reinterpret_cast<CharType *>(utf16str);
+					static_assert(sizeof(CharType) == sizeof(wchar_t), "CharType must match wchar_t size for UTF-16 reinterpretation.");
+					CharType *wData = reinterpret_cast<CharType *>(utf16str); // CodeQL [SM02986]: The buffer is properly aligned (divisible by 2), already contains real UTF-16 data (SQL NVARCHAR), and we know its exact length (not relying on null termination); so treating it as wchar_t* is safe.


static_assert(sizeof(CharType) == sizeof(wchar_t)) is effectively a no-op here (this template is explicitly instantiated with CharType = wchar_t), so it doesn’t enforce the UTF-16 assumption that each code unit is 2 bytes. Since the code skips the BOM via wData + 1 and treats the buffer as UTF-16, the guard should instead assert the actual required invariant (e.g., sizeof(wchar_t) == 2 / sizeof(CharType) == 2, or CharType is exactly the intended 16-bit type).

Copilot · 2026-04-18T23:50:57Z

 					char *utf16str = PyBytes_AsString(PyUnicode_AsUTF16String(tempObj.ptr()));

 					// Reinterpret the bytes as wchar_t *, which we will return.
+					// The buffer contains UTF-16 code units in native byte order. The string always starts with a BOM mark.
+					// (https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsUTF16String)
+					// We treat it as an array of 16-bit code units (CharType expected to be the size of wchar_t).
 					//
-					CharType *wData = reinterpret_cast<CharType *>(utf16str);
+					static_assert(sizeof(CharType) == sizeof(wchar_t), "CharType must match wchar_t size for UTF-16 reinterpretation.");


PyUnicode_AsUTF16String returns a new reference; calling it inline inside PyBytes_AsString(...) leaks that PyObject* and also prevents any error handling. If PyUnicode_AsUTF16String fails (e.g., MemoryError) it returns nullptr, and PyBytes_AsString(nullptr) is undefined/crash. Store the returned bytes object in a temporary, validate it’s non-null (and that PyBytes_AsString succeeds), copy out the needed data, then Py_DECREF the temporary.

Copilot · 2026-04-18T23:50:58Z

 					//
-					CharType *wData = reinterpret_cast<CharType *>(utf16str);
+					static_assert(sizeof(CharType) == sizeof(wchar_t), "CharType must match wchar_t size for UTF-16 reinterpretation.");
+					CharType *wData = reinterpret_cast<CharType *>(utf16str); // CodeQL [SM02986]: The buffer is properly aligned (divisible by 2), already contains real UTF-16 data (SQL NVARCHAR), and we know its exact length (not relying on null termination); so treating it as wchar_t* is safe.


The CodeQL suppression justification is factually shaky: “aligned (divisible by 2)” conflates buffer length with pointer alignment, and the safety argument hinges on details (Python allocator alignment, aliasing/type-punning rules) that aren’t captured here. Consider either (a) adjusting the comment to describe the real alignment guarantee you rely on, or (b) avoiding the reinterpret_cast entirely by copying/decoding the bytes into a vector<CharType> without type-punning, which would eliminate the need for the suppression.

codeql for cast

760b49f

monamaki changed the title ~~codeql for cast~~ Fix CodeQL SM02986: char* to wchar_t* cast warning in PythonParam Apr 18, 2026

monamaki marked this pull request as ready for review April 18, 2026 23:45

Copilot AI review requested due to automatic review settings April 18, 2026 23:45

Copilot started reviewing on behalf of monamaki April 18, 2026 23:46 View session

monamaki requested review from Aniruddh25, SicongLiu2000 and beccadaniel April 18, 2026 23:48

Copilot AI reviewed Apr 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix CodeQL SM02986: char* to wchar_t* cast warning in PythonParam#86

Fix CodeQL SM02986: char* to wchar_t* cast warning in PythonParam#86
monamaki wants to merge 1 commit intomainfrom
dev/monamaki/codeQLFix100

monamaki commented Apr 17, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

Copilot AI Apr 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

monamaki commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Analysis

Changes

Files Changed

Testing

Risk

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

monamaki commented Apr 17, 2026 •

edited

Loading