Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion language-extensions/python/src/PythonParam.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -345,8 +345,12 @@ void PythonStringParam<CharType>::RetrieveValueAndStrLenInd(bp::object mainNames
char *utf16str = PyBytes_AsString(PyUnicode_AsUTF16String(tempObj.ptr()));

// Reinterpret the bytes as wchar_t *, which we will return.
// The buffer contains UTF-16 code units in native byte order. The string always starts with a BOM mark.
// (https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsUTF16String)
// We treat it as an array of 16-bit code units (CharType expected to be the size of wchar_t).
//
CharType *wData = reinterpret_cast<CharType *>(utf16str);
static_assert(sizeof(CharType) == sizeof(wchar_t), "CharType must match wchar_t size for UTF-16 reinterpretation.");
Comment on lines 345 to +352
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyUnicode_AsUTF16String returns a new reference; calling it inline inside PyBytes_AsString(...) leaks that PyObject* and also prevents any error handling. If PyUnicode_AsUTF16String fails (e.g., MemoryError) it returns nullptr, and PyBytes_AsString(nullptr) is undefined/crash. Store the returned bytes object in a temporary, validate it’s non-null (and that PyBytes_AsString succeeds), copy out the needed data, then Py_DECREF the temporary.

Copilot uses AI. Check for mistakes.
CharType *wData = reinterpret_cast<CharType *>(utf16str); // CodeQL [SM02986]: The buffer is properly aligned (divisible by 2), already contains real UTF-16 data (SQL NVARCHAR), and we know its exact length (not relying on null termination); so treating it as wchar_t* is safe.
Comment on lines +350 to +353
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static_assert(sizeof(CharType) == sizeof(wchar_t)) is effectively a no-op here (this template is explicitly instantiated with CharType = wchar_t), so it doesn’t enforce the UTF-16 assumption that each code unit is 2 bytes. Since the code skips the BOM via wData + 1 and treats the buffer as UTF-16, the guard should instead assert the actual required invariant (e.g., sizeof(wchar_t) == 2 / sizeof(CharType) == 2, or CharType is exactly the intended 16-bit type).

Copilot uses AI. Check for mistakes.
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CodeQL suppression justification is factually shaky: “aligned (divisible by 2)” conflates buffer length with pointer alignment, and the safety argument hinges on details (Python allocator alignment, aliasing/type-punning rules) that aren’t captured here. Consider either (a) adjusting the comment to describe the real alignment guarantee you rely on, or (b) avoiding the reinterpret_cast entirely by copying/decoding the bytes into a vector<CharType> without type-punning, which would eliminate the need for the suppression.

Copilot uses AI. Check for mistakes.

// Ignore 2 byte BOM at front of wData that was added by PyUnicode_AsUTF16String
//
Expand Down
Loading