fix: check codepoint count not byte length in deserialize_char #107

Merged
davidhewitt merged 1 commit into davidhewitt:main from leftygbalogh:lefty/fix-char-depythonize-codepoint-check
Apr 14, 2026

@leftygbalogh
Contributor

Problem

deserialize_char guards against non-char-length strings with:

    if s.len() != 1 { return Err(PythonizeError::invalid_length_char()); }

s.len() returns the byte length of the UTF-8 string, not the number of characters. Every non-ASCII character is a single codepoint but occupies 2 to 4 UTF-8 bytes (e.g. 'ä' U+00E4 takes 2), so the guard fires incorrectly and depythonize::<char> returns Err(InvalidLengthChar) for valid input.

This affects every Unicode codepoint above U+007F — Latin Extended, Greek, Cyrillic, Arabic, CJK, emoji, etc.
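
To make the mismatch concrete, here is a minimal standalone sketch in plain Rust. It does not use pythonize or PyO3, and the helper name `byte_len_and_codepoints` is hypothetical, introduced only for illustration.

```rust
/// Hypothetical helper (not part of pythonize): returns the UTF-8 byte
/// length and the codepoint count of a string, in that order.
fn byte_len_and_codepoints(s: &str) -> (usize, usize) {
    // `len()` measures bytes; `chars().count()` walks the string and
    // counts Unicode scalar values.
    (s.len(), s.chars().count())
}
```

For "a" this yields (1, 1), so the old guard passes. For "\u{00E4}" ('ä') it yields (2, 1), and for "\u{1F980}" (an emoji) it yields (4, 1): both are single characters that a byte-length check of 1 would reject.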

Fix

    if s.chars().count() != 1 {

chars().count() counts Unicode codepoints, which is the semantically correct check for the char type.
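
As a rough illustration of how the corrected guard behaves, the sketch below applies the same `chars().count() != 1` check in a self-contained function. The name `parse_char` and the `String` error type are assumptions made for this example; the actual code in de.rs returns a PythonizeError and works on a string extracted from a Python object.

```rust
/// Hypothetical standalone version of the guard (not the pythonize code):
/// accepts a string only if it holds exactly one Unicode codepoint.
fn parse_char(s: &str) -> Result<char, String> {
    let n = s.chars().count();
    if n != 1 {
        return Err(format!("expected exactly 1 codepoint, found {n}"));
    }
    // The count check above guarantees that `next()` yields a value.
    Ok(s.chars().next().expect("string has exactly one char"))
}
```

With this check, "a" and "\u{00E4}" are both accepted, while "" and "ab" are rejected. Note that a decomposed sequence such as "a\u{0308}" (the letter a followed by a combining diaeresis) is two codepoints and is still rejected, which matches what a Rust char can hold.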

Test

A new test test_char_multibyte_codepoint is added in de.rs covering 'ä' (U+00E4, 2 UTF-8 bytes). The existing test_char (ASCII 'a') continues to pass.

The guard s.len() != 1 used byte length, causing depythonize::<char>
to return Err(InvalidLengthChar) for any non-ASCII single-codepoint
character (e.g. 'ä' U+00E4 is 1 codepoint but 2 UTF-8 bytes).

Fix: use s.chars().count() != 1 which counts Unicode codepoints.
A test for the multibyte-codepoint case is added to de.rs.
@codecov

codecov Bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.57%. Comparing base (0085a18) to head (7ca25d5).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #107      +/-   ##
==========================================
+ Coverage   84.48%   84.57%   +0.09%     
==========================================
  Files           3        3              
  Lines        1186     1193       +7     
  Branches     1186     1193       +7     
==========================================
+ Hits         1002     1009       +7     
  Misses        118      118              
  Partials       66       66              

Owner

@davidhewitt davidhewitt left a comment

Thanks

@davidhewitt davidhewitt merged commit 21ad82f into davidhewitt:main Apr 14, 2026
34 checks passed
2 participants