Skip to content

add PyLong* API (3.14+)#6016

Open
chirizxc wants to merge 27 commits intoPyO3:mainfrom
chirizxc:PyLongWriter
Open

add PyLong* API (3.14+)#6016
chirizxc wants to merge 27 commits intoPyO3:mainfrom
chirizxc:PyLongWriter

Conversation

@chirizxc
Copy link
Copy Markdown
Contributor

@chirizxc chirizxc commented May 5, 2026

/close #6015

main (3.13):

into_u128_zero          time:   [33.711 ns 33.971 ns 34.252 ns]
into_u128_small         time:   [34.602 ns 34.859 ns 35.157 ns]
into_u128_u32_max       time:   [34.700 ns 35.231 ns 35.684 ns]
into_u128_u64_max       time:   [40.138 ns 40.795 ns 41.375 ns]
into_u128_max           time:   [39.668 ns 40.320 ns 40.891 ns]
into_i128_zero          time:   [33.628 ns 33.947 ns 34.335 ns]
into_i128_small_pos     time:   [34.595 ns 34.900 ns 35.250 ns]
into_i128_small_neg     time:   [35.660 ns 36.223 ns 36.743 ns]
into_i128_pos_max       time:   [39.891 ns 40.513 ns 41.041 ns]
into_i128_neg_min       time:   [43.970 ns 44.582 ns 45.093 ns]
extract_u128_zero       time:   [20.255 ns 20.341 ns 20.439 ns]
extract_u128_small      time:   [20.208 ns 20.330 ns 20.466 ns]
extract_u128_u32_max    time:   [26.269 ns 26.458 ns 26.674 ns]
extract_u128_u64_max    time:   [30.125 ns 30.322 ns 30.549 ns]
extract_u128_max        time:   [35.379 ns 35.644 ns 35.933 ns]
extract_i128_zero       time:   [20.521 ns 20.602 ns 20.702 ns]
extract_i128_small_pos  time:   [20.630 ns 20.736 ns 20.858 ns]
extract_i128_small_neg  time:   [20.357 ns 20.442 ns 20.539 ns]
extract_i128_pos_max    time:   [34.338 ns 34.568 ns 34.837 ns]
extract_i128_neg_min    time:   [49.427 ns 49.754 ns 50.169 ns]

this PR (3.14):

into_u128_zero          time:   [17.348 ns 17.475 ns 17.617 ns]
into_u128_small         time:   [18.758 ns 18.950 ns 19.163 ns]
into_u128_u32_max       time:   [27.941 ns 28.377 ns 28.762 ns]
into_u128_u64_max       time:   [33.655 ns 34.173 ns 34.633 ns]
into_u128_max           time:   [34.166 ns 34.739 ns 35.296 ns]
into_i128_zero          time:   [17.017 ns 17.154 ns 17.301 ns]
into_i128_small_pos     time:   [19.636 ns 19.818 ns 20.019 ns]
into_i128_small_neg     time:   [29.753 ns 30.247 ns 30.675 ns]
into_i128_pos_max       time:   [34.856 ns 35.374 ns 35.835 ns]
into_i128_neg_min       time:   [34.675 ns 35.254 ns 35.785 ns]
extract_u128_zero       time:   [18.302 ns 18.434 ns 18.571 ns]
extract_u128_small      time:   [18.654 ns 18.771 ns 18.904 ns]
extract_u128_u32_max    time:   [19.758 ns 19.915 ns 20.084 ns]
extract_u128_u64_max    time:   [23.769 ns 23.975 ns 24.193 ns]
extract_u128_max        time:   [23.988 ns 24.128 ns 24.284 ns]
extract_i128_zero       time:   [19.126 ns 19.261 ns 19.405 ns]
extract_i128_small_pos  time:   [19.004 ns 19.157 ns 19.316 ns]
extract_i128_small_neg  time:   [18.819 ns 18.933 ns 19.058 ns]
extract_i128_pos_max    time:   [24.424 ns 24.593 ns 24.776 ns]
extract_i128_neg_min    time:   [24.007 ns 24.147 ns 24.299 ns]

cmd:

cargo bench --bench bench_int128 -- --warm-up-time 3 --measurement-time 5 2>& | grep -E '^(into|extract)_[a-z0-9_]+\s+time:'
benchmark 3.13 (ns) 3.14 (ns) speedup
into_u128_zero 33.97 17.48 1.94×
into_u128_small 34.86 18.95 1.84×
into_u128_u32_max 35.23 28.38 1.24×
into_u128_u64_max 40.80 34.17 1.19×
into_u128_max 40.32 34.74 1.16×
into_i128_zero 33.95 17.15 1.98×
into_i128_small_pos 34.90 19.82 1.76×
into_i128_small_neg 36.22 30.25 1.20×
into_i128_pos_max 40.51 35.37 1.15×
into_i128_neg_min 44.58 35.25 1.26×
extract_u128_zero 20.34 18.43 1.10×
extract_u128_small 20.33 18.77 1.08×
extract_u128_u32_max 26.46 19.91 1.33×
extract_u128_u64_max 30.32 23.98 1.26×
extract_u128_max 35.64 24.13 1.48×
extract_i128_zero 20.60 19.26 1.07×
extract_i128_small_pos 20.74 19.16 1.08×
extract_i128_small_neg 20.44 18.93 1.08×
extract_i128_pos_max 34.57 24.59 1.41×
extract_i128_neg_min 49.75 24.15 2.06×

@ngoldbaum

This comment was marked as resolved.

@chirizxc chirizxc closed this May 5, 2026
@chirizxc chirizxc reopened this May 5, 2026
Copy link
Copy Markdown
Contributor

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick pass and spotted a couple issues.

Comment thread src/conversions/std/num.rs Outdated
Comment thread src/conversions/std/num.rs Outdated
@ngoldbaum
Copy link
Copy Markdown
Contributor

I looked over the FFI changes.

While we're touching longobject.rs, I'd appreciate it if you could make sure that your updates are ordered following CPython's Include/cpython/longobject.h as much as possible. For example, there are comments about skipping PyUnstable_Long_IsCompact and PyUnstable_Long_CompactValue, but those comments should follow the definition for PyLong_FromUnicodeObject because that's how the corresponding CPython header is organized in 3.14. Our bindings are also missing Py_ASNATIVEBYTES_ALLOW_INDEX, PyLong_IsPositive, PyLong_IsNegative, and PyLong_IsZero, which we should add.

It also looks like the PyLongLayout and PyLongExport structs are in cpython/longintrepr.h, so the definitions you're adding here should go into a new cpython/longintrepr.rs rust module, along with the C API definitions that are in the corresponding header.

The principle you should keep in mind when touching PyO3's FFI bindings is that it should be organized exactly like the upstream headers, as of the newest Python we support (3.14 right now, but it'll be 3.15 in the next few weeks).

@chirizxc
Copy link
Copy Markdown
Contributor Author

chirizxc commented May 6, 2026

I looked over the FFI changes.

While we're touching longobject.rs, I'd appreciate it if you could make sure that your updates are ordered following CPython's Include/cpython/longobject.h as much as possible. For example, there are comments about skipping PyUnstable_Long_IsCompact and PyUnstable_Long_CompactValue, but those comments should follow the definition for PyLong_FromUnicodeObject because that's how the corresponding CPython header is organized in 3.14. Our bindings are also missing Py_ASNATIVEBYTES_ALLOW_INDEX, PyLong_IsPositive, PyLong_IsNegative, and PyLong_IsZero, which we should add.

It also looks like the PyLongLayout and PyLongExport structs are in cpython/longintrepr.h, so the definitions you're adding here should go into a new cpython/longintrepr.rs rust module, along with the C API definitions that are in the corresponding header.

The principle you should keep in mind when touching PyO3's FFI bindings is that it should be organized exactly like the upstream headers, as of the newest Python we support (3.14 right now, but it'll be 3.15 in the next few weeks).

In branch 3.14, PyLong_FromNativeBytes, Py_ASNATIVEBYTES_BIG_ENDIAN, etc. , were moved from cpython/Include/longobject.h to cpython/longobject.h (see: 3.13* | 3.14*). Should I move them as well? I mean, do I need to fully synchronize the current files with branch 3.14?

@ngoldbaum
Copy link
Copy Markdown
Contributor

Should I move them as well? I mean, do I need to fully synchronize the current files with branch 3.14?

Yes, that's generally what we do. It's hard to keep the FFI bindings perfectly up-to-date, so it's a chore for whoever needs to touch them.

@chirizxc
Copy link
Copy Markdown
Contributor Author

chirizxc commented May 6, 2026

I split up some of the extern_libpython! blocks so that the order remains the same as in the *.h files

Copy link
Copy Markdown
Contributor

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spotted some issues, see below.

Comment thread pyo3-ffi/src/cpython/longobject.rs
Comment thread pyo3-ffi/src/cpython/longobject.rs
Comment thread src/conversions/std/num.rs Outdated
value
};
let bits = 128 - abs.leading_zeros() as usize;
let n_digits = if bits == 0 { 1 } else { bits.div_ceil(30) };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hard-codes 30-bit digits, but CPython also supports 15 bit digits, it's just a non-default build configuration. I think if we added a 32 bit Windows or Linux build to CI, this test would fail.

If you don't want to implement a fast path for 15 bit digits, you cold also check e.g. sys.int_info.bits_per_digit or the C API equivalent at runtime and fall back to the 3.13 and earlier path which doesn't need to care about how many bits per digit CPython uses.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we added a 32 bit Windows or Linux build to CI, this test would fail.

It seems we have compile Python yourself with the --enable-big-digits=15 flag

Comment thread src/conversions/std/num.rs Outdated
Comment thread pyo3-ffi/src/longobject.rs
Comment thread pyo3-benches/benches/bench_int128.rs Outdated
Comment thread pyo3-ffi/src/cpython/longobject.rs Outdated
Comment thread src/conversions/std/num.rs
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 7, 2026

Merging this PR will improve performance by 14.53%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 1 improved benchmark
✅ 104 untouched benchmarks
🆕 20 new benchmarks
⏩ 1 skipped benchmark1

Performance Changes

Benchmark BASE HEAD Efficiency
🆕 extract_i128_neg_min N/A 1.4 µs N/A
🆕 extract_i128_pos_max N/A 1.4 µs N/A
🆕 extract_i128_small_neg N/A 1.1 µs N/A
🆕 extract_i128_small_pos N/A 1.1 µs N/A
🆕 extract_i128_zero N/A 1.1 µs N/A
🆕 extract_u128_max N/A 1.4 µs N/A
🆕 extract_u128_small N/A 1.1 µs N/A
🆕 extract_u128_u32_max N/A 1.2 µs N/A
🆕 extract_u128_u64_max N/A 1.3 µs N/A
🆕 extract_u128_zero N/A 1.1 µs N/A
🆕 into_i128_neg_min N/A 1.6 µs N/A
🆕 into_i128_pos_max N/A 1.6 µs N/A
🆕 into_i128_small_neg N/A 1.3 µs N/A
🆕 into_i128_small_pos N/A 1.8 µs N/A
🆕 into_i128_zero N/A 1.8 µs N/A
🆕 into_u128_max N/A 1.6 µs N/A
🆕 into_u128_small N/A 1.8 µs N/A
🆕 into_u128_u32_max N/A 1.6 µs N/A
🆕 into_u128_u64_max N/A 1.6 µs N/A
🆕 into_u128_zero N/A 1.9 µs N/A
... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing chirizxc:PyLongWriter (f35bb45) with main (f57bda7)

Open in CodSpeed

Footnotes

  1. 1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, click here and archive it to remove it from the performance reports.

Copy link
Copy Markdown
Contributor

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking much better. I have some comments inline.

Looking further ahead (does not need to be landed in this PR), it probably makes sense to add conversions using this API for BigInt, rust_decimal::Decimal, and big_decimal::BigDecimal, so IMO it might bear some effort to come up with a slightly more general API to avoid duplicating logic. Right now the logic is only duplicated twice between i128 and u128, but if we add more types there will be more duplication.

Consider defining two generic functions:

// builds an int from an iterator of 30-bit digits
pub(crate) fn pylong_from_digits<I: ExactSizeIterator<Item = u32>>(py: Python<'_>, negative: bool, digits: I) -> PyResult<Bound<'_, PyInt>>

// Visits 30-bit digits LSB-first and deals with freeing the export
pub(crate) fn pylong_visit_digits<R>(
    obj: Borrowed<'_, '_, PyAny>, 
    f: impl FnOnce(bool /*negative*/, i64 /*compact*/, Option<&[u32]> /*digits*/
) -> PyResult<R> 

Also once this is merged you should open a followup issue to add more conversions that can benefit from this API.


*DIGITS.get_or_init(|| {
let layout = unsafe { &*ffi::PyLong_GetNativeLayout() };
layout.bits_per_digit == PYLONG_BITS_IN_DIGIT as u8
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your implementation implicitly assumes digits_size=4, digits_order=-1, and digit_endianness set to native byteorder. You should check those invariants here, in case Python ever changes them in the future or a different Python implementation does things a different way.

)?;
}
let long_export_ref = unsafe { long_export.assume_init_ref() };
if long_export_ref.digits.is_null() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency you should call PyLong_FreeExport in this path as well.

Comment on lines +469 to +472
ffi::PyLong_Export(
num.as_ptr().cast(),
long_export.as_mut_ptr(),
),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ffi::PyLong_Export(
num.as_ptr().cast(),
long_export.as_mut_ptr(),
),
ffi::PyLong_Export(num.as_ptr(), long_export.as_mut_ptr()),

Calling cast() is unnecessary - as_ptr() returns a *mut PyObject.

}

#[cfg(all(Py_3_13, not(Py_LIMITED_API)))]
#[allow(dead_code)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you change the cfg gating to avoid adding dead_code here? I'd prefer that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pyo3-ffi: Add PyLong_* fixed-width integer conversion APIs (3.14+)

2 participants