Conversation
This comment was marked as resolved.
This comment was marked as resolved.
ngoldbaum
left a comment
There was a problem hiding this comment.
Did a quick pass and spotted a couple issues.
|
I looked over the FFI changes. While we're touching It also looks like the The principle you should keep in mind when touching PyO3's FFI bindings is that it should be organized exactly like the upstream headers, as of the newest Python we support (3.14 right now, but it'll be 3.15 in the next few weeks). |
In branch 3.14, |
Yes, that's generally what we do. It's hard to keep the FFI bindings perfectly up-to-date, so it's a chore for whoever needs to touch them. |
|
I split up some of the |
ngoldbaum
left a comment
There was a problem hiding this comment.
I spotted some issues, see below.
| value | ||
| }; | ||
| let bits = 128 - abs.leading_zeros() as usize; | ||
| let n_digits = if bits == 0 { 1 } else { bits.div_ceil(30) }; |
There was a problem hiding this comment.
This hard-codes 30-bit digits, but CPython also supports 15 bit digits, it's just a non-default build configuration. I think if we added a 32 bit Windows or Linux build to CI, this test would fail.
If you don't want to implement a fast path for 15 bit digits, you cold also check e.g. sys.int_info.bits_per_digit or the C API equivalent at runtime and fall back to the 3.13 and earlier path which doesn't need to care about how many bits per digit CPython uses.
There was a problem hiding this comment.
I think if we added a 32 bit Windows or Linux build to CI, this test would fail.
It seems we have compile Python yourself with the --enable-big-digits=15 flag
Merging this PR will improve performance by 14.53%
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| 🆕 | extract_i128_neg_min |
N/A | 1.4 µs | N/A |
| 🆕 | extract_i128_pos_max |
N/A | 1.4 µs | N/A |
| 🆕 | extract_i128_small_neg |
N/A | 1.1 µs | N/A |
| 🆕 | extract_i128_small_pos |
N/A | 1.1 µs | N/A |
| 🆕 | extract_i128_zero |
N/A | 1.1 µs | N/A |
| 🆕 | extract_u128_max |
N/A | 1.4 µs | N/A |
| 🆕 | extract_u128_small |
N/A | 1.1 µs | N/A |
| 🆕 | extract_u128_u32_max |
N/A | 1.2 µs | N/A |
| 🆕 | extract_u128_u64_max |
N/A | 1.3 µs | N/A |
| 🆕 | extract_u128_zero |
N/A | 1.1 µs | N/A |
| 🆕 | into_i128_neg_min |
N/A | 1.6 µs | N/A |
| 🆕 | into_i128_pos_max |
N/A | 1.6 µs | N/A |
| 🆕 | into_i128_small_neg |
N/A | 1.3 µs | N/A |
| 🆕 | into_i128_small_pos |
N/A | 1.8 µs | N/A |
| 🆕 | into_i128_zero |
N/A | 1.8 µs | N/A |
| 🆕 | into_u128_max |
N/A | 1.6 µs | N/A |
| 🆕 | into_u128_small |
N/A | 1.8 µs | N/A |
| 🆕 | into_u128_u32_max |
N/A | 1.6 µs | N/A |
| 🆕 | into_u128_u64_max |
N/A | 1.6 µs | N/A |
| 🆕 | into_u128_zero |
N/A | 1.9 µs | N/A |
| ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Comparing chirizxc:PyLongWriter (f35bb45) with main (f57bda7)
Footnotes
-
1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, click here and archive it to remove it from the performance reports. ↩
ngoldbaum
left a comment
There was a problem hiding this comment.
Overall looking much better. I have some comments inline.
Looking further ahead (does not need to be landed in this PR), it probably makes sense to add conversions using this API for BigInt, rust_decimal::Decimal, and big_decimal::BigDecimal, so IMO it might bear some effort to come up with a slightly more general API to avoid duplicating logic. Right now the logic is only duplicated twice between i128 and u128, but if we add more types there will be more duplication.
Consider defining two generic functions:
// builds an int from an iterator of 30-bit digits
pub(crate) fn pylong_from_digits<I: ExactSizeIterator<Item = u32>>(py: Python<'_>, negative: bool, digits: I) -> PyResult<Bound<'_, PyInt>>
// Visits 30-bit digits LSB-first and deals with freeing the export
pub(crate) fn pylong_visit_digits<R>(
obj: Borrowed<'_, '_, PyAny>,
f: impl FnOnce(bool /*negative*/, i64 /*compact*/, Option<&[u32]> /*digits*/
) -> PyResult<R>
Also once this is merged you should open a followup issue to add more conversions that can benefit from this API.
|
|
||
| *DIGITS.get_or_init(|| { | ||
| let layout = unsafe { &*ffi::PyLong_GetNativeLayout() }; | ||
| layout.bits_per_digit == PYLONG_BITS_IN_DIGIT as u8 |
There was a problem hiding this comment.
your implementation implicitly assumes digits_size=4, digits_order=-1, and digit_endianness set to native byteorder. You should check those invariants here, in case Python ever changes them in the future or a different Python implementation does things a different way.
| )?; | ||
| } | ||
| let long_export_ref = unsafe { long_export.assume_init_ref() }; | ||
| if long_export_ref.digits.is_null() { |
There was a problem hiding this comment.
For consistency you should call PyLong_FreeExport in this path as well.
| ffi::PyLong_Export( | ||
| num.as_ptr().cast(), | ||
| long_export.as_mut_ptr(), | ||
| ), |
There was a problem hiding this comment.
| ffi::PyLong_Export( | |
| num.as_ptr().cast(), | |
| long_export.as_mut_ptr(), | |
| ), | |
| ffi::PyLong_Export(num.as_ptr(), long_export.as_mut_ptr()), |
Calling cast() is unnecessary - as_ptr() returns a *mut PyObject.
| } | ||
|
|
||
| #[cfg(all(Py_3_13, not(Py_LIMITED_API)))] | ||
| #[allow(dead_code)] |
There was a problem hiding this comment.
Couldn't you change the cfg gating to avoid adding dead_code here? I'd prefer that.
/close #6015
main (3.13):
this PR (3.14):
cmd:
into_u128_zerointo_u128_smallinto_u128_u32_maxinto_u128_u64_maxinto_u128_maxinto_i128_zerointo_i128_small_posinto_i128_small_neginto_i128_pos_maxinto_i128_neg_minextract_u128_zeroextract_u128_smallextract_u128_u32_maxextract_u128_u64_maxextract_u128_maxextract_i128_zeroextract_i128_small_posextract_i128_small_negextract_i128_pos_maxextract_i128_neg_min