ci: use setup-python install so codspeed builds flamegraphs correctly #5997

Merged
Tpt merged 1 commit into PyO3:main from davidhewitt:codspeed-flamegraphs on Apr 23, 2026
Conversation

@davidhewitt
Member

I noticed some of our benchmarks have weirdly recursive traces like this:

image

I asked the CodSpeed team, and apparently this is a limitation when using the uv-provided Python installs, so let's use setup-python to install Python for the benchmarks instead.

@Tpt Tpt enabled auto-merge April 23, 2026 16:27
@codspeed-hq

codspeed-hq Bot commented Apr 23, 2026

Merging this PR will degrade performance by 32.76%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 1 improved benchmark
❌ 47 regressed benchmarks
✅ 57 untouched benchmarks
⏩ 1 skipped benchmark [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| dirty_attach | 3.7 µs | 4.2 µs | -12.34% |
| clean_attach | 2.1 µs | 2.6 µs | -18.32% |
| identify_object_type | 15.2 µs | 17.3 µs | -12.23% |
| err_new_restore_and_fetch | 6.7 µs | 7.7 µs | -13.37% |
| extract_bigint_extract_fail | 8.7 µs | 10.1 µs | -13.66% |
| call | 526.6 µs | 598.2 µs | -11.97% |
| call_0 | 157.7 µs | 223.3 µs | -29.37% |
| extract_float_extract_fail | 8.1 µs | 9.8 µs | -17.03% |
| call_method_0 | 572 µs | 712.4 µs | -19.71% |
| call_1 | 190.4 µs | 264.1 µs | -27.92% |
| call_method | 783.1 µs | 951.4 µs | -17.69% |
| call_method_1 | 290.9 µs | 414.8 µs | -29.86% |
| call_method_one_arg | 262 µs | 389.6 µs | -32.76% |
| extract_int_extract_fail | 8.4 µs | 9.7 µs | -13.57% |
| call_one_arg | 161.8 µs | 236.4 µs | -31.54% |
| decimal_via_extract | 11.9 µs | 13.5 µs | -11.41% |
| bench_str | 3.7 µs | 4.1 µs | -10.5% |
| drop_many_objects | 11 µs | 7.4 µs | +48.62% |
| getattr_intern | 3.3 µs | 3.8 µs | -13.4% |
| test_class_method | 14.5 µs | 18.7 µs | -22.29% |
| ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
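The Efficiency column above appears consistent with `base / head - 1`, judging only from the displayed figures. That formula is an assumption inferred from the table, not something the report states, and the helper name `efficiency_change` is hypothetical. A minimal sketch to sanity-check two rows:

```python
def efficiency_change(base_us: float, head_us: float) -> float:
    """Relative change in percent, ASSUMING the formula base / head - 1."""
    return (base_us / head_us - 1.0) * 100.0


# Rounded BASE/HEAD values copied from the table; the report presumably
# computes its percentages from unrounded timings, so small differences
# in the last digits are expected.
rows = {
    "call_method_one_arg": (262.0, 389.6),  # reported as -32.76%
    "drop_many_objects": (11.0, 7.4),       # reported as +48.62%
}

for name, (base, head) in rows.items():
    print(f"{name}: {efficiency_change(base, head):+.2f}%")
```

Under this reading, a negative value means HEAD took longer than BASE, and the headline 32.76% matches the worst row shown (call_method_one_arg).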


Comparing davidhewitt:codspeed-flamegraphs (b282a90) with main (4712a0a)

Open in CodSpeed

Footnotes

  1. 1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, click here and archive it to remove it from the performance reports.

@Tpt Tpt added this pull request to the merge queue Apr 23, 2026
Merged via the queue into PyO3:main with commit e883df1 Apr 23, 2026
43 of 45 checks passed
@jjhelmus

I'd be very interested in additional information about the limitations of the uv-provided Python. I'm a maintainer of that Python distribution, python-build-standalone, and can work to address these limitations.

@davidhewitt
Member Author

My understanding from @GuillaumeLagrange was that codspeed's reliance on valgrind runs into a limitation in valgrind's ability to understand call stacks deep inside CPython's main eval loop.

It seemed like the fix was to exclude symbols from libpython in some form, but this exclusion only works with setup-python installs.

That said, this call stack looks better than the one in the original post (for the same benchmark), but there's still some recursion going on that doesn't look right to me.

image

@GuillaumeLagrange

Hello @jjhelmus, and thanks for the ping @davidhewitt

Without boring you all with the details, valgrind creates execution graphs that can contain cycles, rather than the top-down tree one would expect from profiling data, where the call stack only grows downwards (or upwards, depending on how you visualize it 🤓).
For this reason valgrind, and by extension we at codspeed, are very sensitive to cycles when it comes to attributing costs.
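To make the cycle problem concrete, here is a minimal sketch of detecting a cycle in a call graph. It is only an illustration: the function names in the example graph are hypothetical, and this is not how valgrind or codspeed represent or process their data.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle (first node repeated at the end), or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour: Dict[str, int] = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for callee in graph.get(node, []):
            state = colour.get(callee, WHITE)
            if state == GREY:
                # The callee is already on the current path: a cycle.
                return path[path.index(callee):] + [callee]
            if state == WHITE:
                found = visit(callee)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical call graph: an interpreter loop that re-enters itself.
call_graph = {
    "bench": ["eval_loop"],
    "eval_loop": ["call_function"],
    "call_function": ["eval_loop", "native_fn"],
    "native_fn": [],
}
print(find_cycle(call_graph))
```

Once a cycle like this exists, the inclusive cost of each function on it depends on the other, so there is no single tree to which costs can be cleanly attributed.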

The easy way around this is to give valgrind an explicit list of *.so files to ignore, which includes libpython.so. This squashes most of the Python-introduced cycles that we've observed. Unfortunately, the standalone Python builds provided by uv had libpython.so statically linked, or at least they did when we last took a closer look at the issue. That makes it really hard to get good flamegraphs out of valgrind.

I'm not 100% sure that forcing a non-statically-linked build out of python-standalone is the actual fix for this. We'd rather patch valgrind so that cycles do not outright make the profiling useless. We have plans to do so and a few experiments under way, but have not yet been able to produce something production-ready to tackle this.
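For anyone wanting to check how their own interpreter relates to libpython, a rough sketch follows. It relies on two heuristics that are assumptions here, not anything codspeed documents: the `Py_ENABLE_SHARED` build variable reported by `sysconfig`, and (on Linux only) whether a file named like libpython appears in `/proc/self/maps`. The helper name `maps_mention_libpython` is hypothetical.

```python
import os
import sysconfig


def maps_mention_libpython(maps_text: str) -> bool:
    """True if any mapped file in /proc/<pid>/maps-style text is named like libpython."""
    for line in maps_text.splitlines():
        if "/" not in line:
            continue  # anonymous mapping, no backing file
        filename = line.rsplit("/", 1)[-1]
        if "libpython" in filename:
            return True
    return False


# Build-time configuration. This only says whether a shared libpython was
# built; it does not prove how the python executable itself is linked.
print("Py_ENABLE_SHARED:", sysconfig.get_config_var("Py_ENABLE_SHARED"))

# Linux-only heuristic: is a libpython shared object mapped into this process?
if os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as f:
        print("libpython mapped:", maps_mention_libpython(f.read()))
```

If no libpython object is mapped, the interpreter code most likely lives inside the executable itself, which is the situation described above where ignoring libpython.so cannot help.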

davidhewitt added a commit to davidhewitt/pyo3 that referenced this pull request May 1, 2026