ci: use setup-python install so codspeed builds flamegraphs correctly#5997
Merging this PR will degrade performance by 32.76%

| | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| ❌ | dirty_attach | 3.7 µs | 4.2 µs | -12.34% |
| ❌ | clean_attach | 2.1 µs | 2.6 µs | -18.32% |
| ❌ | identify_object_type | 15.2 µs | 17.3 µs | -12.23% |
| ❌ | err_new_restore_and_fetch | 6.7 µs | 7.7 µs | -13.37% |
| ❌ | extract_bigint_extract_fail | 8.7 µs | 10.1 µs | -13.66% |
| ❌ | call | 526.6 µs | 598.2 µs | -11.97% |
| ❌ | call_0 | 157.7 µs | 223.3 µs | -29.37% |
| ❌ | extract_float_extract_fail | 8.1 µs | 9.8 µs | -17.03% |
| ❌ | call_method_0 | 572 µs | 712.4 µs | -19.71% |
| ❌ | call_1 | 190.4 µs | 264.1 µs | -27.92% |
| ❌ | call_method | 783.1 µs | 951.4 µs | -17.69% |
| ❌ | call_method_1 | 290.9 µs | 414.8 µs | -29.86% |
| ❌ | call_method_one_arg | 262 µs | 389.6 µs | -32.76% |
| ❌ | extract_int_extract_fail | 8.4 µs | 9.7 µs | -13.57% |
| ❌ | call_one_arg | 161.8 µs | 236.4 µs | -31.54% |
| ❌ | decimal_via_extract | 11.9 µs | 13.5 µs | -11.41% |
| ❌ | bench_str | 3.7 µs | 4.1 µs | -10.5% |
| ⚡ | drop_many_objects | 11 µs | 7.4 µs | +48.62% |
| ❌ | getattr_intern | 3.3 µs | 3.8 µs | -13.4% |
| ❌ | test_class_method | 14.5 µs | 18.7 µs | -22.29% |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Comparing davidhewitt:codspeed-flamegraphs (b282a90) with main (4712a0a)
Footnotes

1. 1 benchmark was skipped, so the baseline result was used instead. If it was deleted from the codebase, click here and archive it to remove it from the performance reports. ↩
I'd be very interested in additional information about the limitation of the
My understanding from @GuillaumeLagrange was that codspeed's reliance on valgrind hits some limitation in valgrind's ability to understand call stacks deep inside CPython's main eval loop. It seemed like the fix was to exclude symbols from

That said, this call stack seems better than the one in the original OP (for the same benchmark); however, there's still some recursion going on which doesn't look right to me.
Hello @jjhelmus, and thanks for the ping @davidhewitt.

Without boring you all with the details: valgrind creates execution graphs that can contain cycles, instead of the top-down tree one would expect when thinking about profiling data, where the call stack only grows downwards (or upwards, depending on how you visualize it 🤓). The easy way around this is to provide an explicit list of

I'm not 100% sure forcing a non-statically-linked build out of python-standalone is the actual fix for this. We'd rather patch valgrind in a way that cycles do not outright make the profiling useless. We have plans to do so, and a few experiments on the way, but have not yet been able to produce something production-ready to tackle this.
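As a toy illustration of the cycle problem described above (this is a hypothetical example, not CodSpeed's actual pipeline): two mutually recursive functions produce call-graph *edges* in both directions, so the observed call graph contains a cycle and cannot be laid out as the strict top-down tree a flamegraph expects.

```python
import sys

edges = set()  # (caller, callee) pairs observed at runtime


def tracer(frame, event, arg):
    # Record an edge for every Python-level call.
    if event == "call" and frame.f_back is not None:
        edges.add((frame.f_back.f_code.co_name, frame.f_code.co_name))


def even(n):
    # even() calls odd() ...
    return True if n == 0 else odd(n - 1)


def odd(n):
    # ... and odd() calls even(), closing the cycle even -> odd -> even.
    return False if n == 0 else even(n - 1)


sys.setprofile(tracer)
even(4)
sys.setprofile(None)

# Both directions of the edge exist, i.e. the call graph has a cycle.
print(("even", "odd") in edges and ("odd", "even") in edges)  # True
```

A flamegraph builder that naively follows such edges can emit the endlessly repeating frames seen in the screenshots, which is why tools resolve this either by excluding symbols or by collapsing the cycles.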

I noticed some of our benchmarks have weirdly recursive traces like this:
I asked the codspeed team; apparently this is a limitation when using the
uv-provided Python installs, so instead let's use `setup-python` to install Python for the benchmarks.
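A minimal sketch of the change described above, as a GitHub Actions fragment. The job name, action versions, Python version, and benchmark command are assumptions for illustration, not taken from this PR:

```yaml
# Hypothetical benchmark job: install Python via actions/setup-python
# instead of a uv-managed (python-build-standalone) interpreter, so that
# valgrind/CodSpeed can resolve symbols and build flamegraphs correctly.
jobs:
  codspeed:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"   # assumed version
      - name: Run benchmarks
        uses: CodSpeedHQ/action@v3
        with:
          run: make bench          # assumed benchmark command
```

The key point is only the swap of the interpreter source; the rest of the benchmark job stays as it was.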