gh-144586: Improve _Py_yield to improve light weight cpu instruction#144587
gh-144586: Improve _Py_yield to improve light weight cpu instruction#144587corona10 wants to merge 5 commits intopython:mainfrom
Conversation
|
Benchmark on my Mac mini (consistenty enhanced) baselinewith PRscript |
| extern void _Py_yield(void); | ||
| // Lightweight CPU pause hint for spin-wait loops (e.g., x86 PAUSE, AArch64 WFE). | ||
| // Falls back to sched_yield() on platforms without a known pause instruction. | ||
| static inline void |
There was a problem hiding this comment.
I made it static inline because the function call overhead is more expensive than a single instruction.
There was a problem hiding this comment.
This function is only used in lock.c, why move it to header?
There was a problem hiding this comment.
Ah yeah, we can move back to lock.c
There was a problem hiding this comment.
I see, I was looking at older checkout of main branch. Making it static inline looks fine although I think LTO would have inlined it anyways.
There was a problem hiding this comment.
I think LTO would have inlined it anyways.
I think that same way, but just follow our old convention :)
| #elif defined(_M_X64) || defined(_M_IX86) | ||
| _mm_pause(); | ||
| #elif defined(_M_ARM64) || defined(_M_ARM) | ||
| __yield(); |
There was a problem hiding this comment.
|
A lightweight pause is usually not what we want here. See
https://webkit.org/blog/6161/locking-in-webkit/ for some discussions on
yield vs pause.
We also have an unresolved issue where we are only spinning for one
iteration, but changing that seems to hurt performance.
Anyways, I think we should be careful in th changes we make here.
…On Sun, Feb 8, 2026 at 7:16 AM Donghee Na ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In Include/internal/pycore_lock.h
<#144587 (comment)>:
> @@ -70,8 +74,25 @@ PyMutex_LockFlags(PyMutex *m, _PyLockFlags flags)
// error messages) otherwise returns 0.
extern int _PyMutex_TryUnlock(PyMutex *m);
-// Yield the processor to other threads (e.g., sched_yield).
-extern void _Py_yield(void);
+// Lightweight CPU pause hint for spin-wait loops (e.g., x86 PAUSE, AArch64 WFE).
+// Falls back to sched_yield() on platforms without a known pause instruction.
+static inline void
I think LTO would have inlined it anyways.
I think that same way, but just follow our old convention :)
—
Reply to this email directly, view it on GitHub
<#144587 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFAD6USQWH6RK6X4MNGBOT4K5ALBAVCNFSM6AAAAACUL2ZTD2VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTONRZHAYDSMJYGQ>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
Just for sharing: From my micro bechmark and ft-scailing benchmark doesn't show negotive impact from this change. |
Uh oh!
There was an error while loading. Please reload this page.