[draft] Simplify tracer ze via introspec#501
Open
TApplencourt wants to merge 19 commits into
Open
Conversation
Replace the FIND_AND_DEL_ZE_OBJ + read-stored-context + ADD_ZE_OBJ dance with a direct introspection call. The hot path (every NULL-signal Append) no longer touches the cmdlist hash.
Replace the cl_data->context read with a direct introspection call. Keeps the FIND_AND_DEL_ZE_OBJ dance in place because cl_data->events (the per-cl event linked list) and cl_data->flags & _ZE_IMMEDIATE are still read here.
…ry_info Both helpers were doing FIND_ZE_OBJ on the cmdlist hash purely to read cl_data->device or cl_data->context. Direct introspection calls remove the lookup and let cl_data->device / cl_data->context become write-only (cleaned up in a later commit).
After the introspection swaps, nothing reads cl_data->device,
cl_data->context, or cl_data->driver — they were only populated to
serve callers that now use zeCommandList{Get,Is}* directly.
Cascade:
- struct _ze_command_list_obj_data: drop device/context/driver
- _on_create_command_list: drop hContext/hDevice params and the
device-hash lookup that only existed to populate driver
- ze_model.rb: drop the hContext/hDevice args at the two call sites
- struct _ze_device_obj_data and _register_ze_device: fully unused
now that nothing reads the device hash. Drop them and their two
epilogue registrations (zeDeviceGet, zeDeviceGetSubDevices).
- enum _ze_obj_type: drop DRIVER and DEVICE values.
Net: ~80 LOC removed; the cmdlist hash entry now stores only flags
and the per-cl events list.
Switches the immediate-flag source for events to introspection. Stops reading cl_data->flags & _ZE_IMMEDIATE here; the stored flag is still written/read in _on_destroy_command_list (see note below). Investigation of an attempted parallel swap in _on_destroy_command_list: _on_destroy_command_list runs as the EPILOGUE of zeCommandListDestroy. By the time it fires, the cmdlist handle has been freed by the driver, and zeCommandListIsImmediate(handle) segfaults. We must keep using the stored cl_data flag in that one place. A code comment makes this explicit so a future cleanup doesn't re-attempt the swap. Verified: 8/8 baseline iprof tests + sampled non-skipped extras, no regressions.
The cl-side _ZE_IMMEDIATE flag existed only so destroy/reset drainers could ask "are this cl's events ready to query?" — true for immediate cls (which have no Execute step) and for regular cls that have been Executed. Stamping _ZE_EXECUTED at create time for immediate cls unifies both into a single check, lets _on_destroy_command_list drop its compound (immediate || executed) test, and removes the "epilogue can't introspect" foot-gun entirely from this path.
After dropping the device hash entries and obj_data_free, _ze_objs
only ever held command-list payloads (in non-debug builds) or
no-op event-tag entries (debug-only). Both indirections are now
dead weight.
Changes:
- Drop debug-only event-hash usage in _on_created_event /
_on_destroy_event (removes the THAPI_DEBUG branches and the
zeEventCreate epilogue registration).
- Fold _ze_obj_h's ptr/hh into _ze_command_list_obj_data; drop the
separate header struct, the trailing-data calloc, and the
obj_data void* indirection.
- Rename macros and global to reflect the now-monomorphic hash:
_ze_objs -> _ze_cls, FIND/ADD/FIND_AND_DEL_ZE_OBJ ->
*_ZE_CL.
Net: ~55 LOC removed; one less struct, one less indirection.
_register_ze_event used a NULL-_ze_event-pointer as a discriminator
between two different paths (tracer-injected event vs user event).
Splitting them into _register_our_event and _register_user_event
makes each path linear and removes the magic NULL dispatch.
Two helpers extracted along the way:
- _tag_event_from_cl: shared introspection (context + immediate)
- _attach_event_to_cl: shared FIND_AND_DEL/ADD pattern that guards
cl_data against a concurrent free in _on_destroy_command_list.
Also: remove redundant zero-initializations of event_pool and flags
in the user-event path. GET_ZE_EVENT_WRAPPER returns a fully-zeroed
wrapper (calloc on first use, memset by PUT_ZE_EVENT_WRAPPER on
recycle), so only the fields we want non-zero need explicit assignment.
ze_model.rb's profiling_epilogue picks the right function based on
whether _ewrapper is set (we injected) or NULL (user event).
The previous comment cited a 'project-ze-introspect' memory file that doesn't exist. Replace with the actual reason the immediate flag is snapshotted at register time: by _on_reset_event time the cmdlist may already be destroyed, so introspecting it would dereference a freed handle.
The Query op has no kernel to time — its signal event records whatever the driver happened to sample, leading to nonsensical device-side durations (5.73min max observed under m_ooo_query_no_wait_stress_ze, total claimed 10.79h). Drop it from the profiling-injection list. Host-side timing of the API call itself is still tracked (it's still a regular ze_* call that goes through the host tracepoints).
The _ZE_PROFILED bit was read in 5 places (always negated) and cleared in one, but never set. Every guarded `_profile_event_results` call therefore always fired, and the clear was a no-op. Drop the enum value, the 5 guards, and the clear. _ZE_IMMEDIATE_CMD is now the only remaining event flag.
Both call sites already iterate cl_data->events and have the wrapper in hand. Passing _ze_event_h* instead of ze_event_handle_t skips a hash lookup per element on cl reset/destroy. The remaining FIND_AND_DEL_ZE_EVENT inside the function is just removing the wrapper from the global events hash; it's still required because external state outside this function (e.g. _on_reset_event) also indexes by event handle.
Definition moved to where the callers can see it directly. Removes the forward declaration that existed only because the function was defined too far down the file.
Field was set in 6 places, read in 0 — pure dead state.
Cascade:
- _register_user_event's 'already registered' early-return branch
used to update command_list there; now it's a pure no-op return.
- _register_our_event's first line (assigning command_list) gone.
- _get_profiling_event's two assignments gone.
- PUT_ZE_EVENT macro's redundant clear gone.
The field's original purpose was probably "track which cmdlist this
event most recently belonged to" but no consumer ever read it. The
new tracer can introduce a real cl pointer if/when it needs one.
The /* should put? */ comment marked an unresolved decision. The neighboring sweep over the per-context free list (lines just below) already does PUT_ZE_EVENT_WRAPPER for the same struct type; doing the same here avoids alloc/free churn in workloads that create and destroy contexts in a loop (e.g. m_many_contexts_ze). No behavior change for single-context workloads.
Three wrapper-teardown sites duplicated the same sequence: 1. _event_cleanup (process exit destructor) 2. _on_destroy_context first sweep (live events for that context) 3. _on_destroy_context second sweep (per-context free list) All three: optionally dump timestamp tracepoint, destroy our injected event+pool if we own them, recycle the wrapper. Extracted into _dispose_event_wrapper(wrapper, do_dump) — site 3 passes do_dump=0 because its wrappers were already drained when they were recycled. This is the "tear down a wrapper" primitive PLAN_v2's per-scenario drain hooks will all call. Side effect: _event_cleanup now recycles wrappers via PUT_ZE_EVENT_WRAPPER instead of free()'ing them. Process is exiting either way, so no observable difference, but the code path is now uniform. [DRAFT]: compile-clean but bats not re-run — compute node was lost mid-iteration. Verify m_many_contexts_ze, m_concurrent_distinct_cls_ze, and baselines all green before merging.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.