Skip to content

[draft] Simplify tracer ze via introspec#501

Open
TApplencourt wants to merge 19 commits into
develfrom
simplify_tracer_ze_introspec
Open

[draft] Simplify tracer ze via introspec#501
TApplencourt wants to merge 19 commits into
develfrom
simplify_tracer_ze_introspec

Conversation

@TApplencourt
Copy link
Copy Markdown
Collaborator

No description provided.

Replace the FIND_AND_DEL_ZE_OBJ + read-stored-context + ADD_ZE_OBJ
dance with a direct introspection call. The hot path (every
NULL-signal Append) no longer touches the cmdlist hash.
Replace the cl_data->context read with a direct introspection call.
Keeps the FIND_AND_DEL_ZE_OBJ dance in place because cl_data->events
(the per-cl event linked list) and cl_data->flags & _ZE_IMMEDIATE are
still read here.
…ry_info

Both helpers were doing FIND_ZE_OBJ on the cmdlist hash purely to
read cl_data->device or cl_data->context. Direct introspection calls
remove the lookup and let cl_data->device / cl_data->context become
write-only (cleaned up in a later commit).
After the introspection swaps, nothing reads cl_data->device,
cl_data->context, or cl_data->driver — they were only populated to
serve callers that now use zeCommandList{Get,Is}* directly.

Cascade:
  - struct _ze_command_list_obj_data: drop device/context/driver
  - _on_create_command_list: drop hContext/hDevice params and the
    device-hash lookup that only existed to populate driver
  - ze_model.rb: drop the hContext/hDevice args at the two call sites
  - struct _ze_device_obj_data and _register_ze_device: fully unused
    now that nothing reads the device hash. Drop them and their two
    epilogue registrations (zeDeviceGet, zeDeviceGetSubDevices).
  - enum _ze_obj_type: drop DRIVER and DEVICE values.

Net: ~80 LOC removed; the cmdlist hash entry now stores only flags
and the per-cl events list.
Switches the immediate-flag source for events to introspection.
Stops reading cl_data->flags & _ZE_IMMEDIATE here; the stored flag is
still written/read in _on_destroy_command_list (see note below).

Investigation of an attempted parallel swap in _on_destroy_command_list:
  _on_destroy_command_list runs as the EPILOGUE of zeCommandListDestroy.
  By the time it fires, the cmdlist handle has been freed by the driver,
  and zeCommandListIsImmediate(handle) segfaults. We must keep using the
  stored cl_data flag in that one place. A code comment makes this
  explicit so a future cleanup doesn't re-attempt the swap.

Verified: 8/8 baseline iprof tests + sampled non-skipped extras, no
regressions.
The cl-side _ZE_IMMEDIATE flag existed only so destroy/reset drainers
could ask "are this cl's events ready to query?" — true for immediate
cls (which have no Execute step) and for regular cls that have been
Executed. Stamping _ZE_EXECUTED at create time for immediate cls
unifies both into a single check, lets _on_destroy_command_list drop
its compound (immediate || executed) test, and removes the
"epilogue can't introspect" foot-gun entirely from this path.
After dropping the device hash entries and obj_data_free, _ze_objs
only ever held command-list payloads (in non-debug builds) or
no-op event-tag entries (debug-only). Both indirections are now
dead weight.

Changes:
  - Drop debug-only event-hash usage in _on_created_event /
    _on_destroy_event (removes the THAPI_DEBUG branches and the
    zeEventCreate epilogue registration).
  - Fold _ze_obj_h's ptr/hh into _ze_command_list_obj_data; drop the
    separate header struct, the trailing-data calloc, and the
    obj_data void* indirection.
  - Rename macros and global to reflect the now-monomorphic hash:
    _ze_objs -> _ze_cls, FIND/ADD/FIND_AND_DEL_ZE_OBJ ->
    *_ZE_CL.

Net: ~55 LOC removed; one less struct, one less indirection.
_register_ze_event used a NULL-_ze_event-pointer as a discriminator
between two different paths (tracer-injected event vs user event).
Splitting them into _register_our_event and _register_user_event
makes each path linear and removes the magic NULL dispatch.

Two helpers extracted along the way:
  - _tag_event_from_cl: shared introspection (context + immediate)
  - _attach_event_to_cl: shared FIND_AND_DEL/ADD pattern that guards
    cl_data against a concurrent free in _on_destroy_command_list.

Also: remove redundant zero-initializations of event_pool and flags
in the user-event path. GET_ZE_EVENT_WRAPPER returns a fully-zeroed
wrapper (calloc on first use, memset by PUT_ZE_EVENT_WRAPPER on
recycle), so only the fields we want non-zero need explicit assignment.

ze_model.rb's profiling_epilogue picks the right function based on
whether _ewrapper is set (we injected) or NULL (user event).
The previous comment cited a 'project-ze-introspect' memory file that
doesn't exist. Replace with the actual reason the immediate flag is
snapshotted at register time: by _on_reset_event time the cmdlist may
already be destroyed, so introspecting it would dereference a freed
handle.
The Query op has no kernel to time — its signal event records
whatever the driver happened to sample, leading to nonsensical
device-side durations (5.73min max observed under
m_ooo_query_no_wait_stress_ze, total claimed 10.79h).

Drop it from the profiling-injection list. Host-side timing of the
API call itself is still tracked (it's still a regular ze_* call
that goes through the host tracepoints).
The _ZE_PROFILED bit was read in 5 places (always negated) and
cleared in one, but never set. Every guarded `_profile_event_results`
call therefore always fired, and the clear was a no-op.

Drop the enum value, the 5 guards, and the clear. _ZE_IMMEDIATE_CMD
is now the only remaining event flag.
Both call sites already iterate cl_data->events and have the wrapper
in hand. Passing _ze_event_h* instead of ze_event_handle_t skips a
hash lookup per element on cl reset/destroy.

The remaining FIND_AND_DEL_ZE_EVENT inside the function is just
removing the wrapper from the global events hash; it's still required
because external state outside this function (e.g. _on_reset_event)
also indexes by event handle.
Definition moved to where the callers can see it directly. Removes
the forward declaration that existed only because the function was
defined too far down the file.
Field was set in 6 places, read in 0 — pure dead state.

Cascade:
  - _register_user_event's 'already registered' early-return branch
    used to update command_list there; now it's a pure no-op return.
  - _register_our_event's first line (assigning command_list) gone.
  - _get_profiling_event's two assignments gone.
  - PUT_ZE_EVENT macro's redundant clear gone.

The field's original purpose was probably "track which cmdlist this
event most recently belonged to" but no consumer ever read it. The
new tracer can introduce a real cl pointer if/when it needs one.
The /* should put? */ comment marked an unresolved decision. The
neighboring sweep over the per-context free list (lines just below)
already does PUT_ZE_EVENT_WRAPPER for the same struct type; doing
the same here avoids alloc/free churn in workloads that create and
destroy contexts in a loop (e.g. m_many_contexts_ze). No behavior
change for single-context workloads.
Three wrapper-teardown sites duplicated the same sequence:
  1. _event_cleanup (process exit destructor)
  2. _on_destroy_context first sweep (live events for that context)
  3. _on_destroy_context second sweep (per-context free list)

All three: optionally dump timestamp tracepoint, destroy our injected
event+pool if we own them, recycle the wrapper. Extracted into
_dispose_event_wrapper(wrapper, do_dump) — site 3 passes do_dump=0
because its wrappers were already drained when they were recycled.

This is the "tear down a wrapper" primitive PLAN_v2's per-scenario
drain hooks will all call.

Side effect: _event_cleanup now recycles wrappers via
PUT_ZE_EVENT_WRAPPER instead of free()'ing them. Process is exiting
either way, so no observable difference, but the code path is now
uniform.

[DRAFT]: compile-clean but bats not re-run — compute node was lost
mid-iteration. Verify m_many_contexts_ze, m_concurrent_distinct_cls_ze,
and baselines all green before merging.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant