Skip to content

fix(compositor): export composite-write fence for cross-device frame ordering#146

Merged
K1ngst0m merged 3 commits into
mainfrom
fix/p5-composite-write-fence
Jun 8, 2026
Merged

fix(compositor): export composite-write fence for cross-device frame ordering#146
K1ngst0m merged 3 commits into
mainfrom
fix/p5-composite-write-fence

Conversation

@K1ngst0m

@K1ngst0m K1ngst0m commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Problem

The compositor composites a target app's frame on its own wlroots renderer (GLES2/EGL by default), exports it as a DMA-BUF, and goggles imports + samples it on a separate Vulkan VkDevice. Cross-device write→read ordering must be carried by an OS primitive (sync_file / dma_fence) — there is no shared queue.

  • P5-1 (silent no-fence path): for a client that never binds wp_linux_drm_syncobj_v1 (basic GL/SW, many XWayland), frame.sync_fd was left empty and the frame exported anyway, so the consumer's submit had no producer-ordering primitive — under GPU contention it can sample a torn/partial/stale frame. The lone device.waitIdle() is on goggles' device and cannot order the compositor device's write.
  • P5-2 (wrong primitive): when a fence was exported it came from the root surface's acquire_timeline (client input), not the composite output write; the comment claiming otherwise was inaccurate.

Fix

The compositor owns a composite-OUTPUT wlr_drm_syncobj_timeline, gated on renderer->features.timeline; the present pass is begun with {signal_timeline, ++point}, and after submit a non-blocking WAIT_AVAILABLE check + export_sync_file populates frame.sync_fd (UniqueFd-owned). The old root-surface acquire-fence export is removed; the release-timeline signalling is kept. Consumer unchanged — the existing eSyncFd/eTemporary import now waits on the correct composite-write dma_fence. When the renderer lacks timeline support, behavior degrades to the prior implicit-sync path (no worse than before).

Verification

  • Builds clean; ctest --preset test 12/12.
  • On AMD radeonsi / GLES2 / Mesa 26.1.1 via a non-syncobj client: gate active (composite-write signal timeline enabled), fence exported on every frame (0 not-materialized over 29 frames).
  • Tear-free-under-contention validation (RenderDoc/pixel-diff) remains for hardware testing; an orthogonal write-after-read swapchain-recycle hazard pre-exists and is neither addressed nor regressed.

Notes

Split out of the original combined branch: the documentation/policy changes now live in #147. Adds three compositor-thread-owned CompositorState members; no new threads/deps/CI structure changes.

The pixi.lock sync commit fixes a pre-existing CI breakage on main (pixi install --locked rejects the committed lock on a clean checkout — libxkbcommon 1.13.1 no longer matches the manifest). Transitive patch bump only; identical change also on #147.

🤖 Generated with Claude Code

@qodo-code-review

qodo-code-review Bot commented Jun 8, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0)

Grey Divider


Action required

1. Stale timeline support flag 🐞 Bug ☼ Reliability
Description
CompositorState::create_compositor() only updates present_timeline_supported when
(renderer->features.timeline && drm_fd >= 0) is true, while teardown() clears
present_signal_timeline without resetting the flag. If the compositor is stopped/started and the
second init can’t create a timeline, render_surface_to_frame() may treat timeline support as
enabled and pass/use a null present_signal_timeline, risking a crash.
Code

src/compositor/compositor_core.cpp[R201-216]

+    // Own a composite-OUTPUT signal timeline so every exported frame carries a producer-write
+    // fence (P5), independent of whether the client bound wp_linux_drm_syncobj_v1. Best-effort:
+    // absence degrades to the status-quo implicit path (see compositor_present.cpp).
+    if (renderer->features.timeline && drm_fd >= 0) {
+        present_signal_timeline = wlr_drm_syncobj_timeline_create(drm_fd);
+        present_timeline_supported = (present_signal_timeline != nullptr);
+        if (present_timeline_supported) {
+            GOGGLES_LOG_INFO("Compositor: composite-write signal timeline enabled");
+        } else {
+            GOGGLES_LOG_WARN(
+                "Compositor: signal timeline creation failed; implicit signal sync fallback");
+        }
+    } else {
+        GOGGLES_LOG_WARN("Compositor: renderer lacks features.timeline; composite-write fence "
+                         "export unavailable (implicit signal sync fallback)");
+    }
Evidence
The lifecycle allows stop/start reuse of the same state object, but present_timeline_supported is
not consistently reset when timeline creation is skipped, while the render path assumes the flag
implies a valid timeline pointer.

src/compositor/compositor_core.cpp[186-216]
src/compositor/compositor_core.cpp[393-406]
src/compositor/compositor_present.cpp[657-744]
src/compositor/compositor_server.hpp[46-62]
src/compositor/compositor_server.cpp[28-50]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`present_timeline_supported` can become stale across compositor restarts: `teardown()` nulls `present_signal_timeline` but does not reset `present_timeline_supported`, and `create_compositor()` only assigns `present_timeline_supported` inside one branch. This can leave `present_timeline_supported == true` while `present_signal_timeline == nullptr`, which is then used in `render_surface_to_frame()`.

### Issue Context
`CompositorServer` exposes public `start()`/`stop()` and `start()` re-runs the initialization sequence on the same `CompositorState` instance.

### Fix Focus Areas
- src/compositor/compositor_core.cpp[186-216]
- src/compositor/compositor_core.cpp[393-411]
- src/compositor/compositor_present.cpp[657-667]

### Implementation notes
- In `create_compositor()`, proactively reset the composite-write timeline state before feature checks:
 - `present_timeline_supported = false;`
 - `present_signal_timeline = nullptr;` (and/or unref any existing timeline defensively)
 - optionally reset `present_signal_point = 0` when creating a new timeline.
- In the `else` path (no support), explicitly set `present_timeline_supported = false`.
- In `teardown()`, also set `present_timeline_supported = false` (and optionally reset `present_signal_point`).
- In `render_surface_to_frame()`, harden the condition to avoid null use (e.g., require both `present_timeline_supported` and `present_signal_timeline != nullptr`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Wrong fallback warning cause 🐞 Bug ◔ Observability
Description
The create_compositor() fallback warning claims the renderer lacks features.timeline, but that
else branch also executes when drm_fd < 0. This misattributes why composite-write fence export
is disabled and can mislead debugging.
Code

src/compositor/compositor_core.cpp[R213-216]

+    } else {
+        GOGGLES_LOG_WARN("Compositor: renderer lacks features.timeline; composite-write fence "
+                         "export unavailable (implicit signal sync fallback)");
+    }
Evidence
The condition for the timeline create path includes both renderer->features.timeline and `drm_fd
>= 0`, but the warning text only mentions one of those predicates.

src/compositor/compositor_core.cpp[193-216]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The warning message in the `else` branch blames missing `renderer->features.timeline`, but the branch is taken both when timeline features are absent and when `drm_fd` is invalid (< 0). This produces incorrect diagnostics.

### Issue Context
You already compute `drm_fd` and gate syncobj manager creation on `drm_fd >= 0`, so the code clearly anticipates that `drm_fd` can be invalid.

### Fix Focus Areas
- src/compositor/compositor_core.cpp[193-216]

### Implementation notes
- Split the condition:
 - If `drm_fd < 0`: log that there is no DRM fd so timeline export is unavailable.
 - Else if `!renderer->features.timeline`: log that timeline feature is missing.
 - Else: attempt creation.
- Keep this log at warn if it’s truly unexpected; otherwise consider info/debug (optional).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@K1ngst0m, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 11 minutes and 16 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 42136fde-f46d-4b06-b526-176a6be32404

📥 Commits

Reviewing files that changed from the base of the PR and between 7ebcd5c and 4bf0ed4.

📒 Files selected for processing (3)
  • src/compositor/compositor_core.cpp
  • src/compositor/compositor_present.cpp
  • src/compositor/compositor_state.hpp
📝 Walkthrough

Walkthrough

This PR redesigns the project policy framework from a flat v1.3 checklist to a v2.0 layered invariant system with five policy domains (Process, Boundary, Resource, Sync, Liveness), while simultaneously implementing DRM timeline-based synchronization in the compositor to enforce the sync policy constraints. The changes establish formal architectural contracts and enable kernel-driven sync point progression.

Changes

Policy Framework & Compositor Timeline Sync

Layer / File(s) Summary
Policy v2.0 Framework & Foundational Rules
docs/project_policies.md (v2.0 foundation)
Rewrote policies from v1.3 checklist to v2.0 three-tier enforcement with layered invariants, formal rule schema (Guard, Safety, Escapes, Locked/Free), and rule IDs. Defined Process layer (subsystem ordering, thread ownership, global state, mediator prohibition) and Boundary layer (encapsulation, event/RAII comms, type hygiene, DMA-BUF ownership, Vulkan error handling, destruction ordering, docs/headers).
Resource & Sync Policy Layers
docs/project_policies.md (Resource/Sync/Liveness sections)
Added Resource layer (swapchain/DMA-BUF/hot-reload lifecycle, wlroots listeners) and Sync layer (semaphore lifecycle, fence-before-acquire, frame handoff, SPSC queue contract). Included agent procedures (Self-Check, Refinement, Gate Migration).
Boundary, Process, & Liveness Policy Details
docs/policies/boundary.md, docs/policies/process.md, docs/policies/liveness.md
Detailed individual policy layers: boundary.md (module contracts, encapsulation, type safety, DMA-BUF/Vulkan rules, documentation); process.md (lifecycle trees, thread ownership, mediator avoidance); liveness.md (hot-reload recordability, frame pipeline progress, compositor event loop termination).
Resource & Sync Policy Layer Details
docs/policies/resource.md, docs/policies/sync.md
Resource.md defining swapchain recreation, DMA-BUF import, shader hot-reload, and listener lifecycle invariants. Sync.md documenting semaphore lifecycle, fence ordering, frame handoff, and SPSC queue contracts—directly mapping to implemented compositor sync logic.
Roadmap Shader Validation & Infrastructure
ROADMAP.md
Expanded Phase 1 shader validation with golden image generation, SSIM comparison, automated regression detection, CI visual test enablement. Updated long-term support status, instrumentation plans, GPU sync known issues, and Compositor Protocol Completeness capability tracking.
Compositor Timeline Type & State
src/compositor/compositor_state.hpp
Added wlr_drm_syncobj_timeline forward declaration and using alias. Extended CompositorState with present_signal_timeline pointer and present_signal_point counter to track compositor-owned output write timeline and signal progression.
Compositor Timeline Creation & Lifecycle
src/compositor/compositor_core.cpp
Added wlroots DRM syncobj header. Conditionally create present-signal timeline during initialization when renderer supports timeline features and drm_fd valid; log enabled/fallback state. Unrefs and nulls timeline on teardown.
Compositor Buffer Pass & Sync Export
src/compositor/compositor_present.cpp
Added DRM header support. Modified render_surface_to_frame to conditionally initiate wlroots buffer pass with wlr_buffer_pass_options (compositor timeline/point) when available; otherwise fallback. Replaced root surface acquire-fence export with "composite write" fence from compositor signal timeline at render pass point, with materialization checks and logging.
Filter-Chain Subproject Update
filter-chain (subproject reference)
Updated commit reference to enable shader validation and test infrastructure improvements.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • goggles-dev/Goggles#18: Both PRs introduce/refine ROADMAP.md, with the retrieved PR establishing initial roadmap structure while this PR expands shader validation phases and latency optimization specifics.

Suggested labels

Review effort 4/5, documentation, policy, compositor, sync

Poem

🐰 The rabbit hops through policies new,
Five layered rules to guide what's true,
Timeline signals cross the GPU,
Fences dance in bounded queues,
Liveness assured—the frame flies through!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title precisely summarizes the main change: adding composite-write fence export for cross-device frame ordering in the compositor, which is the core technical fix across the modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/p5-composite-write-fence

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment on lines +201 to +216
// Own a composite-OUTPUT signal timeline so every exported frame carries a producer-write
// fence (P5), independent of whether the client bound wp_linux_drm_syncobj_v1. Best-effort:
// absence degrades to the status-quo implicit path (see compositor_present.cpp).
if (renderer->features.timeline && drm_fd >= 0) {
present_signal_timeline = wlr_drm_syncobj_timeline_create(drm_fd);
present_timeline_supported = (present_signal_timeline != nullptr);
if (present_timeline_supported) {
GOGGLES_LOG_INFO("Compositor: composite-write signal timeline enabled");
} else {
GOGGLES_LOG_WARN(
"Compositor: signal timeline creation failed; implicit signal sync fallback");
}
} else {
GOGGLES_LOG_WARN("Compositor: renderer lacks features.timeline; composite-write fence "
"export unavailable (implicit signal sync fallback)");
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Stale timeline support flag 🐞 Bug ☼ Reliability

CompositorState::create_compositor() only updates present_timeline_supported when
(renderer->features.timeline && drm_fd >= 0) is true, while teardown() clears
present_signal_timeline without resetting the flag. If the compositor is stopped/started and the
second init can’t create a timeline, render_surface_to_frame() may treat timeline support as
enabled and pass/use a null present_signal_timeline, risking a crash.
Agent Prompt
### Issue description
`present_timeline_supported` can become stale across compositor restarts: `teardown()` nulls `present_signal_timeline` but does not reset `present_timeline_supported`, and `create_compositor()` only assigns `present_timeline_supported` inside one branch. This can leave `present_timeline_supported == true` while `present_signal_timeline == nullptr`, which is then used in `render_surface_to_frame()`.

### Issue Context
`CompositorServer` exposes public `start()`/`stop()` and `start()` re-runs the initialization sequence on the same `CompositorState` instance.

### Fix Focus Areas
- src/compositor/compositor_core.cpp[186-216]
- src/compositor/compositor_core.cpp[393-411]
- src/compositor/compositor_present.cpp[657-667]

### Implementation notes
- In `create_compositor()`, proactively reset the composite-write timeline state before feature checks:
  - `present_timeline_supported = false;`
  - `present_signal_timeline = nullptr;` (and/or unref any existing timeline defensively)
  - optionally reset `present_signal_point = 0` when creating a new timeline.
- In the `else` path (no support), explicitly set `present_timeline_supported = false`.
- In `teardown()`, also set `present_timeline_supported = false` (and optionally reset `present_signal_point`).
- In `render_surface_to_frame()`, harden the condition to avoid null use (e.g., require both `present_timeline_supported` and `present_signal_timeline != nullptr`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

…ordering

When a client did not bind wp_linux_drm_syncobj_v1 (basic GL/SW clients,
many XWayland), the compositor exported each frame's DMA-BUF to the goggles
Vulkan consumer with an empty sync_fd and published it anyway. The consumer's
submit then had no primitive ordering the foreign compositor device's
composite write before goggles sampled the buffer, so under GPU contention it
could read a torn/partial/stale frame (P5-1). When a fence was exported it
came from the root surface's acquire timeline -- the client's input point,
not the composite output write -- and the comment claiming otherwise was
inaccurate (P5-2).

Have the compositor own a composite-OUTPUT signal timeline, pass it into the
present buffer pass when renderer->features.timeline is supported, and export
the materialized point as the frame's sync_fd. The consumer path is unchanged:
the existing eSyncFd/eTemporary import now waits on the correct composite-write
dma_fence regardless of whether the client uses explicit sync. When the
renderer lacks timeline support, behavior degrades to the prior implicit-sync
path (strictly no worse than before).

Builds clean; full test suite passes 12/12. On AMD/radeonsi GLES2 the gate is
active and the fence is exported on every frame (0 not-materialized over 29
frames). Tear-free-under-contention validation (RenderDoc/pixel-diff) remains
for hardware testing.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
docs/policies/liveness.md (1)

9-23: ⚡ Quick win

Consider aligning terminology with main policy document.

This rule uses "shader pipeline" terminology (lines 12, 16, 20) while the corresponding rule in docs/project_policies.md (lines 363-375) uses "filter chain" and "adapters". For example:

  • liveness.md line 12: "active shader pipeline"
  • project_policies.md line 364: "active_slot always contains a valid, recordable filter chain"

The concepts are the same, but inconsistent naming across policy documents could cause confusion during code review or self-checks.

Suggested terminology alignment

Consider updating this file to use "filter chain" consistently with the main policy document, or alternatively update both documents to standardize on one term. "Filter chain" appears more specific to the codebase architecture (FilterChainController, filter chain slots, etc.).

Example:

-**Safety:** The active shader pipeline must always be recordable (or empty).
+**Safety:** The active filter chain must always be recordable (or empty).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/policies/liveness.md` around lines 9 - 23, Update LIVE.render.001 to use
the same terminology as the main policy: replace "shader pipeline" and related
phrases in the liveness.md text with "filter chain" (or whichever term you
choose to standardize) and ensure references align with the main-doc identifiers
like "active_slot" and component names such as FilterChainController/filter
chain slots; alternatively, change the main policy to match "shader pipeline"
but make the change consistently in both LIVE.render.001 and the main policy
(project_policies) so that symbols like "active_slot", "filter chain", and any
mentions of adapters/slots are identical across documents.
docs/policies/boundary.md (1)

65-77: ⚖️ Poor tradeoff

Enforcement gap acknowledged but not addressed.

The "Escapes" section explicitly states that semgrep only catches one specific pattern and "all other unchecked results escape the hard gate," yet the rule doesn't propose a solution. Consider either: (a) expanding the static analysis coverage with additional patterns, (b) requiring a wrapper type that forces explicit handling (e.g., [[nodiscard]] on a result wrapper), or (c) documenting this as a known limitation with manual review dependency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/policies/boundary.md` around lines 65 - 77, The policy BOUND.render.002
notes semgrep only detects the static_cast<void>(waitIdle()) pattern leaving
other vk::Result returns unchecked; address this by updating the policy to
require explicit handling for all vk::Result returns: either (1) expand static
analysis patterns to include other common call shapes (e.g., direct calls
returning vk::Result, assignments, chained calls), (2) mandate using a wrapper
or type annotated with [[nodiscard]] (or a Result<T> wrapper) so the compiler
enforces handling, or (3) add a clear “known limitation” section instructing
reviewers to manually verify every use of vk::Result and to prefer VK_TRY(call,
code, msg) for propagation; reference BOUND.render.002, vk::Result, VK_TRY and
the static_cast<void>(waitIdle()) example when you modify the doc to implement
one of these remedies.
docs/policies/sync.md (1)

51-63: ⚡ Quick win

Silent message drop may lose critical configuration.

The rule specifies "Silent drop on overflow (no deadlock risk)" for the control channel, but doesn't address how the sender detects dropped messages or how critical configuration changes (e.g., target FPS, cursor visibility) are handled if lost. Consider adding guidance on either: (a) message prioritization (critical config vs. ephemeral input), (b) sender-side feedback mechanism for dropped messages, or (c) making critical settings use a separate reliable channel. The current specification prevents deadlock but may cause user-visible configuration loss under load.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/policies/sync.md` around lines 51 - 63, The policy
SYNC.compositor-main.002 currently mandates a bounded, non-blocking control
channel with silent drop on overflow but omits guidance for detecting or
preserving critical configuration updates; update the policy text for
SYNC.compositor-main.002 to require either (a) prioritized messages (mark
messages like "target FPS" and "cursor visibility" as high-priority and ensure
they preempt or reserve capacity), (b) sender-side feedback (a lightweight
ACK/NACK or drop-counter so senders can detect dropped control messages and
retry), or (c) a separate reliable path for critical settings (a small reliable,
bounded channel or atomic with defined memory ordering for config updates), and
add a short note describing which messages must be considered “critical” and how
implementers should choose among prioritization, feedback, or separate reliable
channel to avoid silent loss.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/policies/boundary.md`:
- Around line 51-63: Update BOUND.render.001 to explicitly require that the
compositor validates plane count before exporting ExternalImageFrame and
returns/logs an error if the buffer is multi-plane: in the production path where
ExternalImageFrame is created (the compositor export boundary), check the
frame's plane count and reject/export-fail multi-plane DMA-BUFs, include the
validation failure in logs and in the error/Result returned to the consumer;
keep the ownership semantics (use UniqueFd::dup() for fd and sync_fd when
present) and document that consumers should still guard against receiving
multi-plane buffers but can rely on compositor validation per BOUND.render.001.

In `@docs/policies/process.md`:
- Around line 23-35: Update PROC.app.002 to explicitly state the ownership
transfer for compiled shader artifacts: clarify that JobSystem threads may
perform compilation and produce intermediate bytecode/IR, but the construction
of final Vulkan objects (e.g., VkShaderModule, VkPipeline) and any Vulkan
device/state access must occur on the main thread (the sole Vulkan submitter);
require the handoff to happen over the typed message channel mechanism
(JobSystem -> Main thread) and cross-reference RES.render.003 (hot-reload
lifecycle) and SYNC.compositor-main for the handoff/synchronization pattern so
readers know where to find the lifecycle and messaging details.

In `@docs/policies/resource.md`:
- Around line 37-51: Update RES.render.003 to remove the contradiction between
Safety and Free: either (A) move "Async mechanism (futures, continuations, job
system)" from Free into Locked and state that continuation-based async is
required (adjust Safety to assert "must use continuation-based async; disallow
ad-hoc active/pending/retired state machines"), or (B) soften Safety to warn
about pitfalls of ad-hoc state-machine/manual frame counting (describe specific
risks: blocking render loop, retire-timing bugs) while leaving Async mechanism
in Free; update the Safety paragraph and Free list accordingly and reference
RES.render.003, the "Safety" and "Free" sections in the doc to ensure wording is
consistent.

---

Nitpick comments:
In `@docs/policies/boundary.md`:
- Around line 65-77: The policy BOUND.render.002 notes semgrep only detects the
static_cast<void>(waitIdle()) pattern leaving other vk::Result returns
unchecked; address this by updating the policy to require explicit handling for
all vk::Result returns: either (1) expand static analysis patterns to include
other common call shapes (e.g., direct calls returning vk::Result, assignments,
chained calls), (2) mandate using a wrapper or type annotated with [[nodiscard]]
(or a Result<T> wrapper) so the compiler enforces handling, or (3) add a clear
“known limitation” section instructing reviewers to manually verify every use of
vk::Result and to prefer VK_TRY(call, code, msg) for propagation; reference
BOUND.render.002, vk::Result, VK_TRY and the static_cast<void>(waitIdle())
example when you modify the doc to implement one of these remedies.

In `@docs/policies/liveness.md`:
- Around line 9-23: Update LIVE.render.001 to use the same terminology as the
main policy: replace "shader pipeline" and related phrases in the liveness.md
text with "filter chain" (or whichever term you choose to standardize) and
ensure references align with the main-doc identifiers like "active_slot" and
component names such as FilterChainController/filter chain slots; alternatively,
change the main policy to match "shader pipeline" but make the change
consistently in both LIVE.render.001 and the main policy (project_policies) so
that symbols like "active_slot", "filter chain", and any mentions of
adapters/slots are identical across documents.

In `@docs/policies/sync.md`:
- Around line 51-63: The policy SYNC.compositor-main.002 currently mandates a
bounded, non-blocking control channel with silent drop on overflow but omits
guidance for detecting or preserving critical configuration updates; update the
policy text for SYNC.compositor-main.002 to require either (a) prioritized
messages (mark messages like "target FPS" and "cursor visibility" as
high-priority and ensure they preempt or reserve capacity), (b) sender-side
feedback (a lightweight ACK/NACK or drop-counter so senders can detect dropped
control messages and retry), or (c) a separate reliable path for critical
settings (a small reliable, bounded channel or atomic with defined memory
ordering for config updates), and add a short note describing which messages
must be considered “critical” and how implementers should choose among
prioritization, feedback, or separate reliable channel to avoid silent loss.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ba93e48c-6e6d-445d-b480-894c81651a18

📥 Commits

Reviewing files that changed from the base of the PR and between 9532e82 and 7ebcd5c.

📒 Files selected for processing (11)
  • ROADMAP.md
  • docs/policies/boundary.md
  • docs/policies/liveness.md
  • docs/policies/process.md
  • docs/policies/resource.md
  • docs/policies/sync.md
  • docs/project_policies.md
  • filter-chain
  • src/compositor/compositor_core.cpp
  • src/compositor/compositor_present.cpp
  • src/compositor/compositor_state.hpp

Comment thread docs/policies/boundary.md
Comment on lines +51 to +63
### BOUND.render.001 — DMA-BUF frame ownership transfer

**Guard:** Modifying `ExternalImageFrame`, its production in the compositor, or its consumption in the render pipeline.
**Safety:** DMA-BUF handle ownership transfers via `UniqueFd::dup()` — producer retains its copy, consumer owns the duplicate. `sync_fd` (if present) transfers with the frame via the same dup mechanism. Receiver must not assume `sync_fd` is always present. Single-plane DMA-BUF only.

<details>

**Violation:** Double-close on DMA-BUF fd (shared instead of duplicated). Vulkan import fails on multi-plane buffer.
**Escapes:** Ownership transfer is semantic — no tool verifies that `dup()` is used rather than raw fd copy.
**Locked:** Ownership model (dup, not transfer). Single-plane requirement.
**Free:** Adding fields to the frame type. Internal rendering that produces the buffer.

</details>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Clarify multi-plane DMA-BUF handling.

The rule states "Single-plane DMA-BUF only" as a locked constraint but doesn't specify whether the compositor should validate plane count before export or whether the consumer should reject multi-plane imports. Given that Vulkan import can fail on multi-plane buffers (per the Violation note), consider adding explicit validation at the production boundary with logging/error propagation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/policies/boundary.md` around lines 51 - 63, Update BOUND.render.001 to
explicitly require that the compositor validates plane count before exporting
ExternalImageFrame and returns/logs an error if the buffer is multi-plane: in
the production path where ExternalImageFrame is created (the compositor export
boundary), check the frame's plane count and reject/export-fail multi-plane
DMA-BUFs, include the validation failure in logs and in the error/Result
returned to the consumer; keep the ownership semantics (use UniqueFd::dup() for
fd and sync_fd when present) and document that consumers should still guard
against receiving multi-plane buffers but can rely on compositor validation per
BOUND.render.001.

Comment thread docs/policies/process.md
Comment on lines +23 to +35
### PROC.app.002 — Thread ownership is structural

**Guard:** Modifying which thread creates, accesses, or destroys resources. Adding new thread-affine operations. Introducing new cross-thread communication.
**Safety:** Each thread owns a set of types that cannot be accessed from other threads. Main thread owns: SDL window, Vulkan submission, ImGui, render loop. Compositor thread owns: wlroots event loop, all `wlr_*` objects, Wayland protocol handling. JobSystem threads own: async shader compilation only (no Vulkan submission, no wlroots access). Cross-thread communication uses typed message channels, not shared mutable state protected by convention.

<details>

**Violation:** Data race on wlroots objects from main thread. Vulkan submission from compositor thread. Callback registered on one thread executing on another without synchronization.
**Escapes:** Thread affinity is a runtime property. No static tool can verify that a function is only called from a specific thread. The goal is to make thread boundaries visible in the architecture (separate types per thread, messages at boundaries), not just documented in comments.
**Locked:** Main thread as sole Vulkan submitter. Compositor thread as sole wlroots accessor. Communication through typed channels only.
**Free:** Which specific types each thread owns. Internal threading within a subsystem (if fully encapsulated).

</details>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Clarify ownership transfer for compiled shader artifacts.

The rule states that JobSystem threads own "async shader compilation only" but doesn't specify the ownership transfer mechanism for compiled Vulkan pipeline objects back to the main thread (which owns Vulkan submission per this rule). Cross-reference with RES.render.003, which describes hot-reload lifecycle but doesn't explicitly address cross-thread ownership transfer. Consider adding a note about which thread constructs the final Vulkan objects or referencing SYNC.compositor-main rules for the handoff mechanism.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/policies/process.md` around lines 23 - 35, Update PROC.app.002 to
explicitly state the ownership transfer for compiled shader artifacts: clarify
that JobSystem threads may perform compilation and produce intermediate
bytecode/IR, but the construction of final Vulkan objects (e.g., VkShaderModule,
VkPipeline) and any Vulkan device/state access must occur on the main thread
(the sole Vulkan submitter); require the handoff to happen over the typed
message channel mechanism (JobSystem -> Main thread) and cross-reference
RES.render.003 (hot-reload lifecycle) and SYNC.compositor-main for the
handoff/synchronization pattern so readers know where to find the lifecycle and
messaging details.

Comment thread docs/policies/resource.md
@K1ngst0m K1ngst0m force-pushed the fix/p5-composite-write-fence branch from 7ebcd5c to 44a526c Compare June 8, 2026 14:49
The committed lock referenced libxkbcommon 1.13.1, which no longer matches
the manifest resolution, so `pixi install --locked` (run by CI's Setup Pixi
on a clean checkout) reports "lock-file not up-to-date with the workspace"
and every CI job fails before building. This pre-exists on main and is
unrelated to the code change; included here so this PR's CI can pass.
Regenerated with the CI-pinned pixi v0.65.0; only a transitive patch bump.
@K1ngst0m K1ngst0m merged commit 487ff5b into main Jun 8, 2026
5 checks passed
@K1ngst0m K1ngst0m deleted the fix/p5-composite-write-fence branch June 8, 2026 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant