Streamline the qemu product flow and export sparse review bundles#3
Streamline the qemu product flow and export sparse review bundles#3
Conversation
Session deletion kept qemu artifact directories on disk, qemu image prep could leak docker-backed storage, and guest-runtime lifecycle logic had drifted between the library and binary entrypoint. This change centralizes the runtime implementation in the library, reclaims session-owned runtime directories on teardown, adds safe marker-based startup janitoring plus an explicit reclaim endpoint, and makes qemu prep keep its temporary storage inside the tracked workdir so it can be cleaned deterministically. Constraint: Preserve qemu as the product path while keeping xvfb and display sessions honest Constraint: Avoid new dependencies while making cleanup safe enough for app-local storage Rejected: Docker volume prune as the primary fix | too broad and unsafe for unrelated local projects Rejected: Storage hotfix without lifecycle consolidation | likely to regress because lib.rs and main.rs had drifted Confidence: high Scope-risk: moderate Reversibility: clean Directive: Keep runtime/cache/exports ownership explicit and do not reintroduce session-owned files outside runtime storage Tested: cargo test -p guest-runtime --quiet; bun test apps/control-plane/src/server.desktop-app.test.ts Not-tested: Full end-to-end qemu product boot against a real docker/qemu environment
The happy path still existed in the backend, but the default surface started from provider/debug knobs instead of the session -> task -> live view -> cleanup loop an agent should infer first. This change promotes the qemu-first start-and-watch workflow, adds explicit delete and stale-storage reclaim actions, wires optional desktop activation for installed-app oversight, updates SDK/docs around the canonical contract, and keeps advanced/debug controls available without making them primary. Constraint: Preserve honest xvfb fallback semantics while keeping qemu product as the canonical live path Constraint: Keep the packaged desktop resources in sync with the source web UI and control-plane server Rejected: Full UI redesign before storage and contract ergonomics were in place | too broad for the approved plan Rejected: Desktop activation as a required step in the agent contract | should remain optional additive ergonomics Confidence: high Scope-risk: moderate Reversibility: clean Directive: Keep the default path start-and-watch first; new debug controls belong under advanced surfaces, not the main flow Tested: bun run typecheck; bun test apps/control-plane/src/server.desktop-app.test.ts apps/web-ui/public/provider-default.test.js crates/desktop-app/resources/control-plane/ui/provider-default.test.js packages/ts-sdk/src/index.test.ts; bun run --filter @acu/control-plane build; bun run sync:desktop-resources; cargo test -p guest-runtime --quiet; bun run test; bun run lint Not-tested: Manual desktop activation against a real installed desktop bundle outside the test harness
The first reclaim pass handled current runtime storage and the new cache layout, but it missed prep workdirs stranded under the old runtime-scoped `_qemu_images/_build` path. This follow-up teaches reclaim to scan both layouts, reports the legacy build root explicitly, and documents the remaining limitation around detached anonymous Docker volumes that inspectors cannot attribute safely after their containers are gone. Constraint: Reclaim must stay inspectors-scoped and avoid deleting unrelated Docker resources Rejected: Global docker volume prune from the reclaim flow | too destructive for unrelated local workloads Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep reclaim logic compatible with both current and legacy inspectors storage layouts until old runtime caches are no longer in the wild Tested: cargo test -p guest-runtime --quiet; bun run test; bun run lint Not-tested: Real-world detached anonymous Docker volume recovery beyond inspectors-attributable state
Taskers is the best local app for exercising inspectors qemu product mode because it is a real user-facing GUI app with machine-verifiable companion tooling. The existing proof already launched Taskers and created a workspace, but it was pinned to an older bundle path and did not prove visible desktop changes strongly enough. This update auto-selects the newest local bundle when not overridden, records baseline/launch/before/after screenshots with hashes, verifies visible desktop changes, exercises desktop mouse input, and documents Taskers as the recommended qemu dogfood target. Constraint: Keep the proof local and reuse the existing Taskers bundle under ../taskers without adding external dependencies Constraint: Preserve machine-verifiable proof via taskersctl while strengthening visible desktop evidence Rejected: Switching the proof target to a generic GNOME utility | weaker real-world coverage and no companion machine proof Rejected: Keeping the old fixed Taskers bundle path | too brittle as local bundles evolve Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep the qemu dogfood target focused on apps that are both visibly interactive and machine-verifiable; Taskers remains the default until a better local target exists Tested: python3 -m py_compile scripts/run-taskers-qemu-dogfood.py python/sdk/client.py; ACU_BASE_URL=http://127.0.0.1:3100 python3 scripts/run-taskers-qemu-dogfood.py; ACU_BASE_URL=http://127.0.0.1:3100 python3 scripts/run-qemu-live-view-demo.py; bun run test; bun run lint Not-tested: Visual inspection of the saved screenshots by a human outside the captured artifact hashes and machine proof
Add review_recording session metadata, a sparse guest-runtime review ledger, control-plane export plumbing, and operator UI status so humans can retain useful qemu evidence without default video capture. Constraint: Review capture must stay truthful to the qemu product action plane Constraint: Runtime storage remains ephemeral unless explicitly exported Rejected: Continuous video capture | too storage-heavy for the requested goal Rejected: Raw noVNC protocol replay | higher complexity with worse review ergonomics Confidence: medium Scope-risk: moderate Reversibility: clean Directive: Keep guest-runtime as the sole review timeline writer; control-plane should append via requests only Tested: cargo test -p desktop-core -p guest-runtime --lib Tested: cargo test -p guest-runtime --test qemu_bridge Tested: cd apps/control-plane && bun run build && node --test dist/**/*.test.js Tested: cd apps/web-ui && node --test public/**/*.test.js Not-tested: End-to-end export against a live qemu product guest
Document the new review_recording session summary and review export route, and update the qemu proof scripts so they keep exported review bundles as part of later-review evidence. Constraint: Proof output should confirm exported bundles survive session teardown Rejected: Document default video capture | contradicts the storage-first design Confidence: high Scope-risk: narrow Reversibility: clean Directive: Keep v1 docs explicit that review export is sparse timeline evidence, not video replay Tested: python -m py_compile python/sdk/client.py scripts/run-qemu-live-view-demo.py scripts/run-taskers-qemu-dogfood.py Not-tested: Live execution of the qemu proof scripts against a real guest
Refresh the packaged control-plane bundle after the new qemu review export and operator-surface changes so the desktop app ships the same behavior as the source server/UI. Constraint: Packaged desktop resources are copied from built control-plane assets Rejected: Hand-editing bundled output | too easy to drift from the source build Confidence: high Scope-risk: narrow Reversibility: clean Directive: Regenerate desktop resources via the sync script after control-plane or web-ui changes Tested: bun run sync:desktop-resources Tested: node --test crates/desktop-app/resources/control-plane/ui/live-view.test.js Tested: node --check crates/desktop-app/resources/control-plane/ui/app.js Tested: node --check crates/desktop-app/resources/control-plane/ui/live-view.js Not-tested: End-to-end packaged desktop app launch after the bundle refresh
Rewrite exported timeline screenshot refs into the durable export bundle, serialize per-session review writes behind a session lock, and tighten the docs/proof scripts to reflect the storage-first export flow. Constraint: Exported bundles must remain reviewable after runtime teardown Constraint: Canonical review sequence numbers must stay stable under concurrent writes Rejected: Leaving runtime screenshot paths in exported timelines | exported bundles broke after session deletion Rejected: Relying on manifest counters without write serialization | could corrupt ordering under concurrent events Confidence: high Scope-risk: moderate Reversibility: clean Directive: Any future review event or export path must go through the session review lock before touching the manifest or timeline Tested: cargo test -p guest-runtime --lib --test qemu_bridge Tested: bun run build Tested: bun run lint Tested: bun run test Tested: python -m py_compile python/sdk/client.py scripts/run-qemu-live-view-demo.py scripts/run-taskers-qemu-dogfood.py Not-tested: Live qemu product export against a real VM after the concurrency fix
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 69a253297b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Only failed review events or explicit error statuses should trigger temporary postmortem retention. Successful bridge_state_changed events must keep the bundle ephemeral so normal qemu teardowns still reclaim runtime storage immediately. Constraint: Successful qemu bridge transitions still emit review events but must not pin runtime storage Rejected: Keep bridge_state_changed as a postmortem trigger | leaks normal session runtime dirs until TTL expiry Confidence: high Scope-risk: narrow Reversibility: clean Directive: Treat postmortem retention as an error-path signal; do not widen it to healthy lifecycle milestones without a bounded retention design review Tested: cargo test -p guest-runtime --lib Tested: cargo test -p guest-runtime --test qemu_bridge Not-tested: Full workspace test/lint rerun after this narrow Rust-only follow-up
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: df14b55834
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Restrict startup cleanup to exited image-prep containers, persist liveness markers for prepare-build workdirs, and only report reclaim success after a remove actually succeeds so operators are not misled. Constraint: Running ensure-image jobs may overlap with guest-runtime startup and reclaim requests Rejected: Cleaning every acu-image-prep-* container on startup | kills active image builds Rejected: Reporting every reclaim candidate as removed | can claim success on failed deletes Confidence: high Scope-risk: narrow Reversibility: clean Directive: Any future prep-work cleanup must prove both liveness safety and truthful operator reporting Tested: cargo test -p guest-runtime --test qemu_bridge Tested: cargo test -p guest-runtime --lib Tested: python -m py_compile scripts/qemu_guest_assets.py Not-tested: Full workspace build/lint/test rerun after this Rust/Python-only follow-up
Summary
review_recordingmetadata andPOST /api/sessions/:id/review/exportTesting
bun run buildbun run lintbun run testpython -m py_compile python/sdk/client.py scripts/run-qemu-live-view-demo.py scripts/run-taskers-qemu-dogfood.pycargo test -p guest-runtime --lib --test qemu_bridge