Skip to content

Streamline the qemu product flow and export sparse review bundles#3

Merged
OneNoted merged 10 commits intomainfrom
qemu-product-review-bundles
Apr 15, 2026
Merged

Streamline the qemu product flow and export sparse review bundles#3
OneNoted merged 10 commits intomainfrom
qemu-product-review-bundles

Conversation

@OneNoted
Copy link
Copy Markdown
Owner

Summary

  • stop leaking qemu runtime and prep-image storage while making the default qemu product flow easier to drive
  • add storage-efficient qemu review recording with review_recording metadata and POST /api/sessions/:id/review/export
  • update the control plane, web and desktop operator surfaces, SDKs, docs, and proof scripts for the sparse review-bundle workflow

Testing

  • bun run build
  • bun run lint
  • bun run test
  • python -m py_compile python/sdk/client.py scripts/run-qemu-live-view-demo.py scripts/run-taskers-qemu-dogfood.py
  • cargo test -p guest-runtime --lib --test qemu_bridge

Session deletion kept qemu artifact directories on disk, qemu image prep could leak docker-backed storage, and guest-runtime lifecycle logic had drifted between the library and binary entrypoint. This change centralizes the runtime implementation in the library, reclaims session-owned runtime directories on teardown, adds safe marker-based startup janitoring plus an explicit reclaim endpoint, and makes qemu prep keep its temporary storage inside the tracked workdir so it can be cleaned deterministically.

Constraint: Preserve qemu as the product path while keeping xvfb and display sessions honest
Constraint: Avoid new dependencies while making cleanup safe enough for app-local storage
Rejected: Docker volume prune as the primary fix | too broad and unsafe for unrelated local projects
Rejected: Storage hotfix without lifecycle consolidation | likely to regress because lib.rs and main.rs had drifted
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep runtime/cache/exports ownership explicit and do not reintroduce session-owned files outside runtime storage
Tested: cargo test -p guest-runtime --quiet; bun test apps/control-plane/src/server.desktop-app.test.ts
Not-tested: Full end-to-end qemu product boot against a real docker/qemu environment
The happy path still existed in the backend, but the default surface started from provider/debug knobs instead of the session -> task -> live view -> cleanup loop an agent should infer first. This change promotes the qemu-first start-and-watch workflow, adds explicit delete and stale-storage reclaim actions, wires optional desktop activation for installed-app oversight, updates SDK/docs around the canonical contract, and keeps advanced/debug controls available without making them primary.

Constraint: Preserve honest xvfb fallback semantics while keeping qemu product as the canonical live path
Constraint: Keep the packaged desktop resources in sync with the source web UI and control-plane server
Rejected: Full UI redesign before storage and contract ergonomics were in place | too broad for the approved plan
Rejected: Desktop activation as a required step in the agent contract | should remain optional additive ergonomics
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep the default path start-and-watch first; new debug controls belong under advanced surfaces, not the main flow
Tested: bun run typecheck; bun test apps/control-plane/src/server.desktop-app.test.ts apps/web-ui/public/provider-default.test.js crates/desktop-app/resources/control-plane/ui/provider-default.test.js packages/ts-sdk/src/index.test.ts; bun run --filter @acu/control-plane build; bun run sync:desktop-resources; cargo test -p guest-runtime --quiet; bun run test; bun run lint
Not-tested: Manual desktop activation against a real installed desktop bundle outside the test harness
The first reclaim pass handled current runtime storage and the new cache layout, but it missed prep workdirs stranded under the old runtime-scoped `_qemu_images/_build` path. This follow-up teaches reclaim to scan both layouts, reports the legacy build root explicitly, and documents the remaining limitation around detached anonymous Docker volumes that inspectors cannot attribute safely after their containers are gone.

Constraint: Reclaim must stay inspectors-scoped and avoid deleting unrelated Docker resources
Rejected: Global docker volume prune from the reclaim flow | too destructive for unrelated local workloads
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep reclaim logic compatible with both current and legacy inspectors storage layouts until old runtime caches are no longer in the wild
Tested: cargo test -p guest-runtime --quiet; bun run test; bun run lint
Not-tested: Real-world detached anonymous Docker volume recovery beyond inspectors-attributable state
Taskers is the best local app for exercising inspectors qemu product mode because it is a real user-facing GUI app with machine-verifiable companion tooling. The existing proof already launched Taskers and created a workspace, but it was pinned to an older bundle path and did not prove visible desktop changes strongly enough. This update auto-selects the newest local bundle when not overridden, records baseline/launch/before/after screenshots with hashes, verifies visible desktop changes, exercises desktop mouse input, and documents Taskers as the recommended qemu dogfood target.

Constraint: Keep the proof local and reuse the existing Taskers bundle under ../taskers without adding external dependencies
Constraint: Preserve machine-verifiable proof via taskersctl while strengthening visible desktop evidence
Rejected: Switching the proof target to a generic GNOME utility | weaker real-world coverage and no companion machine proof
Rejected: Keeping the old fixed Taskers bundle path | too brittle as local bundles evolve
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep the qemu dogfood target focused on apps that are both visibly interactive and machine-verifiable; Taskers remains the default until a better local target exists
Tested: python3 -m py_compile scripts/run-taskers-qemu-dogfood.py python/sdk/client.py; ACU_BASE_URL=http://127.0.0.1:3100 python3 scripts/run-taskers-qemu-dogfood.py; ACU_BASE_URL=http://127.0.0.1:3100 python3 scripts/run-qemu-live-view-demo.py; bun run test; bun run lint
Not-tested: Visual inspection of the saved screenshots by a human outside the captured artifact hashes and machine proof
Add review_recording session metadata, a sparse guest-runtime review ledger,
control-plane export plumbing, and operator UI status so humans can retain
useful qemu evidence without default video capture.

Constraint: Review capture must stay truthful to the qemu product action plane
Constraint: Runtime storage remains ephemeral unless explicitly exported
Rejected: Continuous video capture | too storage-heavy for the requested goal
Rejected: Raw noVNC protocol replay | higher complexity with worse review ergonomics
Confidence: medium
Scope-risk: moderate
Reversibility: clean
Directive: Keep guest-runtime as the sole review timeline writer; control-plane should append via requests only
Tested: cargo test -p desktop-core -p guest-runtime --lib
Tested: cargo test -p guest-runtime --test qemu_bridge
Tested: cd apps/control-plane && bun run build && node --test dist/**/*.test.js
Tested: cd apps/web-ui && node --test public/**/*.test.js
Not-tested: End-to-end export against a live qemu product guest
Document the new review_recording session summary and review export route,
and update the qemu proof scripts so they keep exported review bundles as
part of later-review evidence.

Constraint: Proof output should confirm exported bundles survive session teardown
Rejected: Document default video capture | contradicts the storage-first design
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep v1 docs explicit that review export is sparse timeline evidence, not video replay
Tested: python -m py_compile python/sdk/client.py scripts/run-qemu-live-view-demo.py scripts/run-taskers-qemu-dogfood.py
Not-tested: Live execution of the qemu proof scripts against a real guest
Refresh the packaged control-plane bundle after the new qemu review export
and operator-surface changes so the desktop app ships the same behavior as
the source server/UI.

Constraint: Packaged desktop resources are copied from built control-plane assets
Rejected: Hand-editing bundled output | too easy to drift from the source build
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Regenerate desktop resources via the sync script after control-plane or web-ui changes
Tested: bun run sync:desktop-resources
Tested: node --test crates/desktop-app/resources/control-plane/ui/live-view.test.js
Tested: node --check crates/desktop-app/resources/control-plane/ui/app.js
Tested: node --check crates/desktop-app/resources/control-plane/ui/live-view.js
Not-tested: End-to-end packaged desktop app launch after the bundle refresh
Rewrite exported timeline screenshot refs into the durable export bundle,
serialize per-session review writes behind a session lock, and tighten the
docs/proof scripts to reflect the storage-first export flow.

Constraint: Exported bundles must remain reviewable after runtime teardown
Constraint: Canonical review sequence numbers must stay stable under concurrent writes
Rejected: Leaving runtime screenshot paths in exported timelines | exported bundles broke after session deletion
Rejected: Relying on manifest counters without write serialization | could corrupt ordering under concurrent events
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Any future review event or export path must go through the session review lock before touching the manifest or timeline
Tested: cargo test -p guest-runtime --lib --test qemu_bridge
Tested: bun run build
Tested: bun run lint
Tested: bun run test
Tested: python -m py_compile python/sdk/client.py scripts/run-qemu-live-view-demo.py scripts/run-taskers-qemu-dogfood.py
Not-tested: Live qemu product export against a real VM after the concurrency fix
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 69a253297b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/guest-runtime/src/lib.rs Outdated
Only failed review events or explicit error statuses should trigger
temporary postmortem retention. Successful bridge_state_changed events
must keep the bundle ephemeral so normal qemu teardowns still reclaim
runtime storage immediately.

Constraint: Successful qemu bridge transitions still emit review events but must not pin runtime storage
Rejected: Keep bridge_state_changed as a postmortem trigger | leaks normal session runtime dirs until TTL expiry
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Treat postmortem retention as an error-path signal; do not widen it to healthy lifecycle milestones without a bounded retention design review
Tested: cargo test -p guest-runtime --lib
Tested: cargo test -p guest-runtime --test qemu_bridge
Not-tested: Full workspace test/lint rerun after this narrow Rust-only follow-up
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: df14b55834

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/guest-runtime/src/lib.rs Outdated
Comment thread scripts/qemu_guest_assets.py
Comment thread crates/guest-runtime/src/lib.rs Outdated
Restrict startup cleanup to exited image-prep containers, persist liveness
markers for prepare-build workdirs, and only report reclaim success after a
remove actually succeeds so operators are not misled.

Constraint: Running ensure-image jobs may overlap with guest-runtime startup and reclaim requests
Rejected: Cleaning every acu-image-prep-* container on startup | kills active image builds
Rejected: Reporting every reclaim candidate as removed | can claim success on failed deletes
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future prep-work cleanup must prove both liveness safety and truthful operator reporting
Tested: cargo test -p guest-runtime --test qemu_bridge
Tested: cargo test -p guest-runtime --lib
Tested: python -m py_compile scripts/qemu_guest_assets.py
Not-tested: Full workspace build/lint/test rerun after this Rust/Python-only follow-up
@OneNoted OneNoted merged commit d4aaa1f into main Apr 15, 2026
6 checks passed
@OneNoted OneNoted deleted the qemu-product-review-bundles branch April 15, 2026 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant