Skip to content

ovphysx 0.4.3 + ovrtx coexistence: clone migration, init-order hoist, pose forwarding#1

Open
EricArnoldNVIDIA wants to merge 4 commits intopbarejko/ovphysx-ovrtxfrom
erarnold/ovphysx-ovrtx-coexist
Open

ovphysx 0.4.3 + ovrtx coexistence: clone migration, init-order hoist, pose forwarding#1
EricArnoldNVIDIA wants to merge 4 commits intopbarejko/ovphysx-ovrtxfrom
erarnold/ovphysx-ovrtx-coexist

Conversation

@EricArnoldNVIDIA
Copy link
Copy Markdown
Collaborator

Summary

Gets presets=ovphysx,ovrtx_renderer,rgb running end-to-end on
Isaac-Cartpole-Camera-Presets-Direct-v0 against ovphysx 0.4.3 +
ovrtx 0.3.0.307110. Three independent fixes plus a companion design
doc:

  • a7ecbd0de18RenderContext.cleanup() runs before
    physics_manager.close() so OVRTX releases its native objects before
    ovphysx tears down their shared Carbonite (the inverted order was
    crashing the second teardown).
  • f018386acda — Two changes that travel together:
    • physx.clone() was removed in ovphysx 0.4.3.
      InteractiveScene now routes ovphysx through USD-side cloning
      (clone_usd=True, clone_physics=False); OvPhysxManager._warmup_and_load
      raises a clear error if a pre-0.4.3 wheel and the legacy code path coincide.
    • Add BaseRenderer.early_init(). OVRTXRenderer overrides it to
      construct the C++ Renderer up front so OVRTX claims Carbonite before
      OvPhysxManager._warmup_and_load. RenderContext.early_init_all()
      walks registered backends; InteractiveScene.early_init_renderers()
      pre-walks Camera sensors so early_init has something to act on;
      DirectRLEnv._init_sim calls it between _setup_scene (where the
      cartpole task adds its tiled camera) and sim.reset(). Without this
      hoist, createRTXRenderer SIGSEGVs because ovrtx does no
      coexistence checks at construction time.
  • 9039f0adb1a — Object pose forwarding for ovphysx.
    OVRTXRenderer._setup_object_bindings falls through to an OvPhysx
    path when the Newton scene-data provider returns no model: walks the
    USD for PhysicsRigidBodyAPI prims under /World/envs/env_*,
    creates a single RIGID_BODY_POSE tensor binding, binds the same
    flat list to ovrtx omni:xform. update_transforms reads the pose
    tensor on GPU into a pre-allocated [N, 7] warp array, runs
    sync_ovphysx_pose_to_mat44d_kernel to construct ovrtx-format
    mat44d rows, then wp.copys into the ovrtx attribute mapping.
    Zero host roundtrip per frame.
  • 560ad29e2a1source/isaaclab_ov/docs/ovphysx_coexist_DESIGN.md,
    per-item rationale + caveats + future work.

This stacks on top of pbarejko/ovphysx-ovrtx and assumes ovphysx 0.4.3
(specifically dev/erarnold/ovstage-attach commit c728f7e740, which
adds cross-device staging in ovphysx_read_tensor_binding — that's
what kills the original body_com_pose device-mismatch with no
IsaacLab-side workaround needed). With ovphysx 0.4.2 the legacy
physx.clone() path is rejected by the new defensive check; the
defense is intentional.

Tracking: OMPE-88037

Test plan

  • ./isaaclab.sh -p scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-Camera-Presets-Direct-v0 --num_envs=32 --max_iterations=2 --headless --enable_cameras presets=ovphysx,ovrtx_renderer,rgb against ovphysx 0.4.3+dev.erarnold.ovstage.attach.c728f7e740 + ovrtx 0.3.0.307110. Result: 2 epochs, ~670 fps step, 16.59s end-to-end, exit 0. Renderer logs OvPhysx: created RIGID_BODY_POSE binding for 96 ovrtx prims (shape=(96, 7)) and no Failed to update object transforms warnings across the run.
  • Visual confirmation that the cart and pole actually animate in the rendered frames (covered by absence of update warnings + non-trivial RGB output, but not a frame-by-frame visual check yet — --video capture is the easy way).
  • Same repro on a non-ovphysx backend (presets=physx,... and presets=newton_mjwarp,...) to confirm no regression. The cleanup-order change and the renderer-hoist machinery default to no-op for backends that don't override BaseRenderer.early_init(); the ovphysx-only branches in InteractiveScene and OVRTXRenderer are gated on physics_backend.startswith("ovphysx") and the Newton-model-None check respectively.
  • Manager-based env types (the renderer-hoist early_init_renderers() call lives in DirectRLEnv._init_sim only — manager envs that create cameras after InteractiveScene.__init__ will need the same hook). Captured as a caveat in the DESIGN doc.

Notes for review

  • The OvPhysx pose binding currently uses RIGID_BODY_POSE for all
    bodies (including articulation links). Worked for cartpole's layout
    (32 envs × 3 bodies = 96 prims). If a future task hits a layout
    where ovphysx rejects an articulation link in a RIGID_BODY_POSE
    binding, the fallback should switch to per-articulation
    ARTICULATION_LINK_POSE bindings. Not implemented because not
    needed yet; called out in the DESIGN doc.
  • The run prints repeated Warning: Possible version incompatibility. Attempting to load omni::fabric::IStageReaderWriter with version v0.16 against v0.15. Non-fatal but worth flagging upstream — the
    kit/sdk integration sample's CMakeLists.txt calls out the same
    IStageReaderWriter v0.16+ requirement.

OVRTX shares a Carbonite framework with ovphysx; per the kit/sdk and
ovstage working samples, the renderer-side native objects must release
before the physics-side teardown runs, otherwise the second teardown
crashes on already-freed plugin state. SimulationContext.clear_instance
released physics first and let cameras GC last, which inverted the
required order.

Add RenderContext.cleanup() that walks registered backends and invokes
BaseRenderer.cleanup(None) on each, then call it from clear_instance
before physics_manager.close(). Per-camera Camera.__del__ cleanup stays
in place as a safety net but becomes a no-op once the central cleanup
has run. No behavior change for non-ovrtx backends.

Companion design notes for the deferred items (renderer hoist,
_setup_object_bindings on ovphysx, physx.clone migration) at
source/isaaclab_ov/docs/ovphysx_coexist_DESIGN.md.

Tracking: OMPE-88037
…nit hoist

Two stacked fixes that, together with the earlier render-then-physics
teardown change, get presets=ovphysx,ovrtx_renderer,rgb running
end-to-end against ovphysx 0.4.3 + ovrtx 0.3.0.307110.

Fix 1 — physx.clone() removed in ovphysx 0.4.3.
InteractiveScene routed the ovphysx backend through clone_usd=False so
that physx.clone() in OvPhysxManager._warmup_and_load could populate
env_1..N at the physics layer. The 0.4.3 wheel removed that public API
in favor of ovstage_clone_subtree on an attached ovstage.Stage. Until
the ovstage bridge is wired in (see source/isaaclab_ov/docs/
ovphysx_coexist_DESIGN.md), switch ovphysx onto USD-side cloning:
clone_usd=True and clone_physics=False. _warmup_and_load now raises a
clear error if a pre-0.4.3 wheel and the legacy code path coincide,
rather than letting AttributeError bubble through Camera init.

Fix 2 — OVRTX must construct its native Renderer before ovphysx.
ovrtx and ovphysx share a Carbonite framework; ovrtx does no
coexistence checks at construction time and SIGSEGVs inside
createRTXRenderer if ovphysx (or any other Carbonite owner) has
already loaded. IsaacLab today constructs OVRTX lazily in
Camera._initialize_impl (which fires on PHYSICS_READY, after
OvPhysxManager._warmup_and_load), inverting the required order.

Add BaseRenderer.early_init() (default no-op) and have OVRTXRenderer
override it to construct the C++ Renderer up front; OVRTXRenderer.
initialize() now skips renderer construction if early_init already
ran. RenderContext.early_init_all() walks registered backends, and
InteractiveScene.early_init_renderers() pre-walks Camera sensors to
register their renderer cfg with the context so early_init has
something to act on. DirectRLEnv calls scene.early_init_renderers()
between _setup_scene (which adds tiled cameras) and sim.reset() so
the OVRTX Renderer is constructed before OvPhysxManager constructs
its native instance.

Smoke run on the OMPE-88037 cartpole repro (32 envs, 2 epochs, 100x100
RGB tiled camera): 16.34s end-to-end, training ran at ~700 fps, exit 0.
Object transforms in the rendered frame are still static — Item B in the
DESIGN doc (OVRTXRenderer.update_transforms wiring to ovphysx pose
bindings) is the next piece.

Tracking: OMPE-88037
OVRTXRenderer's object-binding path was Newton-only: it asked
SceneDataProvider.get_newton_model() and bailed when None (which is
always the case under an OvPhysx physics manager). Result: with
presets=ovphysx,ovrtx_renderer the rendered frames had moving cameras
but static objects.

Add an OvPhysx fallback that:

1. Walks the USD stage for prims with PhysicsRigidBodyAPI under
   /World/envs/env_*, filters out the camera and any ground plane.
2. Creates an ovphysx RIGID_BODY_POSE tensor binding over those paths
   (CUDA-resident, [N, 7] = px,py,pz,qx,qy,qz,qw).
3. Binds the same flat list of prim paths to ovrtx omni:xform with
   PrimMode.EXISTING_ONLY and writes omni:resetXformStack=True so the
   matrix is interpreted as the world transform regardless of parent
   xforms — same pattern as the camera binding in
   OVRTXRenderer.initialize.

Per-frame update_transforms now reads the ovphysx pose tensor on GPU,
runs sync_ovphysx_pose_to_mat44d_kernel to construct ovrtx-format
mat44d rows (column-major rotation, translation in row 3, ``1.0`` at
``[3,3]`` — matches create_camera_transforms_kernel), then wp.copy's
into the ovrtx attribute mapping. Zero host roundtrip, single ovrtx
write per frame.

Smoke run on the OMPE-88037 cartpole repro (32 envs, 2 epochs, 100x100
RGB tiled camera, ovphysx 0.4.3 + ovrtx 0.3.0.307110): 16.59s, training
ran at ~670 fps, exit 0; renderer reports
"OvPhysx: created RIGID_BODY_POSE binding for 96 ovrtx prims" for
32 envs × 3 articulation links each.

Recipe credit: kit/sdk team (notes saved to /tmp/ovstage-answer.md).
The Newton path is unchanged; the fallback only fires when
get_newton_model() returns None.

Tracking: OMPE-88037
Rewrite source/isaaclab_ov/docs/ovphysx_coexist_DESIGN.md so that
PR reviewers can read it without external references:

- Drop ~/tmp/OMPE-88037-ovphysx-ovrtx-isaaclab.md and
  /tmp/ovstage-answer.md links (those are scratch files on the patch
  author's machine, not visible to reviewers).
- Inline the answers to the three open questions to the kit/sdk and
  ovstage teams that informed the design (init order, ovrtx pin,
  pose-forwarding recipe).
- Tag each item with the commit that landed it
  (a7ecbd0, f018386, 9039f0a).
- Note known caveats / future work per item, and the cross-cutting
  bits (Fabric IStageReaderWriter v0.16/v0.15 ABI warnings, the
  surviving os._exit(0) HACK in OvPhysxManager).

Tracking: OMPE-88037
# Carbonite before ovphysx constructs its native instance (sim.reset →
# OvPhysxManager._warmup_and_load), otherwise createRTXRenderer SIGSEGVs.
# No-op for backends without an early_init override.
self.scene.early_init_renderers()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably also need to add this to direct_marl_env.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants