Add teleop replay agent for non-interactive CI runs by rwiltz · Pull Request #5507 · isaac-sim/IsaacLab

rwiltz · 2026-05-05T22:22:57Z

Description

Adds a permanent, decoupled CI entry-point for replaying captured teleop
sessions against an Isaac Lab environment. Replaces the runtime patch the
teleop-cicd pipeline currently applies to
scripts/environments/teleoperation/teleop_se3_agent.py so the
user-journey script is no longer mutated at CI time.

Fixes # (issue)

Type of change

New feature (non-breaking change which adds functionality)

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have updated the changelog and the corresponding version in the extension's config/extension.toml file
I have added my name to the CONTRIBUTORS.md or my name already exists there

isaaclab-review-bot

🤖 Isaac Lab Review Bot

Summary

This PR adds a dedicated CI script (teleop_replay_agent.py) for replaying captured teleoperation sessions, along with an internal XCR replay driver module. The goal is to replace runtime patches that CI previously applied to the interactive teleop_se3_agent.py. The implementation is structurally sound but has several issues around error handling, missing module initialization, and potential race conditions.

Architecture Impact

New public script: scripts/environments/teleoperation/teleop_replay_agent.py becomes a CI entry point
New internal module: isaaclab_teleop.automation.xcr_replay - but the automation subpackage appears to lack an __init__.py, making imports fail
Dependencies: Relies on isaaclab.devices.openxr.remove_camera_configs and isaaclab.devices.teleop_device_factory.create_teleop_device - need to verify these exist in the codebase
Cross-module: Uses isaaclab_tasks.utils.parse_env_cfg which is a stable public API

Implementation Verdict

Significant concerns — Missing __init__.py will cause import failures, and several edge cases in the simulation loop need attention.

Test Coverage

The PR description claims tests were added ("I have added tests that prove my fix is effective or that my feature works") but no test files are included in the changeset. For a CI automation script, integration tests or at minimum a smoke test would be valuable. The XCR replay driver has complex async behavior that's difficult to test in isolation.

CI Status

No CI checks available yet — cannot verify the code actually runs in the target environment.

Findings

🔴 Critical: source/isaaclab_teleop/isaaclab_teleop/automation/xcr_replay.py — Missing __init__.py for automation subpackage
The new automation subpackage has no __init__.py. The import at teleop_replay_agent.py:108:

from isaaclab_teleop.automation import XcrReplayConfig, start_xcr_replay

will raise ModuleNotFoundError: No module named 'isaaclab_teleop.automation'. You need to create source/isaaclab_teleop/isaaclab_teleop/automation/__init__.py that exports these symbols.

🔴 Critical: scripts/environments/teleoperation/teleop_replay_agent.py:141-145 — Unhandled exception silently breaks loop without cleanup

except Exception as e:
    logger.error(f"Error during simulation step: {e}")
    break

This catches ALL exceptions including KeyboardInterrupt (which inherits from BaseException, so actually won't be caught) and programming errors. The break exits the loop but:

The async replay driver keeps running and will eventually call post_quit() on an already-closed env
The exception traceback is lost (only message logged, not the full trace)
No re-raise means CI will exit 0 on simulation errors

Consider: logger.exception("...") to preserve traceback, and potentially raise after cleanup or set a non-zero exit code.

🟡 Warning: scripts/environments/teleoperation/teleop_replay_agent.py:137-140 — action.repeat() may fail on 1D tensors

action = teleop_interface.advance()
...
actions = action.repeat(env.num_envs, 1)

If teleop_interface.advance() returns a 1D tensor of shape (action_dim,), then .repeat(env.num_envs, 1) produces shape (action_dim,) not (num_envs, action_dim). The correct pattern for replicating across envs is:

actions = action.unsqueeze(0).repeat(env.num_envs, 1)
# or
actions = action.expand(env.num_envs, -1)

Verify the expected shape from advance() — if it's already 2D with batch dim 1, the current code is fine.

🟡 Warning: source/isaaclab_teleop/isaaclab_teleop/automation/xcr_replay.py:105-107 — Polling internal module state is fragile

while xcr_player._xcr_playback_subscription is not None:
    logger.debug("XCR replay: waiting for playback subscription to clear")
    await asyncio.sleep(5)

Relying on the underscore-prefixed _xcr_playback_subscription internal variable is brittle — it could be renamed or removed in future Kit versions. The comment acknowledges this ("public-ish signal") but there's no timeout. If the subscription never clears (bug in xcr_player, crash, etc.), this loops forever. Add a timeout:

timeout = 3600  # 1 hour max
elapsed = 0
while xcr_player._xcr_playback_subscription is not None and elapsed < timeout:
    await asyncio.sleep(5)
    elapsed += 5
if elapsed >= timeout:
    logger.warning("XCR replay: timed out waiting for playback to complete")

🟡 Warning: scripts/environments/teleoperation/teleop_replay_agent.py:108-109 — asyncio.ensure_future without a running loop

asyncio.ensure_future(start_xcr_replay(XcrReplayConfig(...)))

This requires an asyncio event loop to already be running. Kit's main loop typically provides this, but if called before Kit's async infrastructure is fully initialized, this will raise RuntimeError: no running event loop. The script structure suggests this is called after AppLauncher setup, so it's likely fine, but adding defensive handling or a comment explaining the assumption would help maintainability.

🔵 Improvement: scripts/environments/teleoperation/teleop_replay_agent.py:134-135 — Missing render call when action is None may cause frame drops

if action is None:
    env.sim.render()
    continue

When action is None, you render but skip env.step(). This is correct for not advancing physics, but the simulation clock doesn't advance either. Over a 120-second warmup period, this could lead to thousands of render-only iterations. Consider adding a small sleep to avoid busy-spinning:

if action is None:
    env.sim.render()
    simulation_app.update()  # May already be implicit
    continue

🔵 Improvement: source/isaaclab_teleop/isaaclab_teleop/automation/xcr_replay.py:91 — Instantiating XCRReplayAPI() without storing reference

XCRReplayAPI()

This instantiates an object solely for its side effects (registering the replay backend). While this works, it's unclear to future readers. Add a comment or store in a variable with _ prefix to signal intentional discard:

_ = XCRReplayAPI()  # Side effect: registers replay backend

greptile-apps · 2026-05-05T22:27:26Z

Greptile Summary

This PR adds a permanent CI entry-point (teleop_replay_agent.py) for replaying captured teleop sessions against an Isaac Lab environment, replacing the runtime patch previously applied by the teleop-cicd pipeline to teleop_se3_agent.py. It also introduces a small isaaclab_teleop.automation subpackage backing the replay driver.

teleop_replay_agent.py: Builds the teleop env, attaches the legacy handtracking device, schedules the XCR replay coroutine via asyncio.ensure_future (retained in a strong-ref set with a done-callback to surface failures), wraps env lifecycle in try/finally, and gates env.step() behind START/STOP callbacks to avoid feeding zero-pose targets during the start-delay window.
xcr_replay.py: Drives Kit's OpenXR XCR backend; stores the XCRReplayAPI instance to prevent premature GC, uses a bounded poll loop (max_replay_duration_s) to avoid an indefinite hang if xcr_player._xcr_playback_subscription never clears, and calls post_quit() on completion or timeout.
automation/__init__.py: Adds the explicit subpackage __init__.py to make from isaaclab_teleop.automation import ... reliable.

Confidence Score: 5/5

This PR is safe to merge; the new replay agent correctly handles env cleanup, future error propagation, and zero-pose suppression during startup.

All substantive issues raised in prior review rounds have been addressed: the automation/init.py is present, the Future is retained with a done-callback, env cleanup is in try/finally, XCRReplayAPI is stored in a local, and the poll loop is bounded. No new defects were identified in this pass.

No files require special attention; xcr_replay.py relies on a private Kit attribute for completion detection, which is acknowledged in both code comments and the changelog.

Important Files Changed

Filename	Overview
scripts/environments/teleoperation/teleop_replay_agent.py	New CI entry-point for non-interactive teleop replay; correctly wraps env lifecycle in try/finally, retains Future in a strong-ref set with a done-callback, and guards the step loop behind START/STOP callbacks.
source/isaaclab_teleop/isaaclab_teleop/automation/xcr_replay.py	XCR replay driver coroutine; addresses previously flagged issues (stored XCRReplayAPI reference, bounded poll timeout), but the poll loop's initial-state assumption on `_xcr_playback_subscription` is worth watching.
source/isaaclab_teleop/isaaclab_teleop/automation/init.py	Adds the `automation` subpackage `__init__.py`, exporting `XcrReplayConfig` and `start_xcr_replay`; clean and minimal.
source/isaaclab_teleop/changelog.d/rwiltz-xcr-replay-agent.minor.rst	Changelog fragment accurately describes the new script, the start-delay zero-pose fix, the Future hang fix, and the bounded-poll addition.

Sequence Diagram

sequenceDiagram
    participant Main as teleop_replay_agent.py
    participant Env as gym.Env
    participant Device as OpenXRDevice (handtracking)
    participant Replay as start_xcr_replay (asyncio)
    participant Kit as omni.kit / XCR backend

    Main->>Env: "gym.make(task, cfg=env_cfg)"
    Main->>Device: create_teleop_device(handtracking, callbacks)
    Main->>Env: env.reset() / teleop_interface.reset()
    Main->>Replay: asyncio.ensure_future(start_xcr_replay(cfg))
    Note over Replay: Future stored in _PENDING_REPLAY_TASKS

    loop simulation_app.is_running()
        Main->>Device: teleop_interface.advance()
        alt "teleop_active == True"
            Main->>Env: env.step(actions)
        else
            Main->>Env: env.sim.render()
        end
    end

    Replay->>Kit: configure XCR settings via carb.settings
    Replay->>Kit: XCRReplayAPI() stored in _replay_api
    Note over Replay: asyncio.sleep(start_delay_s)
    Replay->>Kit: start_replay_if_enabled()
    Replay->>Kit: EnabledXRProfile(profile_name)
    Kit-->>Device: OpenXR start event
    Device-->>Main: "teleop_active = True"

    loop poll every 5s up to max_replay_duration_s
        Replay->>Kit: check xcr_player._xcr_playback_subscription
    end

    Replay->>Kit: omni.kit.app.post_quit()
    Kit-->>Main: "simulation_app.is_running() = False"
    Main->>Env: env.close() via finally
    Main->>Kit: simulation_app.close()

_{Reviews (2): Last reviewed commit: "Merge branch 'develop' into rwiltz/integ..." | Re-trigger Greptile}

rwiltz · 2026-05-08T20:44:16Z

@greptile

Initial commit of CICD integration for teleop

9d402a3

rwiltz requested a review from ooctipus as a code owner May 5, 2026 22:22

github-actions Bot added the isaac-lab Related to Isaac Lab team label May 5, 2026

isaaclab-review-bot Bot reviewed May 5, 2026

View reviewed changes

greptile-apps Bot reviewed May 5, 2026

View reviewed changes

aristarkhovNV and others added 4 commits May 8, 2026 10:07

add missing __init__.py

1d72345

fix license check

d91366e

Wiring up the start signal

2aa7940

Addressing feedback

b485110

Fixing replay bugs

4949cca

rwiltz force-pushed the rwiltz/integrate-teleop-cicd branch from 22725cc to 4949cca Compare May 8, 2026 21:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add teleop replay agent for non-interactive CI runs#5507

Add teleop replay agent for non-interactive CI runs#5507
rwiltz wants to merge 6 commits intoisaac-sim:developfrom
rwiltz:rwiltz/integrate-teleop-cicd

rwiltz commented May 5, 2026

Uh oh!

isaaclab-review-bot Bot left a comment

Uh oh!

greptile-apps Bot commented May 5, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rwiltz commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rwiltz commented May 5, 2026

Description

Type of change

Checklist

Uh oh!

isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

🤖 Isaac Lab Review Bot

Summary

Architecture Impact

Implementation Verdict

Test Coverage

CI Status

Findings

Uh oh!

greptile-apps Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rwiltz commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented May 5, 2026 •

edited

Loading