
CLI and MCP Testing - 4-19-26 #80

@michaelschecht


aX Platform — MCP & CLI Analysis (2026-04-19)

Team-facing summary of hands-on testing across the ax CLI and the ax-platform MCP server.
Source logs (verbose): AX-CLI/AX_CLI_TEST_RESULTS.md, AX-MCP/AX_MCP_TEST_RESULTS.md.

Note: all tests were performed on a new private workspace named edge-radar.


At a Glance

| Surface | Tests run | Pass | Bug / issue | Skipped (destructive) |
|---|---|---|---|---|
| CLI (ax) | 58 | 32 | 19 | 7 |
| MCP (ax-platform) | ~30 actions across 7 tools | 18 | 10 | 8 |

Total bugs / doc gaps: ~31 CLI + 10 MCP.

Environment: Windows 11, CLI config at .ax/config.toml, MCP auth bound to agent Edge-Radar-Scriptor, target space f6e56126-c293-... (Edge Radar).


Key Architectural Difference: CLI vs MCP

| Dimension | CLI | MCP |
|---|---|---|
| Identity | User PAT (michaelschecht); resolved_agent is informational only | Agent-bound via GitHub OAuth (Edge-Radar-Scriptor) |
| Default scope | Workspace context resolved from config | unscoped_read_only until space_id is passed (and even then, session-level writes stay blocked) |
| Response shape | Table or --json; rich-rendered | Structured envelope: {kind, version, state, data, actions, active_tab} |
| Memory / HITL | None | whoami.remember/recall, create_draft → approve_draft flows |

Implication: CLI and MCP are not interchangeable. The CLI hits more endpoints successfully because it runs as a user PAT with implicit scope; the MCP session is agent-bound and many session-level writes are blocked regardless of space_id.


What Works (Happy Paths)

CLI

| Area | Working commands |
|---|---|
| Auth / identity | auth whoami, auth token show, auth exchange (JWT, 900s TTL) |
| Messaging | messages list, send (incl. --to, --reply-to, --file *.md, --skip-ax), get, edit, search, top-level send alias |
| Agents | agents list, get, update --description, update --status, avatar --output (local SVG) |
| Tasks | tasks list, create, get, update --priority, update --status in_progress |
| Context | context set/get/list/delete, upload-file, fetch-url |
| Spaces | spaces list, get |
| Profile | profile list (empty), add (write), remove |
| Keys / creds | keys list, credentials list |
| Streaming | listen --dry-run (SSE connects cleanly) |
| Workflow verbs | assign run, and aliases ship run / manage run / boss run (all identical under the hood) |
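The auth exchange row shows exchanged JWTs carrying a 900 s TTL, while the Environmental notes report that the token cache is deleted on every run, forcing a fresh exchange each time. Below is a minimal sketch of a cache that persists across runs. The cache path, the JSON layout, the 60 s refresh margin, and the exchange callable are all hypothetical and do not describe the CLI's actual implementation.

```python
import json
import os
import time
from pathlib import Path
from typing import Callable


def get_cached_token(
    cache_path: Path,
    exchange: Callable[[], str],
    ttl: float = 900.0,
    margin: float = 60.0,
    now: Callable[[], float] = time.time,
) -> str:
    """Return a persisted JWT while it is still fresh, otherwise exchange for a new one.

    Hypothetical sketch: `exchange` stands in for whatever performs the PAT -> JWT
    exchange; the 900 s default mirrors the TTL observed in testing.
    """
    current = now()
    try:
        cached = json.loads(cache_path.read_text(encoding="utf-8"))
        # Refresh `margin` seconds early so a token never expires mid-request.
        if current < cached["expires_at"] - margin:
            return cached["token"]
    except (FileNotFoundError, json.JSONDecodeError, KeyError, TypeError):
        pass  # missing or corrupt cache: fall through to a fresh exchange

    token = exchange()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(
        json.dumps({"token": token, "expires_at": current + ttl}), encoding="utf-8"
    )
    if os.name != "nt":
        os.chmod(cache_path, 0o600)  # the file holds a credential; keep it owner-only
    return token
```

Two calls within the TTL perform a single exchange, because the second call is served from the file. Refreshing slightly before expiry avoids a request racing the token's expiry.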

MCP

| Tool | Working actions |
|---|---|
| whoami | get, list, recall (memory surfaced read-only) |
| agents | list (incl. view_scope=mine), set_control (preview, then confirmed=true to commit) |
| messages | check (incl. awareness block, not in CLI), send, edit, draft, react (add-only) |
| tasks | list, get, create, update --description (text fields only) |
| context | list, get (returns file_content + file_upload.url in one call) |
| spaces | list, get, members, discover / list_public, invite_details (error shape clean) |
| search | query (relevance-scored, paginated) |

Bugs & Issues

Tier 1 — Critical / Blocking

| # | Surface | Issue | Impact |
|---|---|---|---|
| 1 | CLI | profile add writes token_file / workdir_path as TOML basic strings containing raw \ characters, which tomllib cannot parse for Windows paths. profile list / verify / env / use all crash with TOMLDecodeError immediately after add. | 4 of 6 profile subcommands broken on Windows |
| 2 | MCP | Silent no-op writes: tasks.update status=… and agents.update bio=… return success (notice:"Task updated.", updated_at bumps), but the field never changes. The CLI status update works fine; the MCP handler drops the field. | Automation cannot trust MCP writes without a re-read |
| 3 | CLI | Silent-drop flag family (success is reported, but the value never reaches the server or predicate): agents update --bio, --specialization; tasks create --assign-to; watch --poll --contains. | Four flags look functional but aren't |
| 4 | CLI | cp1252 crash family on Windows: rich dies on emoji / Unicode arrows in JSON, table, and help output. Hits messages list --json, agents list --json, messages list (table), upload file --help. | Any rich output with non-cp1252 characters crashes; data is still written to stdout, but the CLI exits non-zero |
| 5 | CLI | messages send --act-as raises AttributeError: 'NoneType' object has no attribute 'get' at messages.py:130. Fix: me.get("credential_scope") or {}. | Unhandled crash instead of a clean "needs agent-scoped token" message |

Tier 2 — Server-Side / Inconsistent Validation

| # | Surface | Issue |
|---|---|---|
| 6 | CLI | agents status → /api/v1/agents/presence returns HTML (SPA fallthrough); same for agents tools → /api/v1/organizations/{space_id}/roster |
| 7 | CLI | agents avatar --set → 405 Method Not Allowed |
| 8 | CLI | tasks update --status <garbage> is accepted as-is (no enum validation). Conversely, --priority <garbage> returns 500 (should be 400). |
| 9 | CLI | tasks update --status done sets the status but leaves completed_at: null, so downstream filters on completed_at IS NOT NULL miss every CLI-completed task |
| 10 | CLI | messages delete <id> → 500 Server error on every tested message |
| 11 | CLI | watch --mention (SSE) → 401, while listen connects with the same token (SSE auth divergence) |
| 12 | CLI | events stream hangs indefinitely with no output and no timeout (likely the same SSE 401, swallowed) |
| 13 | CLI | messages send --file → 415 for files with unguessable MIME types (.env.example, etc.) |
| 14 | CLI | context download → UnsupportedProtocol, because upload-file stores a relative URL and download passes it to httpx unchanged |
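Bugs #13 and #14 both look like client-side gaps in file handling. The sketch below shows two hypothetical helpers (not functions in the ax CLI): a MIME guess with a generic fallback for extensions such as .example that Python's mimetypes does not recognize, and resolution of a relative stored URL against a base URL before it reaches httpx. Whether the server accepts application/octet-stream, and which base URL the stored path is relative to, are assumptions.

```python
import mimetypes
from urllib.parse import urljoin, urlparse


def guess_upload_mime(filename: str) -> str:
    """Return a MIME type for an upload, falling back to a generic binary type."""
    mime, _encoding = mimetypes.guess_type(filename)
    # ".env.example" ends in ".example", which has no registered type, so
    # guess_type returns None; the fallback value is an assumption.
    return mime or "application/octet-stream"


def absolute_download_url(stored_url: str, base_url: str) -> str:
    """Resolve a relative stored URL against the API base before downloading."""
    if urlparse(stored_url).scheme:
        return stored_url  # already absolute: use as-is
    return urljoin(base_url, stored_url)


if __name__ == "__main__":
    print(guess_upload_mime(".env.example"))
    # Hypothetical host and path, for illustration only.
    print(absolute_download_url("/files/abc", "https://api.example.test/"))
```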

Tier 3 — Schema / Shape Drift

| # | Surface | Issue |
|---|---|---|
| 15 | MCP | messages tool JSON schema declares space_id, but the server rejects it as an unknown keyword (send, draft, react) |
| 16 | MCP | agents.update advertises a changes:{…} dict but requires flat fields; changes is silently ignored |
| 17 | MCP | agents.toggle is not a real toggle; it requires an explicit state argument |
| 18 | MCP | tasks.update status="Done" returns 422 (the display label ≠ the API enum completed), and "completed" then silently no-ops (see Tier 1 #2) |
| 19 | CLI | tasks get --json wraps the result in {"task": {...}}, while tasks create --json returns it flat: inconsistent across sibling commands |
| 20 | CLI | tasks get default output is a Python dict repr: not a table, not JSON |
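Until #18 and #19 are fixed, a client consuming both commands can normalize the shapes itself. A minimal sketch with hypothetical helper names: the only label mapping the report confirms is Done → completed, and per Tier 1 #2 the mapped value may still be silently dropped by MCP, so this does not replace a re-read.

```python
from typing import Any

# Only "Done" -> "completed" is confirmed by bug #18; other labels are unknown.
STATUS_LABEL_TO_ENUM = {"Done": "completed"}


def unwrap_task(payload: dict[str, Any]) -> dict[str, Any]:
    """Accept {"task": {...}} (tasks get --json) or a flat task (tasks create --json)."""
    inner = payload.get("task")
    return inner if isinstance(inner, dict) else payload


def to_status_enum(value: str) -> str:
    """Map a known display label to its API enum; pass anything else through."""
    return STATUS_LABEL_TO_ENUM.get(value, value)
```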

Tier 4 — Session / Scope Behavior (MCP)

Session-level writes are gated on an "active workspace context" that the agent-bound OAuth credential never acquires. All of the following return *_blocked notices:

| Tool | Blocked actions |
|---|---|
| whoami | update, remember, follow / unfollow |
| context | set, delete |

Passing space_id on the call does not lift permissions.mode from unscoped_read_only. A separate workspace-activation step is missing from the MCP flow.

Tier 5 — Doc / Reference Gaps

  • ax messages has 6 subcommands; reference lists 2. Undocumented: get, edit, delete, search.
  • ax context has 7 subcommands; reference lists 2 and names one wrong (upload vs actual upload-file).
  • ax spaces has 4 subcommands; reference lists 1. Missing: create, get, members.
  • ax profile remove missing from reference.
  • Profile path in doc (~/.ax/config.toml) doesn't match actual (~/.ax/profiles/<name>/profile.toml).
  • Workflow-verb priority defaults in reference (assign=Medium / boss=Critical / etc.) don't match CLI help (all high). Source confirms ship/manage/boss are literal aliases of assign — no per-verb behavior difference.
  • MCP context set / fetch-url silently trigger auto-summarization via us.amazon.nova-micro-v1:0 — adds ~1.5s latency + per-call LLM cost; not mentioned anywhere.

Environmental

  • Every CLI command warns that .ax/config.toml has permissions 0o666 and should be 0600. chmod isn't native on Windows, so the CLI should use icacls or skip the check when os.name == 'nt'. The token cache is also deleted on each run, forcing extra JWT exchanges.
  • MCP context.list payloads routinely exceed the tool-result token ceiling, so clients must probe the structure via get on a specific key.

Cross-Cutting Patterns

  1. Silent writes are the biggest trust problem on both surfaces. The CLI has 4 silent-drop flags; MCP has 2 silent no-op writes. Neither returns a 4xx or a diagnostic notice. Automation must re-read after every write.
  2. Windows CP1252 kills several rich-rendered CLI paths. A single fix would eliminate 3 of the 4 crash sites: reconfigure (or wrap) sys.stdout / sys.stderr as UTF-8 at the entrypoint, or set PYTHONIOENCODING=utf-8 before the interpreter starts (setting it from inside the running process does not affect the already-open streams). The TOML quoting issue (bug #1) is separate.
  3. SSE auth is inconsistent across watch, events stream, and listen: the same token connects for listen but gets a 401 from watch --mention, and events stream hangs (likely the same 401, swallowed).
  4. Reference doc is systematically behind the source. Audit needed across messages, context, spaces, profile, and workflow verbs.
  5. Bio/specialization mystery: CLI reports null for all agents; MCP shows values on Edge-Radar-Scriptor only. Hypothesis: a separate profile layer was populated via another surface (web UI or prior MCP session with session write scope). Worth reconciling.
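Pattern 1 implies that every write should be followed by a read. A minimal, transport-agnostic sketch: write and read are hypothetical callables standing in for whichever CLI command or MCP action is used, and the helper compares each requested field against the re-read record.

```python
from typing import Any, Callable


class SilentWriteError(RuntimeError):
    """Raised when a write reported success but the re-read value differs."""


def write_and_verify(
    write: Callable[[dict[str, Any]], Any],
    read: Callable[[], dict[str, Any]],
    changes: dict[str, Any],
) -> dict[str, Any]:
    """Apply changes, then re-read and confirm that every field actually landed."""
    write(changes)  # the write's own success response is deliberately ignored
    current = read()
    dropped = {k: v for k, v in changes.items() if current.get(k) != v}
    if dropped:
        raise SilentWriteError(f"write reported success but fields did not persist: {dropped}")
    return current
```

Against a handler that drops status (as in Tier 1 #2), a description update passes while a status update raises SilentWriteError instead of being silently trusted.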

Recommended Priorities

| Priority | Fix |
|---|---|
| P0 | Profile TOML quoting on Windows (bug #1); blocks the profile workflow entirely |
| P0 | Audit silent-write paths: CLI flag-to-payload mapping and the MCP tasks.update.status / agents.update.bio handlers |
| P0 | Force UTF-8 stdout in the CLI entrypoint to address the cp1252 crash family |
| P1 | messages send --act-as crash fix (trivial) |
| P1 | Server-side: implement, or return 405 with a reason for, the /presence, /roster, /agents/<uuid> (avatar), and /messages/<id> DELETE endpoints |
| P1 | MCP: resolve session-scope activation so whoami / context writes aren't unconditionally blocked |
| P1 | Unify SSE auth across watch / events stream / listen |
| P2 | Add validation for the tasks.status enum and populate completed_at on done |
| P2 | Bring the reference doc up to date (messages, context, spaces, profile, workflow verbs) |
| P2 | Document the MCP auto-summarization side effect and its cost |
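The P0 UTF-8 fix can be sketched as a call at the very top of the CLI entrypoint. This assumes the cp1252 crashes are UnicodeEncodeErrors raised when text reaches a cp1252-encoded stdout/stderr. The function name is hypothetical, and a legacy Windows console may still render some glyphs incorrectly even once the crash is gone.

```python
import sys


def force_utf8_stdio() -> None:
    """Reconfigure stdout/stderr to UTF-8 so emoji and arrows no longer crash output."""
    for stream in (sys.stdout, sys.stderr):
        # reconfigure() exists on io.TextIOWrapper (Python 3.7+); replaced or
        # captured streams may not have it, so those are left untouched.
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            reconfigure(encoding="utf-8", errors="replace")


def main() -> None:
    force_utf8_stdio()  # must run before any rich output is produced
    print("status: ✅ → done")  # would raise UnicodeEncodeError on a cp1252 stdout


if __name__ == "__main__":
    main()
```

Unlike exporting PYTHONIOENCODING from inside the process, reconfiguring the existing stream objects takes effect immediately.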

Left-over Test State (needs cleanup)

  • Task 7bc603c7-… stuck open (MCP cannot flip status) — close via CLI or web UI.
  • Reaction 🧪 on message e41eab48-… (MCP reactions are add-only).
  • Both CLI test tasks are done but not deletable (no tasks delete subcommand exists).

Scope / Commands NOT Tested

Skipped per destructive-ops policy: auth init, auth token set, keys create/revoke/rotate, credentials issue-*, spaces create, agents create/delete, profile use, listen --exec, ax channel (MCP stdio server). On the MCP side: all create_draft → approve_draft flows, spaces.create/join_*, agents.disable/enable/set_placement, messages.delete.

These should be exercised in a throwaway environment before any broad rollout.
