CLI and MCP Testing - 4-19-26

# aX Platform — MCP & CLI Analysis (2026-04-19)

Team-facing summary of hands-on testing across the `ax` CLI and the `ax-platform` MCP server.
Source logs (verbose): [`AX-CLI/AX_CLI_TEST_RESULTS.md`](../AX-CLI/AX_CLI_TEST_RESULTS.md), [`AX-MCP/AX_MCP_TEST_RESULTS.md`](../AX-MCP/AX_MCP_TEST_RESULTS.md).

**Note:** All tests were performed on a new private workspace named `edge-radar`.

---

## At a Glance

| Surface | Tests Run | Pass | Bug / Issue | Skipped (destructive) |
|---|---:|---:|---:|---:|
| CLI (`ax`) | 58 | 32 | 19 | 7 |
| MCP (`ax-platform`) | ~30 actions across 7 tools | 18 | 10 | 8 |
| **Total bugs/doc-gaps** | — | — | **~31 CLI + 10 MCP** | — |

**Environment:** Windows 11, CLI config at `.ax/config.toml`, MCP auth bound to agent `Edge-Radar-Scriptor`, target space `f6e56126-c293-...` (Edge Radar).

---

## Key Architectural Difference: CLI vs MCP

| Dimension | CLI | MCP |
|---|---|---|
| Identity | User PAT (`michaelschecht`) — `resolved_agent` is informational only | Agent-bound via GitHub OAuth (`Edge-Radar-Scriptor`) |
| Default scope | Workspace context resolved from config | `unscoped_read_only` until `space_id` is passed (and even then, session-level writes stay blocked) |
| Response shape | Table or `--json`; `rich`-rendered | Structured envelope: `{kind, version, state, data, actions, active_tab}` |
| Memory / HITL | None | `whoami.remember/recall`, `create_draft → approve_draft` flows |

Implication: CLI and MCP are not interchangeable. The CLI reaches more endpoints successfully because it authenticates with a user PAT and resolves scope implicitly from config; the MCP session is agent-bound, and many session-level writes are blocked regardless of `space_id`.
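
Because the two surfaces return different shapes, a client that consumes MCP responses may want a cheap shape check before reading fields. The sketch below only uses the six envelope keys listed in the table; the helper name is hypothetical, and it is an assumption (not confirmed by testing) that every response carries all six keys.

```python
from typing import Any, List, Mapping

# Keys taken from the "Response shape" row above. Treating all of them as
# mandatory is an assumption made for this sketch.
ENVELOPE_KEYS = ("kind", "version", "state", "data", "actions", "active_tab")


def missing_envelope_keys(response: Mapping[str, Any]) -> List[str]:
    """Return the envelope keys that are absent from an MCP response."""
    return [key for key in ENVELOPE_KEYS if key not in response]
```

An empty list means the response at least looks like an envelope; anything else is worth logging before the caller indexes into `data`.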

---

## What Works (Happy Paths)

### CLI

| Area | Working Commands |
|---|---|
| Auth / identity | `auth whoami`, `auth token show`, `auth exchange` (JWT, 900s TTL) |
| Messaging | `messages list`, `send` (incl. `--to`, `--reply-to`, `--file *.md`, `--skip-ax`), `get`, `edit`, `search`, top-level `send` alias |
| Agents | `agents list`, `get`, `update --description`, `update --status`, `avatar --output` (local SVG) |
| Tasks | `tasks list`, `create`, `get`, `update --priority`, `update --status in_progress` |
| Context | `context set/get/list/delete`, `upload-file`, `fetch-url` |
| Spaces | `spaces list`, `get` |
| Profile | `profile list` (empty), `add` (write), `remove` |
| Keys / creds | `keys list`, `credentials list` |
| Streaming | `listen --dry-run` (SSE connects cleanly) |
| Workflow verbs | `assign run`, and aliases `ship run` / `manage run` / `boss run` (all identical under the hood) |

### MCP

| Tool | Working Actions |
|---|---|
| `whoami` | `get`, `list`, `recall` (memory surfaced read-only) |
| `agents` | `list` (incl. `view_scope=mine`), `set_control` (preview + `confirmed=true` commit) |
| `messages` | `check` (incl. `awareness` block, not in CLI), `send`, `edit`, `draft`, `react` (add-only) |
| `tasks` | `list`, `get`, `create`, `update --description` (text fields only) |
| `context` | `list`, `get` (returns `file_content` + `file_upload.url` in one call) |
| `spaces` | `list`, `get`, `members`, `discover` / `list_public`, `invite_details` (error shape clean) |
| `search` | `query` (relevance-scored, paginated) |

---

## Bugs & Issues

### Tier 1 — Critical / Blocking

| # | Surface | Issue | Impact |
|---:|---|---|---|
| 1 | CLI | `profile add` writes `token_file` / `workdir_path` as TOML basic strings with raw `\` — unparseable by `tomllib` on Windows. `profile list / verify / env / use` all crash with `TOMLDecodeError` immediately after `add`. | 4 of 6 `profile` subcommands broken on Windows |
| 2 | MCP | **Silent no-op writes:** `tasks.update status=…` and `agents.update bio=…` return success (`notice:"Task updated."`, `updated_at` bumps) but the field never changes. CLI status update works fine — MCP handler drops the field. | Automation cannot trust MCP writes without re-read |
| 3 | CLI | **Silent-drop flag family** — success reported, value never reaches server/predicate: `agents update --bio`, `--specialization`; `tasks create --assign-to`; `watch --poll --contains`. | Four flags look functional but aren't |
| 4 | CLI | cp1252 crash family on Windows: `rich` dies on emoji / unicode arrow in JSON, table, and help output. Hits `messages list --json`, `agents list --json`, `messages list` table, `upload file --help`. | Any `rich` output with non-cp1252 chars crashes; data still writes to stdout but CLI exits non-zero |
| 5 | CLI | `messages send --act-as` raises `AttributeError: 'NoneType' object has no attribute 'get'` at `messages.py:130`. Fix: `me.get("credential_scope") or {}`. | Unhandled crash instead of a clean "needs agent-scoped token" message |

### Tier 2 — Server-Side / Inconsistent Validation

| # | Surface | Issue |
|---:|---|---|
| 6 | CLI | `agents status` → `/api/v1/agents/presence` returns HTML (SPA fallthrough); same for `agents tools` → `/api/v1/organizations/{space_id}/roster` |
| 7 | CLI | `agents avatar --set` → 405 Method Not Allowed |
| 8 | CLI | `tasks update --status <garbage>` accepted as-is — no enum validation. Conversely, `--priority <garbage>` returns 500 (should be 400) |
| 9 | CLI | `tasks update --status done` sets status but leaves `completed_at: null` — downstream filters on `completed_at IS NOT NULL` miss every CLI-completed task |
| 10 | CLI | `messages delete <id>` → 500 Server error on every tested message |
| 11 | CLI | `watch --mention` (SSE) → 401, while `listen` connects with the same token — SSE auth divergence |
| 12 | CLI | `events stream` hangs indefinitely with no output / timeout (likely same SSE 401 swallowed) |
| 13 | CLI | `messages send --file` → 415 for files with unguessable MIME types (`.env.example`, etc.) |
| 14 | CLI | `context download` → `UnsupportedProtocol` because `upload-file` stores a **relative** URL and `download` hands it to httpx unchanged |
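
For #14, resolving the stored URL against the API base before handing it to the HTTP client would avoid the `UnsupportedProtocol` error. This is a minimal sketch using only the standard library; the function name, the base URL, and the stored path are placeholders, not values from the platform.

```python
from urllib.parse import urljoin, urlsplit


def absolute_url(base_url: str, stored_url: str) -> str:
    """Return stored_url unchanged if it has a scheme, else resolve it against base_url."""
    if urlsplit(stored_url).scheme:
        return stored_url
    # A trailing slash makes urljoin treat the base as a directory.
    return urljoin(base_url.rstrip("/") + "/", stored_url)
```

For example, `absolute_url("https://api.example.invalid", "/api/v1/uploads/abc")` yields `https://api.example.invalid/api/v1/uploads/abc`, while an already absolute URL passes through untouched.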

### Tier 3 — Schema / Shape Drift

| # | Surface | Issue |
|---:|---|---|
| 15 | MCP | `messages` tool JSON schema declares `space_id`, but server rejects it as unknown keyword (`send`, `draft`, `react`) |
| 16 | MCP | `agents.update` advertises a `changes:{…}` dict but requires flat fields; `changes` is silently ignored |
| 17 | MCP | `agents.toggle` is not a real toggle — requires explicit `state` argument |
| 18 | MCP | `tasks.update status="Done"` returns 422 (display label ≠ API enum `completed`) — but `"completed"` then silently no-ops (see Tier 1 #2) |
| 19 | CLI | `tasks get --json` wraps in `{"task": {...}}`; `tasks create --json` returns flat — inconsistent across siblings |
| 20 | CLI | `tasks get` default output is Python `dict` repr — not a table, not JSON |
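
Until #19 is fixed, scripts that consume both `tasks get --json` and `tasks create --json` can normalize the two shapes themselves. A small sketch, assuming the wrapped form always uses the key `task` as observed above; the helper name is hypothetical.

```python
from typing import Any, Dict


def unwrap_task(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the task object from either the wrapped or the flat JSON shape."""
    inner = payload.get("task")
    # Only unwrap when "task" holds an object; a flat task payload is returned as-is.
    return inner if isinstance(inner, dict) else payload
```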

### Tier 4 — Session / Scope Behavior (MCP)

Session-level writes are gated on an "active workspace context" that the agent-bound OAuth credential never acquires. All of the following return `*_blocked` notices:

| Tool | Blocked Action |
|---|---|
| `whoami` | `update`, `remember`, `follow` / `unfollow` |
| `context` | `set`, `delete` |

Passing `space_id` on the call does **not** lift `permissions.mode` from `unscoped_read_only`. A separate workspace-activation step is missing from the MCP flow.

### Tier 5 — Doc / Reference Gaps

- `ax messages` has 6 subcommands; reference lists 2. Undocumented: `get`, `edit`, `delete`, `search`.
- `ax context` has 7 subcommands; reference lists 2 and names one wrong (`upload` vs actual `upload-file`).
- `ax spaces` has 4 subcommands; reference lists 1. Missing: `create`, `get`, `members`.
- `ax profile remove` missing from reference.
- Profile path in doc (`~/.ax/config.toml`) doesn't match actual (`~/.ax/profiles/<name>/profile.toml`).
- Workflow-verb priority defaults in reference (assign=Medium / boss=Critical / etc.) don't match CLI help (all `high`). Source confirms `ship`/`manage`/`boss` are literal aliases of `assign` — no per-verb behavior difference.
- MCP `context set` / `fetch-url` silently trigger auto-summarization via `us.amazon.nova-micro-v1:0` — adds ~1.5s latency + per-call LLM cost; not mentioned anywhere.

### Environmental

- Every CLI command warns `.ax/config.toml has permissions 0o666 — should be 0600`. `chmod` isn't native on Windows; CLI should use `icacls` or skip the check when `os.name == 'nt'`. Token cache is deleted on each run, forcing extra JWT exchanges.
- MCP `context.list` payloads routinely exceed tool-result token ceiling — clients must probe structure via `get` on a specific key.
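
The "skip the check when `os.name == 'nt'`" option for the permissions warning could look roughly like the sketch below. The function name and message wording are assumptions; the actual check in the CLI may be structured differently.

```python
import os
import stat
from typing import Optional


def config_permissions_warning(path: str) -> Optional[str]:
    """Return a warning if group/other can access the file, or None if it looks fine."""
    if os.name == "nt":
        # POSIX mode bits are not meaningful on Windows, so skip the check.
        return None
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:  # any group or other permission bit is set
        return f"{path} has permissions {mode:#o}; should be 0600"
    return None
```

Skipping the check is the smaller change; an `icacls`-based check would give Windows users equivalent protection but needs platform-specific code.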

---

## Cross-Cutting Patterns

1. **Silent writes are the biggest trust problem** on both surfaces. The CLI has 4 silent-drop flags; MCP has 2 silent no-op writes. Neither returns a 4xx or a diagnostic `notice`. Automation must re-read after every write.
2. **Windows CP1252** kills several `rich`-rendered CLI paths. A single fix (force `PYTHONIOENCODING=utf-8` at entrypoint or wrap `sys.stdout` in a UTF-8 writer) would eliminate 3 of the 4 crash sites. The TOML quoting issue (#1) is separate.
3. **SSE auth** is inconsistent across `watch`, `events stream`, and `listen`: the same token connects on `listen` but returns 401 on `watch --mention`, and `events stream` hangs (possibly the same 401, swallowed).
4. **Reference doc is systematically behind the source.** Audit needed across `messages`, `context`, `spaces`, `profile`, and workflow verbs.
5. **Bio/specialization mystery:** CLI reports null for all agents; MCP shows values on `Edge-Radar-Scriptor` only. Hypothesis: a separate profile layer was populated via another surface (web UI or prior MCP session with session write scope). Worth reconciling.

---

## Recommended Priorities

| Priority | Fix |
|---|---|
| **P0** | Profile TOML quoting on Windows (bug #1) — blocks the profile workflow entirely |
| **P0** | Audit silent-write paths: CLI flag-to-payload mapping and MCP `tasks.update.status` / `agents.update.bio` handlers |
| **P0** | Force UTF-8 stdout in CLI entrypoint — kills the cp1252 family |
| **P1** | `messages send --act-as` crash fix (trivial) |
| **P1** | Server-side: implement or 405-with-reason the `/presence`, `/roster`, `/agents/<uuid>` (avatar), `/messages/<id>` DELETE endpoints |
| **P1** | MCP: resolve session-scope activation so `whoami` / `context` writes aren't unconditionally blocked |
| **P1** | Unify SSE auth across `watch` / `events stream` / `listen` |
| **P2** | Add validation for `tasks.status` enum + populate `completed_at` on done |
| **P2** | Bring reference doc up to date (messages, context, spaces, profile, workflow verbs) |
| **P2** | Document MCP auto-summarization side effect + cost |
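
One way to approach the UTF-8 stdout item is to reconfigure the standard streams at the top of the CLI entrypoint. This is a sketch under assumptions: the helper name is hypothetical, and it presumes the call happens before `rich` (or anything else) captures the stream.

```python
import io
import sys


def ensure_utf8(stream):
    """Switch a text stream to UTF-8 in place when it supports reconfigure()."""
    reconfigure = getattr(stream, "reconfigure", None)
    if callable(reconfigure):
        # errors="replace" keeps output flowing even for unencodable data.
        reconfigure(encoding="utf-8", errors="replace")
    return stream


# Intended use at the top of the entrypoint (left as a comment in this sketch):
#     ensure_utf8(sys.stdout)
#     ensure_utf8(sys.stderr)
```

`io.TextIOWrapper.reconfigure` is available from Python 3.7; streams that have been replaced by objects without that method are left untouched.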

---

## Left-over Test State (needs cleanup)

- Task `7bc603c7-…` stuck `open` (MCP cannot flip status) — close via CLI or web UI.
- Reaction `🧪` on message `e41eab48-…` (MCP reactions are add-only).
- Both CLI test tasks are `done` but not deletable (no `tasks delete` subcommand exists).

---

## Scope / Commands NOT Tested

Skipped per destructive-ops policy: `auth init`, `auth token set`, `keys create/revoke/rotate`, `credentials issue-*`, `spaces create`, `agents create/delete`, `profile use`, `listen --exec`, `ax channel` (MCP stdio server). On the MCP side: all `create_draft → approve_draft` flows, `spaces.create/join_*`, `agents.disable/enable/set_placement`, `messages.delete`.

These should be exercised in a throwaway environment before any broad rollout.

