Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
177 changes: 177 additions & 0 deletions .github/REVIEW_GUIDELINES.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,177 @@
# Review Guidelines — agentkeys

This is the single source of truth for code review patterns in this repo. The
`claude-code-review.yml` workflow points Claude at this file; human reviewers
should also use it as a checklist.

Background: these patterns were distilled from 15+ PR review cycles in
March-April 2026 where codex repeatedly surfaced the same classes of bug. Each
numbered item below has been a real P1 or P2 finding at least once.

## Test constraints

- **ALWAYS use `cargo test -p <crate> -- --test-threads=1`.** Tests mutate
shared process env (HOME, keyring accounts, AGENTKEYS_SESSION_STORE) and
parallel runs race. This is not negotiable.
- Target only affected crates. `cargo test` on the whole workspace takes 5+
minutes and masks which crate actually regressed.
- `cargo clippy -p <crate> -- -D warnings` should stay clean.
- When adding FFI code (macOS LAContext, keychain), add both L2 FFI-boundary
tests behind `#[cfg(target_os = "macos")]` AND a `MockBackend` so the
non-FFI behavior stays testable on Linux CI.

## Canonical bug patterns

### 1. Cross-wallet credential leak on namespace collisions

**Don't:** use raw user input as a filesystem directory or keyring account
name.

**Do:** sanitize via `sanitize_for_keyring(session_id)` in `session_store.rs`
— ASCII alnum + `-_.` preserved, anything else replaced with `_`, rewrites
suffixed with a SHA-256-derived 8-char hash under the reserved `__agk_`
prefix. `DefaultHasher` is NOT stable across Rust versions; use `sha2`
crate.

Reference: PR #24 v6 P1 + v9, issue #37.

### 2. Nondeterministic daemon session selection

**Don't:** rely on `std::fs::read_dir` order when picking among multiple
`daemon-*` sessions.

**Do:** sort alphabetically via `list_fallback_session_ids` before selecting.
If more than one loadable session exists and no `--session-id` is passed,
error with the candidate list rather than picking arbitrarily.

Reference: PR #24 v1 P1.

### 3. URL encoding via reqwest `.query()`, never raw interpolation

**Don't:**
```rust
let url = format!("/identity/resolve?identity_type=alias&identity_value={}", raw);
```

**Do:**
```rust
let resp = client.get(format!("{backend_url}/identity/resolve"))
.query(&[("identity_type", "alias"), ("identity_value", raw)])
.send()
.await?;
```

Aliases with `+`, `&`, `%`, spaces, plus-addressed emails (`bot+prod@…`)
break under raw interpolation.

Reference: PR #20, PR #22 v1 P2, PR #29.

### 4. Session-token redaction in prompt / log strings

**Don't:** echo `cmd_revoke`'s raw argument into error messages, biometric
prompts, or stderr. The argument may be a live bearer token.

**Do:** pipe user-supplied strings through `redact_prompt_reason()` (in
`biometric/logic.rs`) — strips anything over 40 characters opaque while
preserving short `0x…` wallet addresses.

Reference: PR #27 P2 (session-token leak via biometric prompt), PR #38.

### 5. Wallet comparison must be case-insensitive OR normalize before compare

**Don't:** `session.wallet.0 == target_wallet_str` when `target_wallet_str`
came from user input. EIP-55 checksummed addresses collide with the
backend's lowercase storage.

**Do:** either `eq_ignore_ascii_case` (revoke self-detection pattern in
`cmd_revoke`) OR lowercase before sending the value to `/identity/resolve`
(pattern in PR #22 v3 `resolve_parent_if_set`).

Reference: PR #18 P2, PR #22 v2 P2.

### 6. Session TTL is 30 days uniformly

Master, agent, sandbox — all sessions are 30 days per `wiki/session-token.md`.
Don't introduce per-type TTL splits; they were tried and reverted.

Reference: PR #23.

### 7. Keychain operations must be synchronous

**Don't:** spawn a detached thread to delete a keyring entry. Callers use
the return value to decide what to tell the user, and the CLI often exits
before the background thread runs.

**Do:** use the `try_keyring_save`/`try_keyring_load` pattern — spawn a
thread, wait up to 2 seconds via `recv_timeout`, return a bool. Propagate
failures as `anyhow::Error` from `clear_session`.

Reference: PR #24 v8 P1 (fire-and-forget delete regression).

### 8. Path traversal guards on user-supplied session_id / filename

**Don't:** join user input directly into a filesystem path.

**Do:** sanitize with the pattern from `crates/agentkeys-core/src/session_store.rs::sanitize_for_keyring`.
Reject reserved path components (`""`, `"."`, `".."`) and anything starting
with the reserved `__agk_` rewrite prefix — those are forced through the
hash-suffix path so user inputs can't collide with rewrite outputs.

Reference: PR #24 v2 P2, v6 P1.

### 9. Audit log: DENIED rows for every cross-agent probing path

New `/credential/*` endpoints that gate on `is_owner_of` must insert a
`'denied'` audit row BEFORE returning 403. This keeps cross-agent probing
visible.

Check: `read_credential`, `store_credential`, `list_credentials`,
`teardown_agent` all insert DENIED rows. New endpoints must match.

Reference: PR #19 P2.

### 10. Mock server design principles (from CLAUDE.md)

- **Typed parameters**: every endpoint takes explicit typed inputs (e.g.,
`identity_type` + `identity_value`). No opaque JSON-blob parsing at
runtime. Blockchain extrinsics require typed inputs; the mock mirrors
that.
- **Shared `resolve_identity`**: one utility in `handlers/identity.rs` —
never inline `if let Alias => ... else if Email => ...` chains per
handler.
- **Modular handlers**: request-type-specific logic goes in separate
functions (`mint_pair_session`, `mint_recover_session`). `approve_auth_request`
is pure dispatch.

Reference: CLAUDE.md → "Mock Server Design Principles".

## Architectural facts to apply

- **Master → agent is strictly one hop.** No grandchildren. `is_owner_of`
checks a single `parent_token` level and that's sufficient for the
current model. Reviewers should NOT file tickets about transitive
ownership unless the session model has grown recursive pairing, which
it has not. See closed issue #35.
- **No users in production yet.** Pre-launch state. "Preserve pre-#N
legacy data on upgrade" findings get dismissed because there is no
legacy field state. When we ship, we'll add dedicated migration commits
with CHANGELOG entries.
- **Commit message style**: `fix(<crate>): #<issue> — <subject>` or
`docs(<scope>): <subject>`. Version suffixes `v2`, `v3` etc for
iterative codex-review fixes on the same PR.

## Scope control

- `bug fix` PRs should NOT include refactors beyond the fix. If the diff
starts growing new abstractions, flag it as scope creep.
- Don't add feature flags or backwards-compatibility shims unless the
project has users (see above — it does not).
- Don't write tests that only re-verify framework guarantees (e.g.,
"tokio spawn returns a JoinHandle"). Test user-visible behavior.

## When codex and Claude disagree

If this repo's automated review (claude-code-action) and a manual codex
review flag contradictory issues on the same line, prefer the pattern in
this doc. If the doc is silent, surface the disagreement in the PR thread
rather than picking a side silently — the author makes the call.
89 changes: 89 additions & 0 deletions .github/workflows/claude-code-review.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
name: Claude Code Review

on:
pull_request:
types: [opened, synchronize, ready_for_review, reopened]
# Only run on code / test / doc paths we actually want reviewed.
# Skips cheap churn like Cargo.lock-only updates.
paths:
- "crates/**"
- "docs/**"
- "wiki/**"
- ".github/workflows/**"
- "Cargo.toml"
- "CLAUDE.md"
- "harness/**"

jobs:
claude-review:
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: read
issues: read
id-token: write
actions: read

steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 0 # full history so Claude can inspect git log / blame during review

# Rust toolchain + cache so `cargo check` / `cargo test -p <crate>` runs fast.
# Every reviewer-invoked cargo command benefits.
- name: Install Rust toolchain
uses: dtolnay/rust-toolchain@stable
with:
components: clippy, rustfmt

- name: Cache cargo registry + target
uses: Swatinem/rust-cache@v2
with:
shared-key: "agentkeys-review"

- name: Run Claude Code Review
id: claude-review
uses: anthropics/claude-code-action@v1
with:
claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
plugin_marketplaces: 'https://github.com/anthropics/claude-code.git'
plugins: 'code-review@claude-code-plugins'
# Custom review prompt tailored to agentkeys' Rust workspace + known
# review patterns from past codex iterations. See
# .github/REVIEW_GUIDELINES.md for the full pattern catalog.
prompt: |
/code-review:code-review ${{ github.repository }}/pull/${{ github.event.pull_request.number }}

REPO CONTEXT:
- Rust workspace. Workspace-member crates: agentkeys-types, agentkeys-core,
agentkeys-cli, agentkeys-daemon, agentkeys-mock-server, agentkeys-mcp,
agentkeys-provisioner.
- READ `CLAUDE.md` at repo root for architecture, mock-server design
principles, and test commands.
- READ `.github/REVIEW_GUIDELINES.md` for agentkeys-specific review
patterns (audit-log contract, session-token redaction, URL encoding
via reqwest `.query()`, `--test-threads=1` requirement, etc).
- Related specs: `docs/spec/architecture.md`,
`docs/spec/credential-backend-interface.md`,
`wiki/session-token.md` (30-day TTL policy).

TEST CONSTRAINTS:
- Tests mutate shared process state (HOME, keyring accounts) so
ALWAYS use `cargo test -p <crate> -- --test-threads=1`.
- Target only affected crates, not the whole workspace.
- `cargo clippy -p <crate> -- -D warnings` is expected clean.

KNOWN REVIEW PATTERNS (apply these before accepting a PR):
1. Cross-wallet credential safety (namespacing, sanitization).
2. Audit log DENIED rows for all cross-agent probing paths.
3. URL encoding via reqwest `.query()` — NEVER raw `format!()`
interpolation into query strings.
4. Token / session-token redaction in prompts and log lines.
5. Case-insensitive wallet comparison (EIP-55 vs backend lowercase).
6. Session TTL uniformly 30 days per `wiki/session-token.md`.
7. Synchronous keychain ops — no fire-and-forget delete.
8. Path traversal guards on any user-supplied session_id / filename.

claude_args: |
--allowed-tools Bash(cargo check:*),Bash(cargo test:*),Bash(cargo clippy:*),Bash(cargo fmt:*),Bash(git diff:*),Bash(git log:*),Bash(git blame:*),Bash(gh pr:*),Bash(gh issue view:*)
59 changes: 59 additions & 0 deletions .github/workflows/claude.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
name: Claude Code

on:
issue_comment:
types: [created]
pull_request_review_comment:
types: [created]
issues:
types: [opened, assigned]
pull_request_review:
types: [submitted]

jobs:
claude:
if: |
(github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
(github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
(github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
(github.event_name == 'issues' && (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude')))
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: read
issues: read
id-token: write
actions: read # required for Claude to read CI results on PRs

steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 0 # full history so `git log` / `git blame` work in-session

# Rust toolchain + cache so `@claude test` / `@claude check clippy`
# style requests don't need to pay cold-compile cost.
- name: Install Rust toolchain
uses: dtolnay/rust-toolchain@stable
with:
components: clippy, rustfmt

- name: Cache cargo registry + target
uses: Swatinem/rust-cache@v2
with:
shared-key: "agentkeys-review"

- name: Run Claude Code
id: claude
uses: anthropics/claude-code-action@v1
with:
claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
additional_permissions: |
actions: read

# Repo-specific allow-list so @claude requests can actually run
# tests / lint / diff without hitting permission prompts.
# `--test-threads=1` is required because our test suites mutate
# shared process env (HOME, keyring accounts).
claude_args: |
--allowed-tools Bash(cargo check:*),Bash(cargo test:*),Bash(cargo clippy:*),Bash(cargo fmt:*),Bash(git diff:*),Bash(git log:*),Bash(git blame:*),Bash(gh pr:*),Bash(gh issue view:*),Bash(gh pr comment:*),Bash(gh pr edit:*)
Loading