fix(cli): #37 — real macOS LAContext biometric gate via objc2 FFI by hanwencheng · Pull Request #38 · litentry/agentKeys

hanwencheng · 2026-04-15T02:20:54Z

Summary

Wires real Touch ID / Face ID on macOS via LAContext.evaluatePolicy through the objc2 + block2 FFI stack. Supersedes PR #27, which landed a stub that logged a prompt and returned Ok(()) on macOS.

Closes #37.

Design — trait seam

biometric/
├── mod.rs         BiometricBackend trait + require_biometric() entry point
├── error.rs       typed BiometricError enum (all LAError codes mapped)
├── logic.rs       pure functions: parse_la_error, redact_prompt_reason
├── lacontext.rs   macOS backend — real LAContext.evaluatePolicy FFI
└── stdin.rs       non-macOS backend — stdin y/N fallback

AGENTKEYS_BIOMETRIC=on activates the gate; default is bypass (preserves existing CLI behavior for scripts / CI).
AGENTKEYS_ALLOW_NO_BIOMETRIC=1 (non-macOS only) skips the stdin TTY guard.
cmd_approve, cmd_revoke, cmd_teardown call require_biometric with reason strings scrubbed via redact_prompt_reason — long opaque tokens get replaced with <redacted> so revoke <session-token> can't leak the token to terminal scrollback (the critical P2 codex flagged on PR fix(cli): #11 biometric gate for high-security master CLI actions (macOS) #27).

unsafe inventory

~4 blocks, each 2-5 lines, all annotated with // SAFETY: comments:

LAContext::new() — all objc2 message-sends are unsafe
Synchronous canEvaluatePolicy_error capability check
*mut NSError dereference inside the completion block (null-checked defensively even though Apple's contract guarantees non-null on failure)
evaluatePolicy_localizedReason_reply async method send

Deadlock / leak protections:

60-second recv_timeout on the channel — CLI can never hang forever
No Retained<LAContext> captured inside the completion block (avoids retain cycle)
Single-shot: each authenticate call builds its own LAContext

Tests — 4 layers

L1 — Pure logic (22 tests, runs on every platform)

parse_la_error — one test per documented LAError code (-1 through -10)
redact_prompt_reason — 0x-address preservation, long-token stripping, idempotence
biometric_is_enabled — env-var parsing (on / 1 / true / unset / off)

L2 — FFI boundary (2 tests, `#[cfg(target_os = "macos")]`)

la_context_constructs_and_drops — catches dylib load / linker issues
can_evaluate_policy_is_synchronous_on_ci_runner — smoke-test for FFI wiring

L3 — Behavioral via MockBackend

mock_backend_receives_redacted_reason — critical regression test for PR fix(cli): #11 biometric gate for high-security master CLI actions (macOS) #27's P2 token-leak finding
mock_backend_returns_scripted_error — scripted-response validation

L4 — Manual QA (`docs/manual-test-issue-37.md`)

7 scenarios requiring real macOS Touch ID hardware — cannot be automated:

Touch ID succeeds → action proceeds
User cancels → action aborts with clean message
Biometry lockout → passcode fallback
Device without biometry sensor
AGENTKEYS_BIOMETRIC=off escape hatch
Token-leak regression: revoke <raw-token> must NOT echo the token to stderr / scrollback
Non-macOS stdin fallback

Dependencies

Gated with [target.'cfg(target_os = "macos")'.dependencies] — zero cost on Linux / Windows builds:

objc2 = "0.6"
objc2-foundation = "0.3"
objc2-local-authentication = "0.3"
block2 = "0.6"

Test plan

cargo test -p agentkeys-cli biometric -- --test-threads=1 → 22 passed (L1 + L3 + L2 on macOS)
cargo test -p agentkeys-cli -- --test-threads=1 → 25 existing CLI integration tests still green with gate default-off
cargo check -p agentkeys-cli → clean compile on macOS with full FFI stack linked
Manual QA scenarios 1-7 from docs/manual-test-issue-37.md (requires reviewer with Touch ID hardware)

Issue

Closes #37. Supersedes #27 — PR #27 should be closed when this merges (the trait-seam design replaces its biometric.rs entirely).

Supersedes PR #27's stub. Wires actual Touch ID / Face ID enforcement on macOS through `LAContext.evaluatePolicy`, with a trait-seam design so non-macOS platforms, tests, and the env-opt-out path are all first-class. ## Design ``` biometric/ ├── mod.rs trait BiometricBackend + require_biometric() entry point ├── error.rs typed BiometricError enum (all LAError codes mapped) ├── logic.rs pure functions: parse_la_error, redact_prompt_reason ├── lacontext.rs macOS backend — real LAContext.evaluatePolicy FFI └── stdin.rs non-macOS backend — stdin y/N fallback ``` - `AGENTKEYS_BIOMETRIC=on` activates the gate; default is bypass (preserves existing CLI behavior for scripts / CI). - `AGENTKEYS_ALLOW_NO_BIOMETRIC=1` (non-macOS) skips the stdin TTY guard. - `cmd_approve`, `cmd_revoke`, `cmd_teardown` call `require_biometric` with reason strings that are scrubbed via `redact_prompt_reason` — long opaque tokens get replaced with `<redacted>` so `revoke <session-token>` can't leak the token to terminal scrollback (codex PR #27 P2). ## unsafe inventory (~4 blocks, all with SAFETY comments) 1. `LAContext::new()` — all objc2 message-sends are unsafe 2. `canEvaluatePolicy_error` synchronous call 3. `*mut NSError` dereference inside the completion block (null-checked defensively even though Apple's contract guarantees non-null on failure) 4. `evaluatePolicy_localizedReason_reply` async method send Deadlock / leak protections: - 60-second `recv_timeout` on the channel (CLI can never hang forever) - No `Retained<LAContext>` captured inside the completion block (avoids retain cycle) - Single-shot: each `authenticate` call builds its own LAContext ## Tests — 4 layers **L1 — Pure logic (22 tests, runs everywhere):** - `parse_la_error` one test per documented NSError code - `redact_prompt_reason` preservation of 0x addresses, stripping of long tokens, idempotence - `biometric_is_enabled` env-var parsing **L2 — FFI boundary (2 tests, macOS only):** - `la_context_constructs_and_drops` — catches dylib load / linker issues - `can_evaluate_policy_is_synchronous_on_ci_runner` — smoke-test that the FFI wiring doesn't crash when called in a test context **L3 — Behavioral via MockBackend:** - `mock_backend_receives_redacted_reason` — critical regression for the P2 token-leak finding; asserts raw tokens never reach the backend - `mock_backend_returns_scripted_error` — scripted-response validation **L4 — Manual QA (`docs/manual-test-issue-37.md`):** 7 scenarios including Touch ID success/cancel, lockout → passcode fallback, device without biometry, env opt-out, token-leak regression, non-macOS stdin fallback. ## Dependencies macOS only (gated `[target.'cfg(target_os = \"macos\")'.dependencies]`): - `objc2 = \"0.6\"` - `objc2-foundation = \"0.3\"` - `objc2-local-authentication = \"0.3\"` - `block2 = \"0.6\"` Zero cost on Linux/Windows — objc2 + block2 don't enter the build graph. Test: cargo test -p agentkeys-cli -- --test-threads=1 biometric: 22 passed; cli: 25 passed Closes #37. Supersedes PR #27 (which should be closed on merge). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

hanwencheng · 2026-04-15T02:39:10Z

this also close #11

@claude

Augments the auto-generated Claude Code + Claude Code Review workflows with context from 15+ PR review cycles in this repo so the Action produces findings consistent with recent codex iterations instead of generic Rust advice. ## claude-code-review.yml - Scope `on.pull_request.paths` to `crates/**`, `docs/**`, `wiki/**`, workflows, Cargo.toml, CLAUDE.md, and harness/. Skips cheap Cargo.lock churn. - `fetch-depth: 0` so Claude can inspect `git log` / `git blame` during review (useful for "this finding predates the PR" arguments). - `dtolnay/rust-toolchain@stable` + `Swatinem/rust-cache@v2` so every `cargo check` / `cargo test -p <crate>` in-session runs fast. - Custom prompt injects: - crate names (agentkeys-types, agentkeys-core, etc) - pointer to CLAUDE.md for architecture + mock-server design principles - pointer to the new .github/REVIEW_GUIDELINES.md for agentkeys-specific review patterns - `--test-threads=1` requirement (tests mutate shared HOME/keyring) - the 8-pattern checklist (audit-log DENIED rows, URL-encoding via reqwest .query(), session-token redaction, case-insensitive wallet comparison, 30-day TTL, synchronous keychain ops, path-traversal guards, cross-wallet credential safety) - `claude_args --allowed-tools` whitelist for cargo/git/gh so the Action can actually run the cargo commands the prompt tells it to. ## claude.yml (@claude mentions) - Same Rust toolchain + cache setup so `@claude run tests` / `@claude check clippy` requests don't pay cold-compile cost. - `fetch-depth: 0` for git-history tools. - Same `claude_args --allowed-tools` whitelist plus `gh pr comment:*` / `gh pr edit:*` so @claude can update PR bodies and comment back with findings. ## .github/REVIEW_GUIDELINES.md (new) Single source of truth for agentkeys review patterns, extracted from PRs #18-#38 (fix/issue-10 through fix/issue-37). Documents: - Test constraints (`--test-threads=1`, per-crate targeting) - 10 canonical bug patterns that codex has flagged repeatedly - Architectural invariants (master→agent single-hop; no users yet) - Scope-control guidance (no speculative refactors, no backwards-compat shims pre-launch) - Policy for codex-vs-claude disagreements Each pattern has a PR/issue reference so reviewers (and future Claude runs) can trace why the rule exists.

@claude

* "Claude PR Assistant workflow" * "Claude Code Review workflow" * ci: wire agentkeys-specific context into Claude Code workflows Augments the auto-generated Claude Code + Claude Code Review workflows with context from 15+ PR review cycles in this repo so the Action produces findings consistent with recent codex iterations instead of generic Rust advice. ## claude-code-review.yml - Scope `on.pull_request.paths` to `crates/**`, `docs/**`, `wiki/**`, workflows, Cargo.toml, CLAUDE.md, and harness/. Skips cheap Cargo.lock churn. - `fetch-depth: 0` so Claude can inspect `git log` / `git blame` during review (useful for "this finding predates the PR" arguments). - `dtolnay/rust-toolchain@stable` + `Swatinem/rust-cache@v2` so every `cargo check` / `cargo test -p <crate>` in-session runs fast. - Custom prompt injects: - crate names (agentkeys-types, agentkeys-core, etc) - pointer to CLAUDE.md for architecture + mock-server design principles - pointer to the new .github/REVIEW_GUIDELINES.md for agentkeys-specific review patterns - `--test-threads=1` requirement (tests mutate shared HOME/keyring) - the 8-pattern checklist (audit-log DENIED rows, URL-encoding via reqwest .query(), session-token redaction, case-insensitive wallet comparison, 30-day TTL, synchronous keychain ops, path-traversal guards, cross-wallet credential safety) - `claude_args --allowed-tools` whitelist for cargo/git/gh so the Action can actually run the cargo commands the prompt tells it to. ## claude.yml (@claude mentions) - Same Rust toolchain + cache setup so `@claude run tests` / `@claude check clippy` requests don't pay cold-compile cost. - `fetch-depth: 0` for git-history tools. - Same `claude_args --allowed-tools` whitelist plus `gh pr comment:*` / `gh pr edit:*` so @claude can update PR bodies and comment back with findings. ## .github/REVIEW_GUIDELINES.md (new) Single source of truth for agentkeys review patterns, extracted from PRs #18-#38 (fix/issue-10 through fix/issue-37). Documents: - Test constraints (`--test-threads=1`, per-crate targeting) - 10 canonical bug patterns that codex has flagged repeatedly - Architectural invariants (master→agent single-hop; no users yet) - Scope-control guidance (no speculative refactors, no backwards-compat shims pre-launch) - Policy for codex-vs-claude disagreements Each pattern has a PR/issue reference so reviewers (and future Claude runs) can trace why the rule exists.

hanwencheng mentioned this pull request Apr 15, 2026

fix(cli): #11 biometric gate for high-security master CLI actions (macOS) #27

Closed

4 tasks

hanwencheng mentioned this pull request Apr 15, 2026

Add Claude Code GitHub Workflow #39

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(cli): #37 — real macOS LAContext biometric gate via objc2 FFI#38

fix(cli): #37 — real macOS LAContext biometric gate via objc2 FFI#38
hanwencheng wants to merge 1 commit intomainfrom
fix/issue-37

hanwencheng commented Apr 15, 2026

Uh oh!

hanwencheng commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hanwencheng commented Apr 15, 2026

Summary

Design — trait seam

unsafe inventory

Tests — 4 layers

L1 — Pure logic (22 tests, runs on every platform)

L2 — FFI boundary (2 tests, #[cfg(target_os = "macos")])

L3 — Behavioral via MockBackend

L4 — Manual QA (docs/manual-test-issue-37.md)

Dependencies

Test plan

Issue

Uh oh!

hanwencheng commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

L2 — FFI boundary (2 tests, `#[cfg(target_os = "macos")]`)

L4 — Manual QA (`docs/manual-test-issue-37.md`)