Govern AI agent actions. Prove what happened.
ExoArmur turns governed AI agent actions into replay-verifiable proof bundles. It is currently ready for technical evaluation, not production deployment.
ExoArmur sits between your AI decision layer and execution targets. It ensures every action is:
- Policy-gated — evaluated before it runs
- Auditable — cryptographically traceable to original intent
- Replayable — deterministic reconstruction of execution traces
- Approvable — can be queued for human operator review
🚀 Try the Live Demo — One-click browser environment, no installation needed.
docs/TECHNICAL_EVALUATION.md — Complete guide for evaluating core proof claims
Proof flow:
python demos/canonical_truth_reconstruction_demo.py
exoarmur verify-bundle --bundle demos/canonical_proof_bundle.json
Status boundaries:
- No external audit yet
- No production certification yet
- Not BFT consensus
Status (April 2026): Ready for technical evaluation. Single-maintainer reference implementation. CI invariant gates enforce determinism, module boundaries, and three-run stability. Seeking first pilot integration. See PROJECT_STATUS.md for full detail.
For authorized evaluators, clone and run locally:
git clone https://github.com/slucerodev/ExoArmur-Core.git
cd ExoArmur-Core
pip install -e .
python examples/quickstart_replay.py
Expected output:
Replay result: success
For a guided setup with verification, run:
git clone https://github.com/slucerodev/ExoArmur-Core.git
cd ExoArmur-Core
./scripts/quickstart.sh
See docs/QUICKSTART.md for detailed instructions.
The primary public API is the deterministic replay engine:
from exoarmur import ReplayEngine
from exoarmur.replay.event_envelope import CanonicalEvent
import hashlib, json
# Construct a canonical event with cryptographic payload hash
payload = {"kind": "inline", "ref": {"event_id": "01ARZ3NDEKTSV4RRFFQ69G5FAV"}}
event = CanonicalEvent(
    event_id="01ARZ3NDEKTSV4RRFFQ69G5FAV",
    event_type="belief_creation_started",
    actor="demo",
    correlation_id="corr-1",
    payload=payload,
    payload_hash=hashlib.sha256(
        json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest(),
)
# Replay deterministically from the audit trail
engine = ReplayEngine(audit_store={"corr-1": [event]})
report = engine.replay_correlation("corr-1")
print("Replay result:", getattr(report.result, "value", report.result))
print("Failures:", report.failures or "none")
Note: The V2 execution boundary (ProxyPipeline, ActionIntent) is an internal implementation detail. For agent framework integrations, see the examples in the examples/ directory.
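The payload_hash in the snippet above is computed over a compact, key-sorted JSON encoding. The sketch below uses only the standard library to show why that canonical form matters: two payloads with the same data but different key order hash identically. The helper name canonical_payload_hash is illustrative and is not part of the ExoArmur API.

```python
import hashlib
import json


def canonical_payload_hash(payload: dict) -> str:
    """SHA-256 over compact JSON with sorted keys (hypothetical helper)."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


# Same data, different key insertion order.
a = {"kind": "inline", "ref": {"event_id": "01ARZ3NDEKTSV4RRFFQ69G5FAV"}}
b = {"ref": {"event_id": "01ARZ3NDEKTSV4RRFFQ69G5FAV"}, "kind": "inline"}

same_hash = canonical_payload_hash(a) == canonical_payload_hash(b)
naive_differs = json.dumps(a) != json.dumps(b)  # default encoding keeps insertion order

print("canonical hashes match:", same_hash)
print("naive encodings differ:", naive_differs)
```

Without sort_keys=True, the byte encoding would depend on how the dict was built, and the hash could not serve as a stable identifier.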
The canonical demo exercises the complete execution boundary: policy evaluation, denial before side effects, audit trail emission, and cryptographic replay verification.
python demos/canonical_truth_reconstruction_demo.py
Expected output (deterministic, identical across runs):
Proof bundle written: .../demos/canonical_proof_bundle.json
Proof bundle replay hash: 7eb0f264dd6d6e67925ece66ec2218ac73716ae6bc8a770ef84a8defd28bf47b
DEMO_RESULT=DENIED
ACTION_EXECUTED=false
AUDIT_STREAM_ID=canonical-truth-reconstruction-demo
REPLAY_VERDICT=PASS
This demo runs in CI on every push — see .github/workflows/v2-demo-smoke.yml.
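If you want to assert on the demo result in your own script, the KEY=VALUE lines in the output above are easy to parse. This is a minimal sketch: parse_demo_output is a hypothetical helper, and the sample text is copied from the expected output shown above rather than captured from a live run.

```python
def parse_demo_output(text: str) -> dict:
    """Collect KEY=VALUE lines; lines without '=' are ignored (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.isidentifier():
            fields[key] = value
    return fields


sample = """\
Proof bundle written: .../demos/canonical_proof_bundle.json
Proof bundle replay hash: 7eb0f264dd6d6e67925ece66ec2218ac73716ae6bc8a770ef84a8defd28bf47b
DEMO_RESULT=DENIED
ACTION_EXECUTED=false
AUDIT_STREAM_ID=canonical-truth-reconstruction-demo
REPLAY_VERDICT=PASS
"""

fields = parse_demo_output(sample)
# The demo's claim: the action was denied, nothing executed, and replay still verifies.
denied_without_side_effects = (
    fields["DEMO_RESULT"] == "DENIED"
    and fields["ACTION_EXECUTED"] == "false"
    and fields["REPLAY_VERDICT"] == "PASS"
)
print(denied_without_side_effects)
```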
Or try the replay engine inline with the same snippet shown earlier for the primary public API.
Run the full suite (1166 tests, three-run stability gate) with the same dependency set CI uses:
git clone https://github.com/slucerodev/ExoArmur-Core.git
cd ExoArmur-Core
pip install -r requirements.lock # exact CI-pinned runtime deps
pip install --no-deps -e ".[dev]" # editable install + dev extras
python -m pytest -q
The two-step install is deliberate: requirements.lock pins every runtime dependency (including fastapi==0.127.1 and pydantic==2.12.5) to the exact versions the committed OpenAPI snapshot was generated against, and --no-deps prevents pip from silently upgrading them when applying the dev extras. This is the same sequence every CI workflow uses; see .github/workflows/core-invariant-gates.yml.
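To confirm that a later install did not drift from the pinned versions, you can compare the lock file against what is installed. The sketch below assumes requirements.lock uses pip-style name==version lines; that format is an assumption, and parse_pins and find_drift are hypothetical helpers, not part of the repository.

```python
from importlib import metadata


def parse_pins(lock_text: str) -> dict:
    """Extract name==version pins, skipping comments and blank lines."""
    pins = {}
    for raw in lock_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Drop environment markers or hash options that may follow the version.
        tokens = version.split(";")[0].split()
        if tokens:
            pins[name.strip().lower()] = tokens[0]
    return pins


def find_drift(pins: dict) -> dict:
    """Return {name: (pinned, installed_or_None)} for every mismatch."""
    drift = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            drift[name] = (pinned, installed)
    return drift


# Illustrative excerpt only: these are the two pins named in the text above.
sample = """
fastapi==0.127.1
pydantic==2.12.5
"""
pins = parse_pins(sample)
print(pins)
```

Against a real checkout, the call would look like find_drift(parse_pins(open("requirements.lock").read())); an empty result means every pinned package is installed at its pinned version.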
ExoArmur sits between your AI decision layer and execution targets. It enforces that every action:
- Passes a policy decision point before it runs
- Produces a cryptographic audit trail tied to the original intent
- Is deterministically replayable — same inputs always reconstruct the same trace
- Can be vetoed or queued for operator approval
Decision Source → ActionIntent → PolicyDecisionPoint → SafetyGate → [Approval?] → Executor → ExecutionProofBundle
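The flow above can be read as a chain of gates in which the executor is reached only if every earlier stage allows it. The toy model below illustrates that ordering. Every class and function name in it is a hypothetical stand-in; it does not use or mirror the real ProxyPipeline, ActionIntent, or ExecutionProofBundle types.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Intent:  # hypothetical stand-in for ActionIntent
    actor: str
    action: str
    target: str


@dataclass
class ProofBundle:  # hypothetical stand-in for ExecutionProofBundle
    intent: Intent
    verdict: str
    executed: bool
    trail: list = field(default_factory=list)


def run_pipeline(intent, policy, safety_gate, executor,
                 needs_approval=lambda intent: False, approved=False):
    """Evaluate each gate in order; the executor is called only after all pass."""
    trail = ["intent_received"]
    if not policy(intent):
        trail.append("policy_denied")
        return ProofBundle(intent, "DENIED", False, trail)
    trail.append("policy_allowed")
    if not safety_gate(intent):
        trail.append("safety_blocked")
        return ProofBundle(intent, "DENIED", False, trail)
    trail.append("safety_passed")
    if needs_approval(intent) and not approved:
        trail.append("approval_pending")
        return ProofBundle(intent, "PENDING_APPROVAL", False, trail)
    executor(intent)  # the only place a side effect can happen
    trail.append("executed")
    return ProofBundle(intent, "ALLOWED", True, trail)


side_effects = []
policy = lambda i: i.action != "delete"   # toy rule: deletes are denied
safety_gate = lambda i: True
executor = lambda i: side_effects.append((i.action, i.target))

denied = run_pipeline(Intent("agent-1", "delete", "db/users"), policy, safety_gate, executor)
allowed = run_pipeline(Intent("agent-1", "read", "db/users"), policy, safety_gate, executor)
print(denied.verdict, denied.executed, denied.trail)
print(allowed.verdict, allowed.executed, side_effects)
```

The denied intent returns a bundle with a trail but never reaches the executor, which is the "denial before side effects" property the canonical demo exercises.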
- Not an LLM or agent framework
- Not a general workflow engine
- Not a distributed systems platform
ExoArmur is a governance and accountability layer that wraps whatever agent framework you already use.
Honest disclosure of what this system does and does not provide:
What it provides:
- Deterministic execution boundary enforcement via ProxyPipeline
- Cryptographic audit trails with SHA-256 payload hashes and Ed25519 signatures
- Three-run deterministic replay verification with CI-enforced stability
- Cross-node replay determinism verification with adversarial input corruption testing
- A/B replay diffing (counterfactual engine) — compares original vs modified replay outputs
- Plane isolation enforced at import-time by dependency edge guards
- Idempotent audit ingestion with deterministic content-addressed keys
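As an illustration of the last item, idempotent ingestion with content-addressed keys can be sketched as a store keyed by the hash of each record's canonical encoding, so that re-ingesting the same record is a no-op. The AuditStore class below is a hypothetical in-memory toy, not the ExoArmur implementation, and the key derivation is an assumption.

```python
import hashlib
import json


class AuditStore:
    """Toy in-memory store keyed by the SHA-256 of each record's canonical JSON."""

    def __init__(self):
        self._records = {}

    def ingest(self, record: dict):
        """Return (key, is_new); ingesting an identical record twice stores it once."""
        canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
        key = hashlib.sha256(canonical).hexdigest()
        is_new = key not in self._records
        if is_new:
            self._records[key] = record
        return key, is_new

    def __len__(self):
        return len(self._records)


store = AuditStore()
record = {"event_type": "belief_creation_started", "actor": "demo", "correlation_id": "corr-1"}

key_first, new_first = store.ingest(record)
key_again, new_again = store.ingest(dict(reversed(list(record.items()))))  # same data, reordered

print(new_first, new_again, len(store))
```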
What it does not provide (known limitations):
- Not Byzantine Fault Tolerance (BFT): the fault injection system verifies replay determinism under corrupted inputs, not consensus under malicious nodes
- Not causal inference: the counterfactual engine performs deterministic replay diffing, not statistical causal analysis
- Trust evaluator is a stub: currently returns a fixed trust score of 0.85 regardless of input; see src/exoarmur/safety/trust_evaluator.py for the TODO list
- Only mock executor ships: MockActionExecutor is the only bundled executor; real integrations must be written as ExecutorPlugin implementations
- Single-node in-memory stores: default IntentStore and FederateIdentityStore are in-memory; durable Postgres/JetStream backends require manual configuration
- Federation is feature-flag gated: multi-cell coordination requires EXOARMUR_FLAG_V2_FEDERATION_ENABLED=true
| Layer | Path | Purpose |
|---|---|---|
| Core engine | src/exoarmur/ | Deterministic replay, audit, policy enforcement |
| V2 governance | src/exoarmur/execution_boundary_v2/ | ProxyPipeline, approval workflow, executor boundary |
| Contracts | spec/contracts/ | Immutable V1 data shapes |
| Examples | examples/ | Quickstart and demo scripts |
Key invariants:
- ProxyPipeline is the sole execution boundary — all actions route through it
- Executors are sandboxed, untrusted plugins
- Determinism is enforced by CI — three-run stability gate on every push
- V1 contracts are immutable — new capabilities are additive and feature-flag gated
V2 capabilities default to off:
| Flag | Purpose |
|---|---|
| EXOARMUR_FLAG_V2_FEDERATION_ENABLED | Multi-cell coordination |
| EXOARMUR_FLAG_V2_CONTROL_PLANE_ENABLED | Governance control plane |
| EXOARMUR_FLAG_V2_OPERATOR_APPROVAL_REQUIRED | Human approval gate |
| EXOARMUR_FLAG_GOVERNANCE_ARTIFACTS_ENABLED | Optional governance artifact provider discovery |
| EXOARMUR_FLAG_GOVERNANCE_ARTIFACT_ENFORCEMENT_ENABLED | Fail-closed enforcement of required governance artifacts |
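A default-off flag check can be sketched as below. The flag names come from the table above; the helper flag_enabled and the set of accepted truthy spellings are assumptions, since this README only shows the =true and =false forms.

```python
import os

# Assumption: which spellings count as "on" is not specified in this README.
TRUTHY = {"1", "true", "yes", "on"}


def flag_enabled(name: str, environ=None) -> bool:
    """Return True only if the variable is set to a truthy value; unset means off."""
    env = os.environ if environ is None else environ
    return env.get(name, "").strip().lower() in TRUTHY


example_env = {"EXOARMUR_FLAG_V2_FEDERATION_ENABLED": "true"}

print(flag_enabled("EXOARMUR_FLAG_V2_FEDERATION_ENABLED", example_env))          # explicitly enabled
print(flag_enabled("EXOARMUR_FLAG_V2_OPERATOR_APPROVAL_REQUIRED", example_env))  # unset, so off
```

Treating any unset or unrecognized value as off keeps the V1 behavior unchanged unless an operator opts in.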
Core supports optional discovery of governance artifact providers from ExoArmur-GovernanceModules via Python entry points (exoarmur.governance_artifacts). This integration:
- Does not require GovernanceModules: Core runs successfully without it installed
- Uses lazy discovery: providers are discovered only when explicitly requested
- No direct imports: Core never imports GovernanceModules code directly
- Feature-flag gated: disabled by default (EXOARMUR_FLAG_GOVERNANCE_ARTIFACTS_ENABLED=false)
Current implementation includes:
- Provider discovery via entry points
- Manifest validation for governance artifact metadata
- Deterministic canonicalization and hashing
- verify-bundle diagnostics — Governance manifest inspection in bundle verification output
verify-bundle Diagnostics:
When an ExecutionProofBundle contains a governance artifact manifest in governance_evidence["governance_artifacts_manifest"], verify-bundle will:
- Validate manifest shape deterministically
- Report manifest status in structured output (JSON via the --json flag)
- Report provider availability for artifact types
- Not alter Core replay verdicts — diagnostics are read-only
- Not require providers to be installed — missing providers reported as unavailable
Semantic Verification (Optional):
When GovernanceModules providers are installed and artifact content is embedded in governance_evidence["governance_artifacts"], verify-bundle will optionally:
- Load matching providers through entry points
- Verify artifact hash consistency using provider.canonical_hash()
- Run provider.verify_artifact(artifact) for semantic validation
- Report deterministic semantic verification results
- Keep overall Core verification verdict unchanged — semantic verification is diagnostic-only
- Continue to work when providers are absent — missing providers reported as unavailable
Fail-Closed Enforcement (Optional):
When EXOARMUR_FLAG_GOVERNANCE_ARTIFACT_ENFORCEMENT_ENABLED=true, verify-bundle will enforce required governance artifacts:
- Only enforce artifacts marked with required_for_verdict=true in the manifest
- Fail closed (return FAIL verdict) if required artifacts fail verification
- Required artifacts must have: embedded content available, matching hash, available provider, and pass semantic verification
- Optional artifacts do not affect the bundle verdict even when invalid
- Default behavior (enforcement disabled) keeps existing verdict unchanged
- Enforcement result is reported in the governance_enforcement field of verify-bundle output
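The enforcement rules above amount to a small decision function. The sketch below is one possible reading of them, not the verify-bundle implementation: the function name enforce is hypothetical, and the per-artifact field names are borrowed from the JSON example in this section plus required_for_verdict from the manifest.

```python
REQUIRED_CHECKS = ("content_available", "hash_matches", "provider_available", "semantic_valid")


def enforce(core_verdict: str, artifacts: list, enforcement_enabled: bool) -> str:
    """Apply fail-closed enforcement to a list of per-artifact result dicts."""
    if not enforcement_enabled:
        return core_verdict  # default behavior: verdict unchanged
    for artifact in artifacts:
        if not artifact.get("required_for_verdict", False):
            continue  # optional artifacts never affect the verdict
        # Fail closed: a missing or false check on a required artifact is a failure.
        if not all(artifact.get(check) is True for check in REQUIRED_CHECKS):
            return "FAIL"
    return core_verdict


good = {"required_for_verdict": True, "content_available": True,
        "hash_matches": True, "provider_available": True, "semantic_valid": True}
bad_optional = {"required_for_verdict": False, "content_available": True,
                "hash_matches": False, "provider_available": True, "semantic_valid": False}
bad_required = dict(good, provider_available=False)

print(enforce("PASS", [good, bad_optional], enforcement_enabled=True))
print(enforce("PASS", [bad_required], enforcement_enabled=True))
print(enforce("PASS", [bad_required], enforcement_enabled=False))
```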
Example JSON output with governance diagnostics and semantic verification:
{
  "verify_verdict": "PASS",
  "governance_artifacts": {
    "present": true,
    "valid_manifest": true,
    "manifest_hash": "...",
    "artifact_count": 2,
    "required_count": 1,
    "optional_count": 1,
    "artifact_types": ["policy_snapshot", "tool_invocation_proof"],
    "provider_availability": {
      "policy_snapshot": "available",
      "tool_invocation_proof": "unavailable"
    },
    "semantic_verification": {
      "attempted": true,
      "verdict_effect": "diagnostic_only",
      "total": 2,
      "verified": 1,
      "valid": 1,
      "invalid": 0,
      "unavailable": 1,
      "results": [
        {
          "artifact_type": "policy_snapshot",
          "schema_version": "policy_snapshot.v1",
          "artifact_hash": "...",
          "provider_available": true,
          "provider_version": "0.1.0",
          "content_available": true,
          "hash_matches": true,
          "semantic_valid": true,
          "code": "SEMANTIC_VALID",
          "message": "Artifact verified successfully"
        }
      ]
    },
    "verdict_effect": "diagnostic_only"
  }
}
Status: Provider discovery, manifest validation, verify-bundle diagnostics, and optional semantic verification are implemented. Semantic verification is diagnostic-only and does not alter Core replay verdicts. See ExoArmur-GovernanceModules documentation for provider interface details.
Validate deterministic execution under high concurrency:
exoarmur benchmark --determinism-load --runs 100 --concurrency 500
Expected output:
DETERMINISM
-----------
Concurrency: 500
Executions: 100
Unique Hashes: 1
STATUS: PASS
A PASS result means all 100 executions produced the same hash at a concurrency of 500, which is evidence that hash consistency holds under heavy concurrent load. See docs/BENCHMARKS.md for full benchmark suite documentation.
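The idea behind the check can be sketched with the standard library: run the same deterministic computation many times across a thread pool and count distinct hashes. This is only an illustration of the principle. The replay_hash function is a stand-in for a real replay, and none of this reflects how exoarmur benchmark is implemented.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor


def replay_hash(events: list) -> str:
    """Stand-in for a replay: hash the canonical encoding of the event list."""
    canonical = json.dumps(events, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


def unique_hashes(events: list, runs: int = 100, concurrency: int = 32) -> set:
    """Hash the same input `runs` times across a thread pool and collect distinct results."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return set(pool.map(replay_hash, [events] * runs))


events = [{"event_type": "belief_creation_started", "actor": "demo", "correlation_id": "corr-1"}]
hashes = unique_hashes(events)

print("Unique Hashes:", len(hashes))
print("STATUS:", "PASS" if len(hashes) == 1 else "FAIL")
```

A single unique hash is the expected outcome for any computation that depends only on its input; more than one would point to hidden shared state or ordering effects.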
Every push runs:
- Core Invariant Gates — three deterministic test runs, boundary enforcement, repo cleanliness
- Multi-Platform Tests — Python 3.12 on Linux, macOS, Windows (minimum supported: 3.10)
- Security Scan — CodeQL + pip-audit
- V2 Demo Smoke Test — full governance pipeline end-to-end
Current: 1166 passing, 10 skipped, 3 xfailed. Skipped tests require optional external components (live NATS demo, filesystem/HTTP executor plugins, PoD provider, external waiver file). No external infrastructure required for the core suite.
docker compose up -d
EXOARMUR_LIVE_DEMO=1 python -m pytest tests/test_golden_demo_live.py -v
- Security & Threat Model: Security policy, threat model, and vulnerability reporting
- Architecture: Full system architecture
- The ExoArmur Doctrine: Verified claims and compliance
- Doctrine Verification: Test evidence and status
- ML Isolation Policy: Advisory-only ML components
- Governance: Reversibility guarantees and approval gates
- Design Principles
- Validation Guide
- Phase Status
- Whitepaper
ExoArmur-Core is open-source under the Apache License 2.0.
Optional commercial/proprietary modules may be distributed separately and are not part of this Core repository unless explicitly included.
See the LICENSE file for details.
Contributions are accepted only by written agreement. Submitted changes require explicit IP/licensing agreement before acceptance.