The local-first memory control plane for AI agents.
Give Codex, Claude Code, Claude Desktop, Cursor, Windsurf, VS Code, JetBrains, Ollama-backed agents, and custom agent services one durable memory layer they can check before they act.
Agents forget the exact mistakes they made yesterday. They repeat broken commands, lose project-specific rules, miss contradictions, and treat every new session like a cold start.
Audrey turns those hard-won lessons into a local memory runtime:
- memory_recall finds durable context by semantic similarity.
- memory_preflight checks prior failures, risks, rules, and relevant procedures before an action.
- memory_reflexes converts remembered evidence into trigger-response guidance agents can follow.
- memory_validate closes the loop after the action: helpful, used, or wrong outcomes feed salience and decay.
- memory_dream consolidates episodes into principles and applies decay.
- audrey impact and audrey doctor tell a human or CI system whether the runtime is doing real work and is actually ready.
It is not a hosted vector database, a notes app, or a Claude-only plugin. Audrey is a SQLite-backed continuity layer that can sit under any local or sidecar agent loop.
Requires Node.js 20+.
npx audrey doctor
npx audrey demo
doctor verifies Node, the MCP entrypoint, provider selection, memory-store health, and host config generation. demo runs a no-key, no-host, no-network proof: it creates temporary memories, records a redacted failed tool trace, generates a Memory Capsule, proves recall, prints Memory Reflexes, and deletes the demo store.
Expected first-run shape:
Audrey Doctor v0.23.0
Store health: not initialized
Verdict: ready
After the first real memory write, doctor should report the store as healthy.
Preview host setup without editing config files:
npx audrey install --host codex --dry-run
npx audrey install --host claude-code --dry-run
npx audrey install --host generic --dry-run
Generate raw config blocks:
npx audrey mcp-config codex
npx audrey mcp-config generic
npx audrey mcp-config vscode
Claude Code can be registered directly:
npx audrey install
claude mcp list
All local MCP paths default to local embeddings and one shared SQLite-backed memory directory. Use AUDREY_DATA_DIR to isolate projects, tenants, or host identities.
Installer-generated host config does not include provider API keys by default. Prefer setting ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY, or GEMINI_API_KEY in the host runtime environment; use npx audrey install --include-secrets only if you explicitly accept argv/config exposure.
Ollama runs models; Audrey supplies memory. Start Audrey as a local REST sidecar and expose its routes as tools in your agent loop:
AUDREY_AGENT=ollama-local-agent npx audrey serve
curl http://localhost:7437/health
curl http://localhost:7437/v1/status
Runnable example:
AUDREY_AGENT=ollama-local-agent npx audrey serve
OLLAMA_MODEL=qwen3 node examples/ollama-memory-agent.js "What should you remember about Audrey?"
Core sidecar tools:
| Agent Need | REST Route |
|---|---|
| Guard an action before tool use | POST /v1/guard/before |
| Record the outcome after tool use | POST /v1/guard/after |
| Check memory before acting | POST /v1/preflight |
| Get reflex rules for an action | POST /v1/reflexes |
| Store a useful observation | POST /v1/encode |
| Recall relevant context | POST /v1/recall |
| Get a turn-sized memory packet | POST /v1/capsule |
| Check health | GET /v1/status |
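As a rough illustration of calling these routes from an agent loop, the sketch below builds requests with only the Python standard library. The routes and the default address come from this README; the JSON field names in the usage note (query, limit) and the helper names build_request and call are assumptions for illustration, not a documented schema. The Authorization header assumes the usual Bearer scheme for AUDREY_API_KEY.

```python
import json
import os
import urllib.request

BASE_URL = "http://127.0.0.1:7437"  # documented sidecar default address


def build_request(route, payload=None, api_key=None, base_url=BASE_URL):
    """Build (but do not send) a request for a sidecar route.

    A payload makes it a JSON POST; no payload makes it a GET.
    """
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if api_key:
        # Assumption: the API key is sent as a standard Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    method = "GET" if payload is None else "POST"
    return urllib.request.Request(
        base_url + route, data=data, headers=headers, method=method
    )


def call(route, payload=None, timeout=5.0):
    """Send the request to a running sidecar and decode the JSON reply."""
    req = build_request(route, payload, api_key=os.environ.get("AUDREY_API_KEY"))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a sidecar running, call("/v1/status") would fetch the status document, and call("/v1/recall", {"query": "stripe rate limit", "limit": 5}) would issue a recall under the assumed field names.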
Audrey Guard is the memory-before-action loop. It asks Audrey what matters before a tool runs, returns a receipt-backed go, caution, or block decision, and records the outcome afterward so memory quality improves over time.
In v0.23.0 this becomes Audrey's headline control surface: a local agent can ask "what do I already know that should change this action?" before running a shell command, editing a file, publishing a package, or touching production. The receipt is then closed with guard-after, so Audrey learns whether the memory was useful instead of remaining a passive retrieval cache.
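The before/after pairing described above can be sketched as a small wrapper around any tool call. This is a minimal sketch, not Audrey's client: the routes and the go, caution, and block decisions come from this README, while the payload field names (tool, action, decision, receipt_id, outcome, error_summary), the succeeded outcome value, and the guarded_run helper are assumptions. The transport is injected as post so the flow can be exercised without a running sidecar.

```python
def guarded_run(tool, action, run, post):
    """Ask the guard before running a tool, then close the receipt.

    run:  zero-argument callable that performs the action.
    post: callable(route, payload) -> dict, e.g. a thin REST helper.
    """
    verdict = post("/v1/guard/before", {"tool": tool, "action": action})
    receipt_id = verdict.get("receipt_id")
    decision = verdict.get("decision")

    if decision == "block":
        # Nothing ran, so this sketch reports no outcome for the receipt.
        return {"ran": False, "decision": decision, "receipt_id": receipt_id}

    try:
        result = run()
    except Exception as exc:
        # Report the failure so it can become a future preflight warning.
        post("/v1/guard/after", {
            "receipt_id": receipt_id,
            "outcome": "failed",
            "error_summary": str(exc),
        })
        raise

    # "succeeded" is an assumed outcome value.
    post("/v1/guard/after", {"receipt_id": receipt_id, "outcome": "succeeded"})
    return {"ran": True, "decision": decision, "receipt_id": receipt_id, "result": result}
```

The key design point is that every receipt for an action that actually ran gets closed, whether the action succeeded or raised, so the guard never stays a one-way lookup.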
npx audrey guard --tool "npm test" --strict "run npm test before release"
npx audrey guard --json --tool "npm test" --strict "run npm test before release"
Agents and hooks should pair guard with guard-after:
npx audrey guard-after --receipt <receipt_id> --outcome failed --error-summary "Vitest failed with spawn EPERM"
| Surface | Status |
|---|---|
| MCP stdio server | 22 tools plus status/recent/principles resources and briefing/recall/reflection prompts |
| CLI | doctor, demo, guard, guard-after, install, mcp-config, status, dream, reembed, observe-tool, promote, impact |
| REST API | Hono server with /health and /v1/* routes |
| JavaScript SDK | Direct TypeScript/Node import from audrey |
| Python client | pip install audrey-memory, calls the REST sidecar |
| Storage | Local SQLite plus sqlite-vec, no hosted database required |
| Deployment | npm package, Docker, Compose, host-specific MCP config generation |
| Safety loop | guard receipts, preflight warnings, reflexes, redacted tool traces, contradiction handling |
Audrey is built around the parts of memory that matter for agents:
- Episodic memory: specific observations, tool results, preferences, and session facts.
- Semantic memory: consolidated principles extracted from repeated evidence.
- Procedural memory: remembered ways to act, avoid, retry, or verify.
- Affect and salience: emotional weight and importance influence recall.
- Interference and decay: stale, conflicting, or low-confidence memories lose authority over time.
- Contradiction handling: competing claims are tracked instead of silently overwritten.
- Tool-trace learning: failed commands and risky actions become future preflight warnings.
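To make the decay idea concrete, here is one common way to let a memory lose authority over time: exponential half-life decay applied to a confidence score. This README does not specify Audrey's actual scoring, so the formula, the 30-day half-life, and the field names confidence and created_day are all illustrative assumptions.

```python
def decayed_authority(confidence, age_days, half_life_days=30.0):
    """Halve a memory's authority every half_life_days (illustrative only)."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return confidence * 0.5 ** (age_days / half_life_days)


def rank_by_authority(memories, today, half_life_days=30.0):
    """Order memories so fresh, confident ones outrank stale ones."""
    return sorted(
        memories,
        key=lambda m: decayed_authority(
            m["confidence"], today - m["created_day"], half_life_days
        ),
        reverse=True,
    )
```

Under these assumptions, a memory stored with confidence 0.9 ninety days ago ranks below one stored today with confidence 0.6, which is the behavior the decay bullet describes.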
The product bet is simple: the next generation of useful agents will not just retrieve facts. They will remember what happened, decide whether a memory is still trustworthy, and use that memory before touching tools.
import { Audrey } from 'audrey';
const brain = new Audrey({
  dataDir: './audrey-data',
  agent: 'support-agent',
  embedding: { provider: 'local', dimensions: 384 },
});
await brain.encode({
  content: 'Stripe returns HTTP 429 above 100 req/s',
  source: 'direct-observation',
  tags: ['stripe', 'rate-limit'],
});
const memories = await brain.recall('stripe rate limit');
await brain.waitForIdle();
brain.close();
pip install audrey-memory
from audrey_memory import Audrey
brain = Audrey(base_url="http://127.0.0.1:7437", agent="support-agent")
memory_id = brain.encode("Stripe returns HTTP 429 above 100 req/s", source="direct-observation")
results = brain.recall("stripe rate limit", limit=5)
brain.close()
Audrey is close to a 1.0-ready local memory runtime, but production readiness depends on how it is embedded. Treat it like stateful infrastructure.
Release gates used for this package:
npm run release:gate
npx audrey doctor
npx audrey demo
Recommended runtime checks:
npx audrey doctor --json
npx audrey status --json --fail-on-unhealthy
npx audrey install --host codex --dry-run
Production controls you still own:
- Set one AUDREY_DATA_DIR per tenant, environment, or isolation boundary.
- Pin AUDREY_EMBEDDING_PROVIDER and AUDREY_LLM_PROVIDER explicitly.
- Back up the SQLite data directory before provider or dimension changes.
- Keep API keys and raw credentials out of encoded memory content.
- Use AUDREY_API_KEY if the REST sidecar is reachable beyond the local process boundary.
- Run npx audrey dream on a schedule so consolidation and decay stay current.
- Add application-level encryption, retention, access control, and audit logging for regulated environments.
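For the backup control above, one hedged approach is SQLite's online backup API, which Python exposes as Connection.backup and which copies a consistent snapshot even if another process has the database open. The helper name backup_sqlite and any file names are hypothetical; this README does not state which files live inside the data directory, so check your own AUDREY_DATA_DIR and try a restore before relying on the copy.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(src_path, dst_path):
    """Copy one SQLite database file to dst_path using the online backup API."""
    src_file = Path(src_path)
    if not src_file.is_file():
        # sqlite3.connect would silently create an empty database otherwise.
        raise FileNotFoundError(f"no database at {src_file}")

    dst_file = Path(dst_path)
    dst_file.parent.mkdir(parents=True, exist_ok=True)

    src = sqlite3.connect(str(src_file))
    dst = sqlite3.connect(str(dst_file))
    try:
        src.backup(dst)  # page-level copy of the whole database
    finally:
        dst.close()
        src.close()
    return dst_file
```

Running this for each database file in the data directory before changing the embedding provider or dimensions gives you a known-good store to roll back to.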
| Variable | Default | Purpose |
|---|---|---|
| AUDREY_DATA_DIR | ~/.audrey/data | SQLite memory store path. Use one per tenant or agent identity for isolation. |
| AUDREY_AGENT | local-agent | Logical agent identity stamped on writes. |
| AUDREY_EMBEDDING_PROVIDER | local | local, gemini, openai, or mock. Cloud providers require explicit opt-in. |
| AUDREY_LLM_PROVIDER | auto | anthropic, openai, or mock. |
| AUDREY_DEVICE | gpu | Local embedding device (gpu or cpu). Falls back to CPU if GPU init fails. |
| AUDREY_PORT | 7437 | REST sidecar port. |
| AUDREY_HOST | 127.0.0.1 | REST sidecar bind address. Set to 0.0.0.0 only with AUDREY_API_KEY. |
| AUDREY_API_KEY | unset | Bearer token required for non-loopback REST traffic. |
| AUDREY_ALLOW_NO_AUTH | 0 | Set to 1 to allow non-loopback bind without an API key. Don't. |
| AUDREY_ENABLE_ADMIN_TOOLS | 0 | Set to 1 to enable export, import, and forget routes/tools. Disabled by default. |
| AUDREY_PROMOTE_ROOTS | unset | Colon/semicolon-separated extra roots for audrey promote --yes writes. By default writes are restricted to process.cwd(). |
| AUDREY_DEBUG | 0 | Set to 1 to print MCP info logs (server started, warmup completed). Errors always log. |
| AUDREY_PROFILE | 0 | Set to 1 to emit per-stage timings via MCP _meta.diagnostics. |
| AUDREY_DISABLE_WARMUP | 0 | Set to 1 to skip background embedding warmup at MCP boot. |
| AUDREY_ONNX_VERBOSE | 0 | Set to 1 to restore ONNX runtime EP-assignment warnings (suppressed by default). |
| AUDREY_PRAGMA_DEFAULTS | 1 | Set to 0 to revert SQLite PRAGMA tuning to better-sqlite3 defaults. |
| AUDREY_CONTEXT_BUDGET_CHARS | 4000 | Default Memory Capsule character budget. |
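The interaction between AUDREY_HOST, AUDREY_API_KEY, and AUDREY_ALLOW_NO_AUTH is easy to get wrong in deployment scripts, so a pre-launch check can be useful. The sketch below mirrors the rule as the table states it; it is not Audrey's own startup logic, and the helper names is_loopback and check_bind are hypothetical.

```python
import ipaddress


def is_loopback(host):
    """True for localhost and loopback IPs such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Other hostnames are treated as non-loopback to stay on the safe side.
        return False


def check_bind(env):
    """Return (ok, message) for the sidecar bind settings found in env."""
    host = env.get("AUDREY_HOST", "127.0.0.1")  # documented default
    has_key = bool(env.get("AUDREY_API_KEY"))
    allow_no_auth = env.get("AUDREY_ALLOW_NO_AUTH", "0") == "1"

    if is_loopback(host):
        return True, f"{host} is loopback; no API key required"
    if has_key:
        return True, f"{host} is non-loopback and AUDREY_API_KEY is set"
    if allow_no_auth:
        return True, f"WARNING: {host} is exposed without authentication"
    return False, f"{host} is non-loopback; set AUDREY_API_KEY before serving"
```

A deploy script could call check_bind(os.environ) and refuse to start the sidecar when the first element is False.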
Audrey ships benchmark commands for performance, memory behavior, and the guard loop.
npm run bench:perf-snapshot measures encode and hybrid recall latency at multiple corpus sizes against the in-process mock provider. It reports p50/p95/p99 plus machine provenance so the numbers are reproducible and honest about what they cover.
npm run build
npm run bench:perf-snapshot # default sizes 100, 1000, 5000
node benchmarks/perf-snapshot.js --sizes 1000,10000 --json # custom shape
Sample output from benchmarks/snapshots/perf-0.23.0.json (18-core Apple M5 Max, Node 25.9.0, mock 64-dim embedding, hybrid recall, limit 5):
| Corpus size | Encode p50 (ms) | Encode p95 (ms) | Recall p50 (ms) | Recall p95 (ms) | Recall p99 (ms) |
|---|---|---|---|---|---|
| 100 | 0.14 | 0.25 | 0.21 | 0.69 | 1.3 |
| 1,000 | 0.11 | 0.19 | 0.27 | 0.48 | 2.1 |
| 5,000 | 0.11 | 0.17 | 0.73 | 0.87 | 4.2 |
These numbers cover Audrey's own pipeline (SQLite + sqlite-vec + hybrid ranking) and exclude embedding-provider cost. Real-world recall p95 with a local 384-dim provider is typically 5-15x higher; with a hosted provider it is dominated by the API round-trip. Run on your own hardware before quoting numbers anywhere.
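When reproducing these figures on your own hardware, it helps to be explicit about how percentiles are computed, since methods differ slightly. The sketch below uses the nearest-rank definition with integer arithmetic; whether benchmarks/perf-snapshot.js uses the same definition is not stated here, so treat percentile and summarize as hypothetical helpers for your own measurements.

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it.

    p is an integer in 1..100.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100 avoids floating-point rank errors.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the p50/p95/p99 summary for a list of latencies in milliseconds."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

With small sample counts the p99 is simply the maximum, which is one reason tail numbers from short runs should be read cautiously.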
npm run bench:memory:check is a release gate. It runs a small set of retrieval, lifecycle, and guard-loop scenarios (information extraction, knowledge updates, multi-session reasoning, conflict resolution, privacy boundary, overwrite, delete-and-abstain, semantic/procedural merge, prior tool-failure caution, strict must-follow blocking, replay rejection, and non-guard receipt rejection) and asserts Audrey doesn't regress. Retrieval and lifecycle cases compare Audrey with three weak baselines (vector-only, keyword+recency, recent-window). Guard-loop cases are reported as a controller regression suite against no-controller placeholders, not as a fair baseline leaderboard.
npm run bench:memory # full regression suite (writes JSON + report)
npm run bench:memory:check # release gate, exits non-zero on regression
npm run bench:memory:guard # closed-loop guard benchmark only
The release benchmark policy is tracked in docs/MEMORY_BENCHMARKING.md: local artifacts first, external claims only when the harness and scoring are reproducible.
# First contact
npx audrey doctor
npx audrey demo
# MCP setup
npx audrey install --host codex --dry-run
npx audrey mcp-config codex
npx audrey mcp-config generic
npx audrey install
npx audrey uninstall
# Health and maintenance
npx audrey status
npx audrey status --json --fail-on-unhealthy
npx audrey dream
npx audrey reembed
# Closed-loop visibility
npx audrey impact
npx audrey impact --json --window 7 --limit 5
# Tool-trace learning
npx audrey observe-tool --event PostToolUse --tool Bash --outcome failed
npx audrey promote --dry-run
# REST sidecar
npx audrey serve
docker compose up -d --build
The Node sidecar defaults to 127.0.0.1:7437. The Docker image intentionally binds inside the container on 3487; override the published host port with AUDREY_PUBLISHED_PORT when using Compose.
- Memory benchmarking policy
- Production backlog and release posture
- Audrey paper outline
- Security policy
- Public setup, runtime, benchmark, and command guidance is maintained in this README.
npm ci
npm run release:gate
python -m unittest discover -s python/tests -v
python -m build --no-isolation python
On some locked-down Windows hosts, Vitest/Vite can fail before tests start with spawn EPERM. That is an environment process-spawn blocker, not an Audrey runtime failure. Use npm run release:gate:sandbox, direct dist/ smokes, and GitHub Actions as the release evidence path.
MIT. See LICENSE.


