AI memory layer powered by GrafeoDB, an embedded graph database with native vector search.
No servers, no Docker, no Neo4j, no Qdrant. One .db file + one LLM.
- Typical memory stack: containers with Neo4j + Qdrant, plus an embedding API and an LLM
- grafeo-memory stack: grafeo (a single file) + an LLM
```
uv add grafeo-memory              # base (bring your own LLM + embedder)
uv add grafeo-memory[mistral]     # + Mistral embeddings
uv add grafeo-memory[openai]      # + OpenAI embeddings
uv add grafeo-memory[anthropic]   # + Anthropic embeddings
uv add grafeo-memory[mcp]         # + MCP server for AI agents
uv add grafeo-memory[all]         # all providers
```

Or with pip:
```
pip install grafeo-memory[openai]
```

```python
from openai import OpenAI
from grafeo_memory import MemoryManager, MemoryConfig, OpenAIEmbedder

embedder = OpenAIEmbedder(OpenAI())
config = MemoryConfig(db_path="./memory.db", user_id="alice")

with MemoryManager("openai:gpt-4o-mini", config, embedder=embedder) as memory:
    # Add memories from conversation
    events = memory.add("I just started a new job at Acme Corp as a data scientist")
    # -> [ADD "alice works at acme_corp", ADD "alice is a data_scientist"]

    events = memory.add("I've been promoted to senior data scientist at Acme")
    # -> [UPDATE "alice is a senior data scientist at acme_corp"]

    events = memory.add("I left Acme and joined Beta Inc")
    # -> [DELETE "alice works at acme_corp", ADD "alice works at beta_inc"]

    # Search
    results = memory.search("Where does Alice work?")
    # -> [SearchResult(text="alice works at beta_inc", score=0.92, ...)]
```

```python
from mistralai import Mistral
from grafeo_memory import MemoryManager, MemoryConfig, MistralEmbedder

embedder = MistralEmbedder(Mistral())
config = MemoryConfig(db_path="./memory.db", user_id="alice")

with MemoryManager("mistral:mistral-small-latest", config, embedder=embedder) as memory:
    events = memory.add("I just started a new job at Acme Corp as a data scientist")
    results = memory.search("Where does Alice work?")
```

grafeo-memory implements the reconciliation loop, the intelligence layer that decides what to remember:
- Extract facts from conversation text (LLM call)
- Extract entities and relationships (LLM tool call)
- Search existing memory for related facts (vector + graph)
- Reconcile new facts against existing memory (LLM decides ADD/UPDATE/DELETE/NONE)
- Execute the decisions against GrafeoDB
```
┌──────────────────────────────────────────┐
│              grafeo-memory               │
│                                          │
│  Extractor  ->  Reconciler  ->  Executor │
│    (LLM)          (LLM)        (GrafeoDB)│
└──────────────────┬───────────────────────┘
                   │
         ┌─────────┴──────────┐
         │      GrafeoDB      │
         │   Graph + Vector   │
         │  + Text (optional) │
         │   single .db file  │
         └────────────────────┘
```
```python
config = MemoryConfig(db_path="./chat_memory.db")

with MemoryManager("openai:gpt-4o-mini", config, embedder=embedder) as memory:
    # Each user's memories are isolated
    memory.add("I love hiking in the mountains", user_id="bob")
    memory.add("I prefer beach vacations", user_id="carol")

    bob_results = memory.search("vacation preferences", user_id="bob")
    # -> hiking, mountains
    carol_results = memory.search("vacation preferences", user_id="carol")
    # -> beach vacations
```

grafeo-memory uses pydantic-ai model strings, so any provider pydantic-ai supports works out of the box:
```python
# OpenAI — use OpenAIEmbedder for embeddings
MemoryManager("openai:gpt-4o-mini", config, embedder=OpenAIEmbedder(OpenAI()))

# Anthropic — pair with OpenAI or custom embedder
MemoryManager("anthropic:claude-sonnet-4-5-20250929", config, embedder=embedder)

# Groq — pair with OpenAI or custom embedder
MemoryManager("groq:llama-3.3-70b-versatile", config, embedder=embedder)

# Mistral — use MistralEmbedder for embeddings
MemoryManager("mistral:mistral-small-latest", config, embedder=MistralEmbedder(Mistral()))

# Google — pair with OpenAI or custom embedder
MemoryManager("google-gla:gemini-2.0-flash", config, embedder=embedder)
```

| Class | Provider | Default Model | Install Extra |
|---|---|---|---|
| `OpenAIEmbedder` | OpenAI | `text-embedding-3-small` | `[openai]` |
| `MistralEmbedder` | Mistral | `mistral-embed` | `[mistral]` |
Both accept an optional `model` parameter to override the default.
Implement the EmbeddingClient protocol to use any embedding provider:
```python
from grafeo_memory import EmbeddingClient

class MyEmbedder:
    def embed(self, texts: list[str]) -> list[list[float]]:
        # Call your embedding API
        return [...]

    @property
    def dimensions(self) -> int:
        return 1024  # your model's output dimensions

memory = MemoryManager("openai:gpt-4o-mini", config, embedder=MyEmbedder())
```

grafeo-memory includes a built-in MCP server so AI agents (Claude Desktop, Cursor, etc.) can use it as a tool.
```
uv add grafeo-memory[mcp]
# or: pip install grafeo-memory[mcp]
```

Add to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "grafeo-memory": {
      "command": "grafeo-memory-mcp",
      "env": {
        "GRAFEO_MEMORY_MODEL": "openai:gpt-4o-mini",
        "GRAFEO_MEMORY_DB": "./memory.db"
      }
    }
  }
}
```

| Tool | Description |
|---|---|
| `memory_add` | Add a memory by extracting facts from text |
| `memory_add_batch` | Add multiple memories in one batch |
| `memory_search` | Search memories by semantic similarity and graph context |
| `memory_update` | Update an existing memory's text |
| `memory_delete` | Delete a single memory |
| `memory_delete_all` | Delete all memories for a user |
| `memory_list` | List all stored memories |
| `memory_summarize` | Consolidate old memories into topic-grouped summaries |
| `memory_history` | Show change history for a memory |
| Variable | Default | Description |
|---|---|---|
| `GRAFEO_MEMORY_MODEL` | `openai:gpt-4o-mini` | pydantic-ai model string |
| `GRAFEO_MEMORY_DB` | (in-memory) | Database file path |
| `GRAFEO_MEMORY_USER` | `default` | Default user ID |
| `GRAFEO_MEMORY_YOLO` | (off) | Set to 1 for all features |
Supports stdio (default), SSE, and streamable HTTP:

```
grafeo-memory-mcp                  # stdio (default)
grafeo-memory-mcp sse              # SSE
grafeo-memory-mcp streamable-http  # streamable HTTP
```

Note: This is different from grafeo-mcp, which exposes the raw GrafeoDB database. grafeo-memory-mcp wraps the high-level memory API (extract, reconcile, search, summarize).
grafeo-memory supports OpenTelemetry instrumentation via pydantic-ai. When enabled, all LLM calls (extraction, reconciliation, summarization, reranking) are traced automatically.
```python
config = MemoryConfig(instrument=True)  # uses the global OTel provider
```

For custom providers:
```python
from grafeo_memory import InstrumentationSettings

config = MemoryConfig(instrument=InstrumentationSettings(
    tracer_provider=my_tracer_provider,
    include_content=False,
))
```

| | Traditional stack | grafeo-memory |
|---|---|---|
| Infrastructure | Neo4j + Qdrant (Docker) | Single .db file |
| Install size | ~750MB (Docker images) | ~16MB (uv add) |
| Offline/edge | Requires servers | Yes |
| Graph + vector | Separate services | Unified engine |
| LLM providers | Varies | pydantic-ai (OpenAI, Anthropic, Mistral, Groq, Google) |
| Embeddings | External API required | Protocol-based (any provider) |
- `MemoryManager(model, config=None, *, embedder)`: create a memory manager. `model` is a pydantic-ai model string (e.g. `"openai:gpt-4o-mini"`).
- `add(messages, user_id=None, session_id=None, metadata=None, *, infer=True, importance=1.0, memory_type="semantic")` → `AddResult` (list of `MemoryEvent`).
- `search(query, user_id=None, k=10, *, filters=None, rerank=True, memory_type=None)` → `SearchResponse` (list of `SearchResult`).
- `update(memory_id, text)` → `MemoryEvent`: update a memory's text directly.
- `get_all(user_id=None, memory_type=None)` → `list[SearchResult]`.
- `delete(memory_id)` → `bool`.
- `delete_all(user_id=None)` → `int` (count deleted).
- `summarize(user_id=None, *, preserve_recent=5, batch_size=20)` → `AddResult`.
- `history(memory_id)` → `list[HistoryEntry]`.
- `set_importance(memory_id, importance)` → `bool`.
- `close()`: close the database.
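For intuition on `summarize`, a toy consolidation along these lines (the real implementation uses an LLM to write topic-grouped summaries; this sketch only groups and counts, and all names are illustrative):

```python
from collections import defaultdict

def summarize_sketch(memories: list[tuple[str, str]],
                     preserve_recent: int = 5) -> list[str]:
    """Keep the newest facts verbatim, roll the older ones up per topic."""
    old, recent = memories[:-preserve_recent], memories[-preserve_recent:]
    by_topic: dict[str, list[str]] = defaultdict(list)
    for topic, text in old:
        by_topic[topic].append(text)
    # An LLM would write a prose summary here; we just count per topic.
    summaries = [f"{topic}: {len(texts)} facts consolidated"
                 for topic, texts in by_topic.items()]
    return summaries + [text for _, text in recent]

mem = [("work", "w1"), ("work", "w2"), ("travel", "t1"),
       ("work", "w3"), ("travel", "t2"), ("food", "f1")]
summarize_sketch(mem, preserve_recent=2)
# older work/travel facts collapse to one summary line each;
# the 2 most recent facts survive unchanged
```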
Use as a context manager: `with MemoryManager(...) as memory:`. Multiple sessions in the same process are supported.
- `db_path`: path to database file (`None` for in-memory)
- `user_id`: default user scope (default `"default"`)
- `session_id`: default session scope
- `agent_id`: default agent scope
- `similarity_threshold`: max embedding distance for reconciliation (default 0.7)
- `embedding_dimensions`: vector dimensions (default 1536)
- `enable_importance`: enable composite scoring with recency/frequency/importance (default False)
- `weight_topology`: topology score weight for graph-connected memories (default 0.0, requires `enable_importance`)
- `enable_topology_boost`: re-rank search results by graph connectivity, no LLM call (default False)
- `topology_boost_factor`: strength of topology boost (default 0.2)
- `consolidation_protect_threshold`: protect well-connected memories from summarize (default 0.0, off)
- `instrument`: OpenTelemetry instrumentation, `True` or `InstrumentationSettings` (default False)
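To make `similarity_threshold` concrete: during reconciliation, an existing memory counts as related only if its embedding distance to the new fact is at or below the threshold. A toy illustration using cosine distance (the actual distance metric is a GrafeoDB implementation detail; this choice is an assumption for illustration only):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

SIMILARITY_THRESHOLD = 0.7  # the MemoryConfig default

new_fact_vec = [1.0, 0.0, 0.2]
candidates = {
    "alice works at acme_corp": [0.9, 0.1, 0.3],  # close -> reconciliation candidate
    "bob likes hiking":         [0.0, 1.0, 0.0],  # far   -> ignored
}
related = [text for text, vec in candidates.items()
           if cosine_distance(new_fact_vec, vec) <= SIMILARITY_THRESHOLD]
# related contains only the acme_corp fact
```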
- `.embed(texts: list[str]) -> list[list[float]]`: generate embeddings for a batch of texts
- `.dimensions -> int`: return the embedding vector dimensionality
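A minimal deterministic implementation of this protocol, useful for tests since it calls no API (the class is hypothetical and hash-based vectors carry no semantic meaning):

```python
import hashlib

class FakeEmbedder:
    """Satisfies the EmbeddingClient protocol without any network calls."""

    def __init__(self, dims: int = 1536):
        self._dims = dims

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode()).digest()
            # Repeat the 32-byte digest to fill the requested dimensionality.
            raw = (digest * (self._dims // len(digest) + 1))[: self._dims]
            vectors.append([b / 255.0 for b in raw])
        return vectors

    @property
    def dimensions(self) -> int:
        return self._dims
```

Because the protocol is structural, any object with these two members can be passed as `embedder=` without subclassing anything.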
- `AddResult`: list subclass holding `MemoryEvent` items, with `.usage` for LLM token counts
- `SearchResponse`: list subclass holding `SearchResult` items, with `.usage` for LLM token counts
- `MemoryEvent`: `.action` (ADD/UPDATE/DELETE/NONE), `.memory_id`, `.text`, `.old_text`
- `SearchResult`: `.memory_id`, `.text`, `.score`, `.user_id`, `.metadata`, `.relations`, `.memory_type`
- `HistoryEntry`: `.event`, `.old_text`, `.new_text`, `.timestamp`, `.actor_id`, `.role`
```python
# AddResult is iterable:
for event in memory.add("text"):
    print(event.action, event.text)

# SearchResponse is iterable:
for result in memory.search("query"):
    print(result.text, result.score)
```

grafeo-memory is part of the GrafeoDB ecosystem:
- grafeo: Core graph database engine (Rust)
- grafeo-langchain: LangChain integration
- grafeo-llamaindex: LlamaIndex integration
- grafeo-mcp: MCP server for raw GrafeoDB access
- grafeo-memory-mcp (built-in): MCP server for the memory API (`uv add grafeo-memory[mcp]` or `pip install grafeo-memory[mcp]`)
All packages share the same .db file. Build memories with grafeo-memory, query them with grafeo-langchain, expose them via MCP.
- Python 3.12+
Apache-2.0