6-stage hybrid retrieval
FTS5 + BM25 + dense + graph + CrossEncoder + MMR with RRF fusion. Tuned on real Claude Code sessions.
Persistent memory for Claude Code, Codex CLI, Cursor and any MCP client. Local-first. Sub-millisecond recall. 96.2% R@5 on LongMemEval and #5 on the public LoCoMo leaderboard — verifiable, not press releases.
# Day 1 — Tuesday
You: remember we picked pgvector over ChromaDB
because of multi-tenant RLS.
Claude: ✓ memory_save(type="decision",
tags=["reusable", "db"])
# Day 4 — different session, different repo
You: why did we pick pgvector again?
Claude: ✓ memory_recall(query="vector db choice")
→ Chose pgvector over ChromaDB.
WHY: single Postgres, per-tenant RLS.
Tagged reusable. Created 2026-05-12.
# 0.065 ms · 0 LLM calls · 0 network
$ npx -y total-agent-memory
$ uvx total-agent-memory
$ pipx install total-agent-memory
$ brew install vbcherepanov/tap/total-memory
$ curl -fsSL https://get.totalmemory.dev | sh
$ docker run -p 37737:37737 -v ~/.tam:/data ghcr.io/vbcherepanov/total-agent-memory:12.0.0
No registration. No telemetry. MIT licensed. ~50 MB install.
35-second demo
You don't tell it "save this." It just does. Decisions, lessons, gotchas — captured in the background, recalled in future sessions across different repos.
memory_save(decision) without you typing anything.
learn_error. Same trap won't fire twice.
memory_recalls yesterday's decisions before answering.
The problem
Yesterday's architectural decisions. Last week's bug fix. The reason you chose pgvector. The reason you didn't pick GraphQL. The stack you settled on at 2am after the third PoC. Close the terminal — and it all vanishes. Every new Claude session starts at zero, and you re-paste the same context. Forever.
A CLAUDE.md file hits 22K+ tokens at 240 observations. Then your window dies. Then you delete half of it and lose what mattered.
Cloud memory sends every decision, fix, and file path through an API key. Latency, lock-in, and a privacy footprint you didn't sign up for.
The model edits its own context. Slow, expensive, non-deterministic, and the retrieval quality is whatever the model felt like that turn.
How it works
memory_core is deterministic — storage, embeddings, vector search, classifier, dedup, telemetry. No LLM imports allowed. Enforced by a unit test.
ai_layer is everything LLM-touching — summarisation, keyword extraction, contradiction detection, reflection, query rewriting. Runs async, off the hot path.
FTS5 + BM25 + dense + graph + CrossEncoder + MMR with RRF fusion. Tuned on real Claude Code sessions.
Nodes & edges auto-extracted on save. Time-aware queries: "what was our stack in Q3?" returns the right answer.
Recurring error patterns auto-consolidate into rules. Skills accumulate. Workflow predictor learns your loops.
memory_core is deterministic. ai_layer enriches async. Warm p50 = 0.065 ms. No surprise tokens, no latency tax.
Claude Code, Codex CLI, Cursor, Cline, Continue, Aider, Windsurf, Gemini CLI, OpenCode: one command wires each.
SQLite + FastEmbed on your machine. No cloud, no telemetry, no API key required. MIT licence.
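The retrieval stack above names RRF as its fusion step. As a rough illustration only (not the project's code), reciprocal rank fusion scores each memory as the sum of 1 / (k + rank) over every ranked list it appears in. The function name `rrf_fuse`, the sample IDs, and the constant k = 60 (a commonly used default) are assumptions made for this sketch.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of three retrievers for the same query.
bm25 = ["m1", "m2", "m3"]
dense = ["m2", "m1", "m4"]
graph = ["m2", "m5"]

fused = rrf_fuse([bm25, dense, graph])
print(fused)  # ['m2', 'm1', 'm5', 'm3', 'm4']
```

Because RRF only looks at ranks, it needs no score normalisation between lexical, dense, and graph retrievers, which is one reason it is a popular choice for hybrid search.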
Benchmarks
Every value on this page comes from a script in benchmarks/. For competitors we list only what their authors published, with no invented numbers. LongMemEval is a 500-question long-term memory benchmark introduced by Wu et al. (2024).
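For readers unfamiliar with the R@5 figure quoted on this page: one common definition counts a question as a hit when at least one gold evidence item appears among the top 5 retrieved results, and reports the fraction of hits. The sketch below uses that definition with made-up data. The scripts under benchmarks/ are the source of truth and may score differently in detail.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of questions with at least one relevant item in the top k."""
    hits = 0
    for qid, ranked in retrieved.items():
        if set(ranked[:k]) & relevant[qid]:
            hits += 1
    return hits / len(retrieved)


# Made-up example: q1 has its gold item at rank 3, q2 only at rank 6.
retrieved = {
    "q1": ["a", "b", "gold1", "d", "e", "f"],
    "q2": ["u", "v", "w", "x", "y", "gold2"],
}
relevant = {"q1": {"gold1"}, "q2": {"gold2"}}

score = recall_at_k(retrieved, relevant)
print(score)  # 0.5
```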
The LLM-free hot path is enforced by tests/test_no_llm_hot_path.py.
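A test like tests/test_no_llm_hot_path.py could, for example, parse each hot-path module and fail if it imports an LLM client. The sketch below is an assumption about how such a check might look, not the contents of that file. The forbidden package list and the helper name `forbidden_imports` are illustrative.

```python
import ast

# Illustrative list; the real test may forbid a different set of packages.
FORBIDDEN = {"openai", "anthropic", "ollama"}


def forbidden_imports(source: str) -> set[str]:
    """Return the forbidden top-level packages imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in FORBIDDEN:
                found.add(root)
    return found


clean = "import sqlite3\nfrom pathlib import Path\n"
dirty = "import sqlite3\nfrom openai import OpenAI\nimport anthropic.types\n"

print(sorted(forbidden_imports(clean)))  # []
print(sorted(forbidden_imports(dirty)))  # ['anthropic', 'openai']
```

In a real test suite the same function would be run over every file in the deterministic package, with an assertion that the result is empty.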
Found a wrong number? Open an issue with a link to the source and we will re-run our eval and update this page within 48 hours.
vs everyone else
Where a project doesn't list a number on a comparable benchmark, we mark it —. We don't invent values. Sources cited in docs/vs-competitors.md.
| Project | LongMemEval R@5 | LoCoMo | p50 | LLM in hot path | Self-host | Deps | Notes |
|---|---|---|---|---|---|---|---|
| total-agent-memory | 96.2% | 0.705 | 0.065 ms | no | yes | SQLite + FastEmbed | #5 LoCoMo · local-first · MIT |
| agentmemory (iii) | 95.2% | — | — | no | yes | iii engine (ELv2, pre-1.0) | self-reported, not on public leaderboard |
| Letta / MemGPT | — | — | — | yes | optional | Postgres + vector DB | agent runtime · LLM in hot path |
| mem0 | — | published | — | yes | optional | Qdrant / pgvector | API-first · LLM extract on every save |
| Zep | — | — | — | yes | optional | Postgres + Graphiti | graph-first memory layer |
| CLAUDE.md | — | — | instant | no | yes | plain text | loads everything every turn · no retrieval |
Spot something wrong? Open an issue with a link to the source, and we'll re-run our eval and update.
Agents & IDEs
9 IDEs supported out of the box. The npm wrapper writes the MCP entry into the right config file and merges with anything you already have. No manual JSON. Restart the IDE — memory is live.
Anthropic
Official Anthropic CLI. 12 hooks fire automatically on file edits, prompts, and session end.
~/.claude.json
$ npx -y total-agent-memory connect claude-code

OpenAI
OpenAI Codex CLI. TOML config patched, MCP server wired in one command.
~/.codex/config.toml
$ npx -y total-agent-memory connect codex

Cursor
AI-first IDE. Memory MCP appears in Cursor's server panel and works in chat and agent mode.
~/.cursor/mcp.json
$ npx -y total-agent-memory connect cursor

Cline
VS Code agent extension. Multi-turn plans use memory automatically once wired.
~/.cline/mcp.json
$ npx -y total-agent-memory connect cline

Continue.dev
Open-source IDE assistant. Memory is exposed as a tool via the MCP block in config.
~/.continue/config.json
$ npx -y total-agent-memory connect continue

Codeium
Codeium's agent-native IDE. Cascade picks up memory_recall automatically.
~/.codeium/windsurf/mcp_config.json
$ npx -y total-agent-memory connect windsurf

Google
Google's open-source Gemini CLI. Memory drops in as a standard MCP server.
~/.gemini/settings.json
$ npx -y total-agent-memory connect gemini-cli

sst.dev
Open-source coding agent by sst. MCP-native, so memory just works.
~/.config/opencode/config.json
$ npx -y total-agent-memory connect opencode

Aider
Terminal AI pair-programmer. Use the lookup-memory CLI side-by-side (MCP support coming).
~/.aider.conf.yml
$ lookup-memory --help
If it speaks MCP, it works. Point any MCP client at total-agent-memory (in ~/.tam/.venv/bin/) and you're done.
Also exposes a REST endpoint on :37737. Drive it from scripts, CI, or any agent SDK that prefers HTTP over MCP.
A standalone lookup-memory binary is installed alongside the server for shell scripts and sub-agents.
Wire your IDE
The npm CLI writes the MCP server entry into your IDE's config and merges with anything already there. No manual JSON editing.
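Merging rather than overwriting is the important part of that step. Here is a minimal sketch of the idea for JSON configs, assuming the client keeps its servers under a top-level `mcpServers` object (as the Claude Code and Cursor configs do). The entry name `memory`, the helper `merge_mcp_entry`, and the exact command path are assumptions for illustration, not what the npm CLI actually writes.

```python
import json
import tempfile
from pathlib import Path


def merge_mcp_entry(config_path: Path, name: str, entry: dict) -> dict:
    """Add or update one MCP server entry, leaving other settings untouched."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a temporary file rather than a real IDE config.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mcp.json"
    existing = {"mcpServers": {"other": {"command": "other-server"}}}
    path.write_text(json.dumps(existing))

    # Hypothetical entry pointing at the server binary under ~/.tam/.venv/bin/.
    entry = {"command": str(Path.home() / ".tam/.venv/bin/total-agent-memory")}
    merged = merge_mcp_entry(path, "memory", entry)
    on_disk = json.loads(path.read_text())

print(sorted(merged["mcpServers"]))  # ['memory', 'other']
```

TOML-based clients such as Codex CLI would need the same merge logic with a TOML reader and writer instead of `json`.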
$ npx -y total-agent-memory connect claude-code
✓ Wired claude-code → ~/.claude.json
› Restart your IDE for the change to take effect.
$ npx -y total-agent-memory connect cursor
✓ Wired cursor → ~/.cursor/mcp.json
$ npx -y total-agent-memory connect codex
✓ Wired codex → ~/.codex/config.toml
$ npx -y total-agent-memory status
✓ python3 : Python 3.12.5
✓ uv : uv 0.8.17
✓ server : v12.0.0
✓ memory.db : 12.4 MB · 2,147 entries
FAQ
Is it free?
Yes. MIT licence. No paid tier, no telemetry, no email signup. The entire server runs on your machine.
Does my data leave my machine?
No. Memory lives in a local SQLite file. Embeddings are computed by FastEmbed locally. Ollama integration is optional and also local.
How is it different from CLAUDE.md?
CLAUDE.md loads every byte every time. At ~240 observations it already exceeds 22K tokens and breaks long sessions. total-agent-memory retrieves only the relevant ~1.9K tokens per request. Same fidelity, far cheaper, far faster.
How is it different from other memory tools?
Two things. (1) The hot path is LLM-free, so search is sub-millisecond. (2) Public benchmark numbers: 96.2% R@5 on LongMemEval, 0.705 on LoCoMo (top-5). Competitors that did not publish on the same benchmark are marked “—” in the comparison table. We don't invent numbers.
Can I migrate existing memories?
Yes, see docs/migration. We import via JSONL, preserve metadata, and re-embed locally on first save.
Can a team share one memory?
Roadmap. Today every install is per-machine. A shared backend (Postgres) is on the v12 roadmap.
Does it work on Windows?
Yes. The npm wrapper handles venv creation on Windows; the Python server is cross-platform. Install via npx -y total-agent-memory.
Can I reproduce the benchmarks?
Yes. Every number on this page is reproducible, and the scripts live under benchmarks/. The LongMemEval result is in evals/longmemeval-2026-04-17.json. The LoCoMo leaderboard is public. If you can't reproduce a number, file an issue and we'll re-run.
total-agent-memory solved recall — sub-millisecond, 96.2% R@5, fully open-source. BrainCore solves the harder problem: stopping agents from confidently coding against a chunk that's no longer in the codebase. A strict-mode gate, a temporal decision graph, and the right to abstain — by design.
go.mod / package.json / language ASTs, not just embeddings.
We're talking to early-stage investors who care about anti-hallucination infrastructure: the layer that makes AI coding agents trustworthy enough for regulated enterprises (fintech, healthtech, public sector, defence). total-agent-memory is the open-source recall primitive that's already shipped. BrainCore is the commercial cognition layer on top.
Not an investor? Sign up for updates at getbraincore.com.
Support the project
total-agent-memory is free, MIT, and stays that way. Every star, every link, and every one-time tip keeps it that way — and frees me up to push v12 (shared-team backend) faster.
The single most useful thing you can do. GitHub stars feed every "trending memory MCP" list and ranking.
vbcherepanov/total-agent-memory
Tips pay for embedding models, GPU bench runs, and the domain. Every tip directly funds the next release.
paypal.me/VitaliiCherepanov
One retweet from someone in AI moves the needle more than a week of grinding. Tagged @BestProgerVR.
No Stripe, no Patreon tier-system, no "premium memory features behind a paywall", just a single PayPal link. If you'd rather pay by invoice, email me.
30 seconds to install. Works with Claude Code, Cursor, Codex, and 6 more.