v12.0.0 · #5 on LoCoMo · 1200+ tests passing

Memory your agent actually keeps.

Persistent memory for Claude Code, Codex CLI, Cursor and any MCP client. Local-first. Sub-millisecond recall. 96.2% R@5 on LongMemEval and #5 on the public LoCoMo leaderboard — verifiable, not press releases.

Install in 30 seconds · Star on GitHub

96.2% · LongMemEval R@5

0.705 · LoCoMo accuracy (#5 on the public leaderboard)

0.065 ms · Warm p50 latency

100% · Local

~ — claude-code

# Day 1 — Tuesday
You:     remember we picked pgvector over ChromaDB
         because of multi-tenant RLS.
Claude:  ✓ memory_save(type="decision",
                       tags=["reusable", "db"])

# Day 4 — different session, different repo
You:     why did we pick pgvector again?
Claude:  ✓ memory_recall(query="vector db choice")
         → Chose pgvector over ChromaDB.
           WHY: single Postgres, per-tenant RLS.
           Tagged reusable. Created 2026-05-12.

# 0.065 ms · 0 LLM calls · 0 network

claude-code codex cursor cline continue windsurf gemini-cli opencode aider

$ npx -y total-agent-memory

$ uvx total-agent-memory

$ pipx install total-agent-memory

$ brew install vbcherepanov/tap/total-memory

$ curl -fsSL https://get.totalmemory.dev | sh

$ docker run -p 37737:37737 -v ~/.tam:/data ghcr.io/vbcherepanov/total-agent-memory:12.0.0

npx: no install, runs immediately, no Node packages added globally

No registration. No telemetry. MIT licensed. ~50 MB install.

35-second demo

The agent remembers on its own.

You don't tell it "save this." It just does. Decisions, lessons, gotchas — captured in the background, recalled in future sessions across different repos.

Auto-captures decisions

Architectural choice → memory_save(decision) without you typing anything.

Auto-learns from errors

Test fails → diagnosis → fix → learn_error. Same trap won't fire twice.

Auto-recalls across repos

Different project, fresh session → agent memory_recalls yesterday's decisions before answering.

Install in 30 seconds · View source

tam-demo.mp4 · 1920×1080 · 35s · CC-BY 4.0

The problem

AI coding agents have goldfish memory.

Yesterday's architectural decisions. Last week's bug fix. The reason you chose pgvector. The reason you didn't pick GraphQL. The stack you settled on at 2am after the third PoC. Close the terminal — and it all vanishes. Every new Claude session starts at zero, and you re-paste the same context. Forever.

CLAUDE.md

Loads everything, every time

22K+ tokens at 240 observations. Then your window dies. Then you delete half of it and lose what mattered.

Cloud memory APIs

Your code, their database

Sends every decision, fix, and file path to a third-party API. Latency, lock-in, and a privacy footprint you didn't sign up for.

Agent-managed memory

LLM in the hot path

The model edits its own context. Slow, expensive, non-deterministic, and the retrieval quality is whatever the model felt like that turn.

How it works

Two-layer architecture. The fast one is the default.

memory_core is deterministic — storage, embeddings, vector search, classifier, dedup, telemetry. No LLM imports allowed. Enforced by a unit test.
ai_layer is everything LLM-touching — summarisation, keyword extraction, contradiction detection, reflection, query rewriting. Runs async, off the hot path.
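
A "no LLM imports" rule of this kind can be checked mechanically. The snippet below is a minimal sketch of such a check, not the project's actual test: it parses source text with Python's ast module and reports absolute imports whose top-level package is on a deny-list. The FORBIDDEN set and the forbidden_imports name are assumptions made for illustration.

```python
import ast

# Hypothetical deny-list; the project's real list is not shown on this page.
FORBIDDEN = {"openai", "anthropic", "ollama"}


def forbidden_imports(source: str, forbidden=FORBIDDEN) -> set:
    """Return the forbidden top-level packages imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in forbidden:
                found.add(root)
    return found


clean = "import sqlite3\nfrom pathlib import Path\n"
dirty = "import sqlite3\nfrom openai import OpenAI\nimport anthropic.types\n"

print(forbidden_imports(clean))          # set()
print(sorted(forbidden_imports(dirty)))  # ['anthropic', 'openai']
```

In a real test suite the same function would be run over every .py file in the deterministic package, with the test asserting that the result is empty.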

6-stage hybrid retrieval

FTS5 + BM25 + dense + graph + CrossEncoder + MMR with RRF fusion. Tuned on real Claude Code sessions.
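
Reciprocal rank fusion (RRF) merges several ranked lists by summing 1 / (k + rank) for each document across the lists. The sketch below shows the general technique only; the constant k = 60 is the conventional default from the RRF literature, and the memory ids and the three input rankings are made up. How the project weights or orders its own stages is not stated on this page.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of ids (best first) with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Illustrative rankings from three hypothetical retrievers.
bm25 = ["m3", "m1", "m7"]
dense = ["m1", "m4", "m3"]
graph = ["m1", "m9"]

fused = rrf_fuse([bm25, dense, graph])
print(fused)  # ['m1', 'm3', 'm4', 'm9', 'm7']
```

RRF needs only ranks, not comparable scores, which is why it suits mixing lexical, dense, and graph signals.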

Temporal knowledge graph

Nodes & edges auto-extracted on save. Time-aware queries: "what was our stack in Q3?" returns the right answer.
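
One way to make graph queries time-aware is to give every edge a validity interval and filter on it. The following is a sketch under that assumption, not the project's schema: the Edge fields, the as_of helper, and the sample dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Edge:
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the fact is still valid


def as_of(edges, subject, relation, when):
    """Return the objects linked to `subject` by `relation` on date `when`."""
    return [
        e.obj
        for e in edges
        if e.subject == subject
        and e.relation == relation
        and e.valid_from <= when
        and (e.valid_to is None or when < e.valid_to)
    ]


# Illustrative history: the stack changed in early Q4.
edges = [
    Edge("project", "uses_vector_db", "ChromaDB", date(2025, 1, 10), date(2025, 10, 5)),
    Edge("project", "uses_vector_db", "pgvector", date(2025, 10, 5)),
]

print(as_of(edges, "project", "uses_vector_db", date(2025, 8, 15)))  # ['ChromaDB']
print(as_of(edges, "project", "uses_vector_db", date(2025, 11, 1)))  # ['pgvector']
```

Closing the old edge instead of deleting it is what lets a question about Q3 return the Q3 answer after the decision has changed.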

Procedural memory

Recurring error patterns auto-consolidate into rules. Skills accumulate. Workflow predictor learns your loops.
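
Consolidating recurring errors into rules can be pictured as grouping error messages by a normalised signature and promoting any signature that repeats often enough. This is a sketch of that idea only; the signature and consolidate functions, the normalisation rules, and the threshold of three are assumptions, and the project's real consolidation logic is not described here.

```python
import re
from collections import Counter


def signature(message: str) -> str:
    """Normalise an error message so variants of one bug share a key."""
    msg = re.sub(r"'[^']*'|\"[^\"]*\"", "<str>", message)  # quoted values
    msg = re.sub(r"\d+", "<n>", msg)                        # numbers
    return msg.strip().lower()


def consolidate(errors, min_count=3):
    """Return signatures seen at least `min_count` times, most frequent first."""
    counts = Counter(signature(e) for e in errors)
    return [sig for sig, n in counts.most_common() if n >= min_count]


errors = [
    "KeyError: 'user_id' at line 42",
    "KeyError: 'tenant' at line 7",
    "KeyError: 'id' at line 310",
    "TimeoutError after 30s",
]

print(consolidate(errors))  # ['keyerror: <str> at line <n>']
```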

Hot path: zero LLM

memory_core is deterministic. ai_layer enriches async. Warm p50 = 0.065 ms. No surprise tokens, no latency tax.

9 IDEs supported

Claude Code, Codex CLI, Cursor, Cline, Continue, Aider, Windsurf, Gemini CLI, OpenCode: one command each to wire (Aider uses the lookup-memory CLI until MCP support lands).

Local-first

SQLite + FastEmbed on your machine. No cloud, no telemetry, no API key required. MIT licence.

Benchmarks

Numbers we can actually reproduce.

Every value on this page comes from a script in benchmarks/. For competitors we list only what their authors published, with no invented numbers. LongMemEval is a 500-question long-term memory benchmark from the LongMemEval paper (Wu et al., ICLR 2025).
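
For readers unfamiliar with R@5: a common definition counts a question as a hit when at least one gold evidence item appears in the top five retrieved results, and reports the fraction of hits. The sketch below assumes that "any hit" definition, which may differ from the exact variant used by the scripts in benchmarks/. The function names and the sample data are illustrative.

```python
def hit_at_k(retrieved_ids, gold_ids, k=5):
    """True if any gold id appears among the first k retrieved ids."""
    return any(doc in gold_ids for doc in retrieved_ids[:k])


def recall_at_k(results, k=5):
    """Fraction of (retrieved, gold) pairs with a hit in the top k."""
    hits = sum(hit_at_k(retrieved, gold, k) for retrieved, gold in results)
    return hits / len(results)


# Illustrative results: (ranked retrieved ids, set of gold ids) per question.
results = [
    (["a", "b", "c", "d", "e", "f"], {"c"}),  # gold at rank 3: hit
    (["a", "b", "c", "d", "e", "f"], {"f"}),  # gold at rank 6: miss
    (["x", "y"], {"y", "z"}),                 # one gold item found: hit
    (["q"], {"r"}),                           # gold never retrieved: miss
]

print(recall_at_k(results))  # 0.5
```

Under this definition, 96.2% on a 500-question set would correspond to 481 questions with a hit.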

LongMemEval R@5

higher is better · same eval, same prompts

raw json →

No published LongMemEval-S number — we won't invent one:

mem0 · Letta / MemGPT · Zep · Supermemory

Warm latency · search hot path

in-memory SQLite · MacBook M-series

p50 0.065 ms

p95 4.0 ms

p99 6.2 ms

252× faster than v10.5 (with LLM in path). 2.7× faster than v10.5 without LLM. Zero LLM calls, zero network requests on the save/search/recall hot path — verified by tests/test_no_llm_hot_path.py.
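
Percentile figures like p50, p95, and p99 can be measured on your own machine with a small timing harness. The sketch below assumes a nearest-rank percentile and wall-clock timing with time.perf_counter_ns; the scripts in benchmarks/ may use a different method. The names time_calls and percentile are illustrative, and the timed function is a stand-in for a real search call.

```python
import math
import time


def percentile(samples, p):
    """Nearest-rank percentile of `samples` for p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(fn, n=1000):
    """Call fn() n times and return per-call latencies in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1e6)
    return samples


# Stand-in workload; replace with a call into the search path under test.
latencies = time_calls(lambda: sum(range(100)), n=200)
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies, p):.4f} ms")
```

Warm numbers assume the process and caches are already primed, so discard the first few calls or run a warm-up loop before collecting samples.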

Found a wrong number? Open an issue with a link to the source and we will re-run our eval and update this page within 48 hours. file an issue →

vs everyone else

Honest comparison. Empty cells mean nothing was published.

Where a project doesn't list a number on a comparable benchmark, we mark it —. We don't invent values. Sources cited in docs/vs-competitors.md.

Project	LongMemEval R@5	LoCoMo	p50	LLM in hot path	Self-host	Deps	Notes
total-agent-memory	96.2%	0.705	0.065 ms	no	yes	SQLite + FastEmbed	#5 LoCoMo · local-first · MIT
agentmemory (iii)	95.2%	—	—	no	yes	iii engine (ELv2, pre-1.0)	self-reported, not on public leaderboard
Letta / MemGPT	—	—	—	yes	optional	Postgres + vector DB	agent runtime · LLM in hot path
mem0	—	published	—	yes	optional	Qdrant / pgvector	API-first · LLM extract on every save
Zep	—	—	—	yes	optional	Postgres + Graphiti	graph-first memory layer
CLAUDE.md	—	—	instant	no	yes	plain text	loads everything every turn · no retrieval

Spot something wrong? Open an issue with a link to the source and we'll re-run our eval and update.

Agents & IDEs

Works with every coding agent you already use.

9 IDEs supported out of the box. The npm wrapper writes the MCP entry into the right config file and merges with anything you already have. No manual JSON. Restart the IDE — memory is live.

Claude Code

Anthropic

claude-code

Official Anthropic CLI. 12 hooks fire automatically on file edits, prompts, and session end.

~/.claude.json

$ npx -y total-agent-memory connect claude-code

Codex CLI

OpenAI

codex

OpenAI Codex CLI. TOML config patched, MCP server wired in one command.

~/.codex/config.toml

$ npx -y total-agent-memory connect codex

Cursor

cursor

AI-first IDE. Memory MCP appears in Cursor's server panel — works in chat and agent mode.

~/.cursor/mcp.json

$ npx -y total-agent-memory connect cursor

Cline

cline

VS Code agent extension. Multi-turn plans use memory automatically once wired.

~/.cline/mcp.json

$ npx -y total-agent-memory connect cline

Continue

Continue.dev

continue

Open-source IDE assistant. Memory is exposed as a tool via the MCP block in config.

~/.continue/config.json

$ npx -y total-agent-memory connect continue

Windsurf

Codeium

windsurf

Codeium's agent-native IDE. Cascade picks up memory_recall automatically.

~/.codeium/windsurf/mcp_config.json

$ npx -y total-agent-memory connect windsurf

Gemini CLI

Google

gemini-cli

Google's open-source Gemini CLI. Memory drops in as a standard MCP server.

~/.gemini/settings.json

$ npx -y total-agent-memory connect gemini-cli

OpenCode

sst.dev

opencode

Open-source coding agent by sst. MCP-native — memory just works.

~/.config/opencode/config.json

$ npx -y total-agent-memory connect opencode

Aider

aider

Terminal AI pair-programmer. Use the lookup-memory CLI side-by-side (MCP support coming).

~/.aider.conf.yml

$ lookup-memory --help

Don't see your agent?

If it speaks MCP, it works. Point any MCP client at total-agent-memory (in ~/.tam/.venv/bin/) and you're done.

REST API

Also exposes a REST endpoint on :37737. Drive it from scripts, CI, or any agent SDK that prefers HTTP over MCP.

CLI tool

A standalone lookup-memory binary is installed alongside the server for shell scripts and sub-agents.

Wire your IDE

One command per editor.

The npm CLI writes the MCP server entry into your IDE's config and merges with anything already there. No manual JSON editing.
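
The merge behaviour can be sketched as follows for JSON-based configs. This is an illustration of the idea, not the wrapper's code: it assumes the config keeps its servers under a top-level "mcpServers" object (as Claude Code and Cursor do), the entry name "total-agent-memory" is hypothetical, and TOML configs such as Codex's are not covered. The command path follows the ~/.tam/.venv/bin/ location mentioned on this page.

```python
import json


def merge_mcp_entry(config_text, name, command, args=()):
    """Add or replace one MCP server entry, keeping everything else intact."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    return json.dumps(config, indent=2)


# Existing config with an unrelated server and an unrelated setting.
existing = '{"mcpServers": {"other": {"command": "other-server"}}, "theme": "dark"}'

merged = merge_mcp_entry(
    existing,
    name="total-agent-memory",  # hypothetical entry name
    command="~/.tam/.venv/bin/total-agent-memory",
)
print(merged)
```

The existing "other" server and the "theme" setting survive the merge, and only the one named entry is added or overwritten.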

claude-code
codex
cursor
cline
continue
aider
windsurf
gemini-cli
opencode

terminal

$ npx -y total-agent-memory connect claude-code
✓ Wired claude-code → ~/.claude.json
› Restart your IDE for the change to take effect.

$ npx -y total-agent-memory connect cursor
✓ Wired cursor → ~/.cursor/mcp.json

$ npx -y total-agent-memory connect codex
✓ Wired codex → ~/.codex/config.toml

$ npx -y total-agent-memory status
✓ python3   : Python 3.12.5
✓ uv        : uv 0.8.17
✓ server    : v12.0.0
✓ memory.db : 12.4 MB · 2,147 entries

FAQ

Questions you'd ask in an interview.

Is it really free?

Yes. MIT licence. No paid tier, no telemetry, no email signup. The entire server runs on your machine.

Does any data leave my machine?

No. Memory lives in a local SQLite file. Embeddings are computed by FastEmbed locally. Ollama integration is optional and also local.

Why is this better than just using CLAUDE.md?

CLAUDE.md loads every byte every time. At ~240 observations it already exceeds 22K tokens and breaks long sessions. total-agent-memory retrieves only the relevant ~1.9K tokens per request. Same fidelity, far cheaper, far faster.
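
The arithmetic behind that comparison, using only the figures quoted above and treating "22K+" as a lower bound of 22,000 tokens:

```python
claude_md_tokens = 22_000   # lower bound quoted for CLAUDE.md
observations = 240
retrieved_tokens = 1_900    # approximate tokens retrieved per request

tokens_per_observation = claude_md_tokens / observations
reduction_factor = claude_md_tokens / retrieved_tokens
share_saved = 1 - retrieved_tokens / claude_md_tokens

print(round(tokens_per_observation, 1))  # 91.7
print(round(reduction_factor, 1))        # 11.6
print(round(share_saved * 100, 1))       # 91.4
```

So each observation costs roughly 92 tokens in CLAUDE.md, and retrieval sends about one eleventh of the context, a saving of around 91% per request at that size.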

How does it compare to mem0 / Letta / Zep / Supermemory?

Two things. (1) Hot-path is LLM-free — sub-millisecond search. (2) Public benchmark numbers: 96.2% R@5 on LongMemEval, 0.705 on LoCoMo (top-5). Competitors that did not publish on the same benchmark are marked “—” in the comparison table. We don't invent numbers.

Can I migrate from mem0?

Yes — see docs/migration. We import via JSONL, preserve metadata, and re-embed locally on first save.

Does it support team sharing?

Not yet. Today every install is per-machine. A shared backend (Postgres) is on the v12 roadmap.

Does it work on Windows?

Yes. The npm wrapper handles venv creation on Windows; the Python server is cross-platform. Install via npx -y total-agent-memory.

Are the benchmark numbers verifiable?

Yes. Every number on this page is reproducible — scripts live under benchmarks/. The LongMemEval result is in evals/longmemeval-2026-04-17.json. The LoCoMo leaderboard is public. If you can't reproduce a number, file an issue and we'll re-run.

What we're building next · private beta

BrainCore — the memory layer that lets your agent say "I don't know."

total-agent-memory solved recall — sub-millisecond, 96.2% R@5, fully open-source. BrainCore solves the harder problem: stopping agents from confidently coding against a chunk that's no longer in the codebase. A strict-mode gate, a temporal decision graph, and the right to abstain — by design.

✓ Strict mode by default. Every fact passes a gate before it touches your prompt. Fails → abstain.
✓ Decision graph with provenance. problem → choice → outcome, every node lifecycle-tracked.
✓ AST source-truth. Reads go.mod / package.json / language ASTs, not just embeddings.
✓ Local-first, Go-native. Designed for the regulated enterprise — code never leaves the perimeter.

getbraincore.com · Investor / partnership pitch · View on GitHub →

Now raising — seed

Building a cognitive memory engine

We're talking to early-stage investors who care about anti-hallucination infrastructure — the layer that makes AI coding agents trustworthy enough for regulated enterprises (fintech, healthtech, public sector, defence). total-agent-memory is the open-source recall primitive that's already shipped. BrainCore is the commercial cognition layer on top.

Tech metric

0.95

strict-mode R@5

Retrieval p95

4 ms

in-process Go

Stack

Go 1.25

pgvector · MCP stdio

Status

private beta

Apache-2.0 core

Request the deck

Not raising? Sign up for updates at getbraincore.com.

Support the project

Self-funded. Local-first. No VC strings.

total-agent-memory is free, MIT, and stays that way. Every star, every link, and every one-time tip keeps it that way — and frees me up to push v12 (shared-team backend) faster.

⭐ Star on GitHub

free · 2 seconds

The single most useful thing you can do. GitHub stars feed every "trending memory MCP" list and ranking.

vbcherepanov/total-agent-memory →

🤝 PayPal tip

one-time · any amount

Pays for embedding models, GPU bench runs, and the domain. Every tip directly funds the next release.

paypal.me/VitaliiCherepanov →

𝕏 Share it

free · pre-filled

One retweet from someone in AI moves the needle more than a week of grinding. Tagged @BestProgerVR.

tweet a one-liner →

No Stripe, no Patreon tier-system, no "premium memory features behind a paywall" — just a single PayPal link. If you'd rather invoice — email me.

Stop re-explaining yourself.

30 seconds to install. Works with Claude Code, Cursor, Codex, and 6 more.

Install now · ⭐ Star on GitHub