v12.0.0 · #5 on LoCoMo · 1200+ tests passing

Memory your agent actually keeps.

Persistent memory for Claude Code, Codex CLI, Cursor and any MCP client. Local-first. Sub-millisecond warm p50 recall. 96.2% R@5 on LongMemEval and #5 on the public LoCoMo leaderboard: numbers you can verify, not press releases.

  • 96.2 %
    LongMemEval R@5
  • 0.705
    LoCoMo accuracy (#5 on the public leaderboard)
  • 0.065 ms
    Warm p50 latency
  • 100 %
    Local
~ — claude-code
# Day 1 — Tuesday
You:     remember we picked pgvector over ChromaDB
         because of multi-tenant RLS.
Claude:  memory_save(type="decision",
                     tags=["reusable", "db"])

# Day 4 — different session, different repo
You:     why did we pick pgvector again?
Claude:  memory_recall(query="vector db choice")
         → Chose pgvector over ChromaDB.
           WHY: single Postgres, per-tenant RLS.
           Tagged reusable. Created 2026-05-12.

# 0.065 ms · 0 LLM calls · 0 network
claude-code codex cursor cline continue windsurf gemini-cli opencode aider
$ npx -y total-agent-memory
no global install: runs immediately and adds no global Node packages

No registration. No telemetry. MIT licensed. ~50 MB install.

35-second demo

The agent remembers on its own.

You don't tell it "save this." It just does. Decisions, lessons, gotchas — captured in the background, recalled in future sessions across different repos.

  • Auto-captures decisions
    Architectural choice → memory_save(decision) without you typing anything.
  • Auto-learns from errors
    Test fails → diagnosis → fix → learn_error. Same trap won't fire twice.
  • Auto-recalls across repos
    Different project, fresh session → the agent calls memory_recall for earlier decisions before answering.
tam-demo.mp4 · 1920×1080 · 35s · CC-BY 4.0

The problem

AI coding agents have goldfish memory.

Yesterday's architectural decisions. Last week's bug fix. The reason you chose pgvector. The reason you didn't pick GraphQL. The stack you settled on at 2am after the third PoC. Close the terminal — and it all vanishes. Every new Claude session starts at zero, and you re-paste the same context. Forever.

CLAUDE.md
Loads everything, every time

22K+ tokens at 240 observations. Then your window dies. Then you delete half of it and lose what mattered.

Cloud memory APIs
Your code, their database

Sends every decision, fix, and file path to someone else's API. Latency, lock-in, and a privacy footprint you didn't sign up for.

Agent-managed memory
LLM in the hot path

The model edits its own context. Slow, expensive, non-deterministic, and the retrieval quality is whatever the model felt like that turn.

How it works

Two-layer architecture. The fast one is the default.

memory_core is deterministic — storage, embeddings, vector search, classifier, dedup, telemetry. No LLM imports allowed. Enforced by a unit test.
ai_layer is everything LLM-touching — summarisation, keyword extraction, contradiction detection, reflection, query rewriting. Runs async, off the hot path.
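
The "no LLM imports" rule can be checked mechanically. What follows is a minimal sketch of such a guard, not the project's actual test: the deny-list in FORBIDDEN_ROOTS and the helper names are assumptions made for illustration. It parses every Python file in a package and reports imports whose top-level module is on the deny-list.

```python
import ast
import pathlib
import tempfile

# Hypothetical deny-list; the project's real list is not given on this page.
FORBIDDEN_ROOTS = {"openai", "anthropic", "ollama", "ai_layer"}


def forbidden_imports(package_dir: str) -> list[tuple[str, str]]:
    """Return (file name, module) pairs for every forbidden import under package_dir."""
    hits = []
    for path in sorted(pathlib.Path(package_dir).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                modules = [node.module]
            else:
                continue
            for module in modules:
                # Compare only the top-level package name, e.g. "openai.types" -> "openai".
                if module.split(".")[0] in FORBIDDEN_ROOTS:
                    hits.append((path.name, module))
    return hits


def demo_guard() -> list[tuple[str, str]]:
    """Build a throwaway package with one clean file and one offending file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / "storage.py").write_text("import sqlite3\n", encoding="utf-8")
        (root / "search.py").write_text("from openai import OpenAI\n", encoding="utf-8")
        return forbidden_imports(tmp)
```

In a real suite the assertion would be a single line such as `assert forbidden_imports("memory_core") == []`, so any LLM import in the deterministic layer fails the build.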

6-stage hybrid retrieval

FTS5 + BM25 + dense + graph + CrossEncoder + MMR with RRF fusion. Tuned on real Claude Code sessions.
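
For readers who haven't met RRF: reciprocal rank fusion merges several ranked lists by summing 1 / (k + rank) for each document across the lists. Here is a small sketch of the general technique, not the project's code. The function name, the sample IDs, and k = 60 (a commonly used default) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up result lists from three retrievers, best hit first.
bm25_hits = ["m3", "m1", "m7"]
dense_hits = ["m1", "m3", "m9"]
graph_hits = ["m1", "m9"]

fused = rrf_fuse([bm25_hits, dense_hits, graph_hits])
print([doc_id for doc_id, _ in fused])  # ['m1', 'm3', 'm9', 'm7']
```

RRF needs only ranks, not comparable scores, which is why it suits mixing lexical, dense and graph retrievers whose raw scores live on different scales.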

Temporal knowledge graph

Nodes & edges auto-extracted on save. Time-aware queries: "what was our stack in Q3?" returns the right answer.
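
One common way to make graph queries time-aware is to store a validity interval on every edge and filter by an "as of" date. The sketch below illustrates that idea with an in-memory SQLite table. The schema, column names and sample rows are invented for the example and are not the project's actual storage layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE edges (
        subject    TEXT NOT NULL,
        relation   TEXT NOT NULL,
        object     TEXT NOT NULL,
        valid_from TEXT NOT NULL,   -- ISO date the fact became true
        valid_to   TEXT             -- NULL while the fact still holds
    )
    """
)
# Made-up history: the project switched vector stores on 2025-06-02.
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?, ?, ?)",
    [
        ("project", "uses_vector_db", "ChromaDB", "2025-01-10", "2025-06-02"),
        ("project", "uses_vector_db", "pgvector", "2025-06-02", None),
    ],
)


def facts_as_of(subject: str, relation: str, as_of: str) -> list[str]:
    """Return the objects whose validity interval contains the as_of date."""
    rows = conn.execute(
        """
        SELECT object FROM edges
        WHERE subject = ? AND relation = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR valid_to > ?)
        ORDER BY valid_from
        """,
        (subject, relation, as_of, as_of),
    ).fetchall()
    return [row[0] for row in rows]


print(facts_as_of("project", "uses_vector_db", "2025-03-01"))  # ['ChromaDB']
print(facts_as_of("project", "uses_vector_db", "2025-09-30"))  # ['pgvector']
```

ISO dates compare correctly as plain strings, so no date parsing is needed in the query.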

Procedural memory

Recurring error patterns auto-consolidate into rules. Skills accumulate. Workflow predictor learns your loops.

Hot path: zero LLM

memory_core is deterministic. ai_layer enriches async. Warm p50 = 0.065 ms. No surprise tokens, no latency tax.

9 IDEs supported

Claude Code, Codex CLI, Cursor, Cline, Continue, Aider, Windsurf, Gemini CLI, OpenCode: one command each. Aider uses the lookup-memory CLI until MCP support lands.

Local-first

SQLite + FastEmbed on your machine. No cloud, no telemetry, no API key required. MIT licence.

Benchmarks

Numbers we can actually reproduce.

Every value on this page comes from a script in benchmarks/. For competitors we list only what their authors published and invent nothing. LongMemEval is a 500-question long-term memory benchmark introduced by Wu et al. (2024).

LongMemEval R@5
higher is better · same eval, same prompts
  • total-agent-memory: 96.2%
  • agentmemory (iii): 95.2%
  • BM25-only baseline: 86.2%
No published LongMemEval-S number, so we won't invent one: mem0, Letta / MemGPT, Zep, Supermemory
Warm latency · search hot path
in-memory SQLite · MacBook M-series
p50 0.065 ms
p95 4.0 ms
p99 6.2 ms
252× faster than v10.5 (with LLM in path). 2.7× faster than v10.5 without LLM. Zero LLM calls, zero network requests on the save/search/recall hot path — verified by tests/test_no_llm_hot_path.py.
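
The gap between p50 and p95 above is typical of a fast path with a slow tail. As a rough illustration of how such percentiles are read off a set of timings, here is a nearest-rank sketch. The sample timings are made up and this is not the project's benchmark script.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up timings in milliseconds: 90 fast lookups and a slow tail of 10.
timings_ms = [0.06] * 90 + [3.0] * 6 + [5.0] * 3 + [9.0]

print(percentile(timings_ms, 50))  # 0.06
print(percentile(timings_ms, 95))  # 3.0
print(percentile(timings_ms, 99))  # 5.0
```

The median barely notices the tail, which is why p95 and p99 are reported alongside p50.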

Found a wrong number? Open an issue with a link to the source and we will re-run our eval and update this page within 48 hours.

vs everyone else

Honest comparison. Empty cells mean nothing was published.

Where a project doesn't list a number on a comparable benchmark, we mark it “—”. We don't invent values. Sources are cited in docs/vs-competitors.md.

Project            | LongMemEval R@5 | LoCoMo    | p50      | LLM in hot path | Self-host | Deps                       | Notes
total-agent-memory | 96.2%           | 0.705     | 0.065 ms | no              | yes       | SQLite + FastEmbed         | #5 LoCoMo · local-first · MIT
agentmemory (iii)  | 95.2%           | —         | —        | no              | yes       | iii engine (ELv2, pre-1.0) | self-reported, not on public leaderboard
Letta / MemGPT     | —               | —         | —        | yes             | optional  | Postgres + vector DB       | agent runtime · LLM in hot path
mem0               | —               | published | —        | yes             | optional  | Qdrant / pgvector          | API-first · LLM extract on every save
Zep                | —               | —         | —        | yes             | optional  | Postgres + Graphiti        | graph-first memory layer
CLAUDE.md          | —               | —         | instant  | no              | yes       | plain text                 | loads everything every turn · no retrieval

Spot something wrong? Open an issue with a link to the source and we'll re-run our eval and update.

Agents & IDEs

Works with every coding agent you already use.

9 IDEs supported out of the box. The npm wrapper writes the MCP entry into the right config file and merges with anything you already have. No manual JSON. Restart the IDE — memory is live.
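
To make "merges with anything you already have" concrete, here is a rough sketch of a non-destructive merge for the JSON-based clients. It assumes the config keeps servers under an "mcpServers" key, as several MCP clients do. The function name and the launch command in the entry are placeholders, not what the connect command actually writes.

```python
import json
import pathlib
import tempfile


def merge_mcp_entry(config_path: pathlib.Path, name: str, entry: dict) -> dict:
    """Add or update one server under "mcpServers" and keep every other key intact."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    servers = config.setdefault("mcpServers", {})
    servers[name] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


def demo_merge() -> dict:
    """Merge into a throwaway config that already has another server and a setting."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "mcp.json"
        existing = {"mcpServers": {"github": {"command": "gh-mcp"}}, "theme": "dark"}
        path.write_text(json.dumps(existing), encoding="utf-8")
        # Hypothetical entry; the real command and args come from the installer.
        entry = {"command": "total-agent-memory", "args": []}
        return merge_mcp_entry(path, "total-agent-memory", entry)
```

The existing "github" server and the unrelated "theme" key survive the merge. TOML-based clients such as Codex CLI would need the same idea with a TOML parser and writer.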

Claude Code

Anthropic

claude-code

Official Anthropic CLI. 12 hooks fire automatically on file edits, prompts, session end.

~/.claude.json
$ npx -y total-agent-memory connect claude-code

Codex CLI

OpenAI

codex

OpenAI Codex CLI. TOML config patched, MCP server wired in one command.

~/.codex/config.toml
$ npx -y total-agent-memory connect codex

Cursor

Cursor

cursor

AI-first IDE. Memory MCP appears in Cursor's server panel — works in chat and agent mode.

~/.cursor/mcp.json
$ npx -y total-agent-memory connect cursor

Cline

Cline

cline

VSCode agent extension. Multi-turn plans use memory automatically once wired.

~/.cline/mcp.json
$ npx -y total-agent-memory connect cline

Continue

Continue.dev

continue

Open-source IDE assistant. Memory is exposed as a tool via the MCP block in config.

~/.continue/config.json
$ npx -y total-agent-memory connect continue

Windsurf

Codeium

windsurf

Codeium's agent-native IDE. Cascade picks up memory_recall automatically.

~/.codeium/windsurf/mcp_config.json
$ npx -y total-agent-memory connect windsurf

Gemini CLI

Google

gemini-cli

Google's open-source Gemini CLI. Memory drops in as a standard MCP server.

~/.gemini/settings.json
$ npx -y total-agent-memory connect gemini-cli

OpenCode

sst.dev

opencode

Open-source coding agent by sst. MCP-native — memory just works.

~/.config/opencode/config.json
$ npx -y total-agent-memory connect opencode

Aider

Aider

aider

Terminal AI pair-programmer. Use the lookup-memory CLI side-by-side (MCP support coming).

~/.aider.conf.yml
$ lookup-memory --help
Don't see your agent?

If it speaks MCP, it works. Point any MCP client at total-agent-memory (in ~/.tam/.venv/bin/) and you're done.

REST API

Also exposes a REST endpoint on :37737. Drive it from scripts, CI, or any agent SDK that prefers HTTP over MCP.

CLI tool

A standalone lookup-memory binary is installed alongside the server for shell scripts and sub-agents.

Wire your IDE

One command per editor.

The npm CLI writes the MCP server entry into your IDE's config and merges with anything already there. No manual JSON editing.

  • claude-code
  • codex
  • cursor
  • cline
  • continue
  • aider
  • windsurf
  • gemini-cli
  • opencode
terminal
$ npx -y total-agent-memory connect claude-code
 Wired claude-code → ~/.claude.json
 Restart your IDE for the change to take effect.

$ npx -y total-agent-memory connect cursor
 Wired cursor → ~/.cursor/mcp.json

$ npx -y total-agent-memory connect codex
 Wired codex → ~/.codex/config.toml

$ npx -y total-agent-memory status
 python3   : Python 3.12.5
 uv        : uv 0.8.17
 server    : v12.0.0
 memory.db : 12.4 MB · 2,147 entries

FAQ

Questions you'd ask in an interview.

Is it really free?

Yes. MIT licence. No paid tier, no telemetry, no email signup. The entire server runs on your machine.

Does any data leave my machine?

No. Memory lives in a local SQLite file. Embeddings are computed by FastEmbed locally. Ollama integration is optional and also local.

Why is this better than just using CLAUDE.md?

CLAUDE.md loads every byte every time. At ~240 observations it already exceeds 22K tokens and breaks long sessions. total-agent-memory retrieves only the relevant ~1.9K tokens per request. Same fidelity, far cheaper, far faster.
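
A quick back-of-the-envelope check of those two figures, using only the numbers quoted above (22K tokens at roughly 240 observations, about 1.9K tokens retrieved per request):

```python
full_file_tokens = 22_000  # CLAUDE.md at ~240 observations (figure from this page)
retrieved_tokens = 1_900   # relevant context per request (figure from this page)
observations = 240

saved_tokens = full_file_tokens - retrieved_tokens
saved_fraction = saved_tokens / full_file_tokens
tokens_per_observation = full_file_tokens / observations

print(f"{saved_tokens} tokens saved per request ({saved_fraction:.0%})")
# 20100 tokens saved per request (91%)
print(f"~{tokens_per_observation:.0f} tokens per observation in CLAUDE.md")
# ~92 tokens per observation in CLAUDE.md
```

The CLAUDE.md cost keeps growing with every observation added, while the retrieved slice depends on the query rather than on the size of the store.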

How does it compare to mem0 / Letta / Zep / Supermemory?

Two things. (1) The hot path is LLM-free, so search is sub-millisecond at warm p50. (2) Public benchmark numbers: 96.2% R@5 on LongMemEval and 0.705 on LoCoMo (#5 on the public leaderboard). Competitors that did not publish on the same benchmark are marked “—” in the comparison table. We don't invent numbers.

Can I migrate from mem0?

Yes — see docs/migration. We import via JSONL, preserve metadata, and re-embed locally on first save.

Does it support team sharing?

Not yet. Today every install is per-machine. A shared backend (Postgres) is on the v12 roadmap.

Does it work on Windows?

Yes. The npm wrapper handles venv creation on Windows; the Python server is cross-platform. Install via npx -y total-agent-memory.

Are the benchmark numbers verifiable?

Yes. Every number on this page is reproducible — scripts live under benchmarks/. The LongMemEval result is in evals/longmemeval-2026-04-17.json. The LoCoMo leaderboard is public. If you can't reproduce a number, file an issue and we'll re-run.

What we're building next · private beta

BrainCore — the memory layer that lets your agent say "I don't know."

total-agent-memory solved recall — sub-millisecond, 96.2% R@5, fully open-source. BrainCore solves the harder problem: stopping agents from confidently coding against a chunk that's no longer in the codebase. A strict-mode gate, a temporal decision graph, and the right to abstain — by design.

  • Strict mode by default. Every fact passes a gate before it touches your prompt. Fails → abstain.
  • Decision graph with provenance. problem → choice → outcome, every node lifecycle-tracked.
  • AST source-truth. Reads go.mod / package.json / language ASTs, not just embeddings.
  • Local-first, Go-native. Designed for the regulated enterprise — code never leaves the perimeter.
Now raising — seed
Building a cognitive memory engine

We're talking to early-stage investors who care about anti-hallucination infrastructure — the layer that makes AI coding agents trustworthy enough for regulated enterprises (fintech, healthtech, public sector, defence). total-agent-memory is the open-source recall primitive that's already shipped. BrainCore is the commercial cognition layer on top.

  • Tech metric: 0.95 strict-mode R@5
  • Retrieval p95: 4 ms, in-process Go
  • Stack: Go 1.25 · pgvector · MCP stdio
  • Status: private beta · Apache-2.0 core
Request the deck

Not raising? Sign up for updates at getbraincore.com.

Support the project

Self-funded. Local-first. No VC strings.

total-agent-memory is free, MIT, and stays that way. Every star, every link, and every one-time tip keeps it that way — and frees me up to push v12 (shared-team backend) faster.

No Stripe, no Patreon tier-system, no "premium memory features behind a paywall" — just a single PayPal link. If you'd rather invoice — email me.

Stop re-explaining yourself.

30 seconds to install. Works with Claude Code, Cursor, Codex, and 6 more.