MemoryAtlas
About

A neutral, sourced map of agent memory.

Memory Atlas is a vendor-neutral catalog of LLM and agent memory frameworks. Every framework gets a standardized memory card. Every benchmark number sits next to the config it was run under and a link to where it came from. Model cards, but for memory systems.

Why it exists

The agent-memory field in 2026 is loud and hard to read. Every vendor publishes the benchmark that flatters them, so the same framework shows up at 49 and at 94 on LongMemEval depending on who measured it and how. A score means little without its backbone LLM, embedder, and retrieval settings, and most write-ups leave those out.

The benchmarks themselves are aging. LoCoMo and LongMemEval date to the 32K-context era, so a model that just dumps everything into the prompt can post a competitive number without any memory layer at all. Meanwhile the useful information is scattered across vendor blogs and PDFs, each with its own table and its own thumb on the scale. This catalog is an attempt to give it a neutral home.

How we keep it honest

Provenance on everything

Every fact and every number links to a source and carries a badge: self-reported or independently reproduced. A vendor's own blog number stays tagged as the vendor's, never laundered into something that looks neutral. The git history is the audit log.

Config-aware benchmarks

A result records the backbone LLM, the embedder, and the retrieval parameters it ran under. Scores produced under different configs aren't comparable, so we don't line them up as if they were.
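A minimal sketch of what a config-aware result record could look like, assuming a Python representation. The field names, the `Provenance` enum, and the `params` and `comparable_groups` helpers are hypothetical, not Memory Atlas's actual schema. The point is that the backbone LLM, embedder, and retrieval params are part of the grouping key, so two scores only sit side by side when all of them match.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple

# Hashable, order-independent form of retrieval params, e.g. (("top_k", 10),).
Params = Tuple[Tuple[str, object], ...]


class Provenance(Enum):
    # The two provenance badges; the string values are hypothetical.
    SELF_REPORTED = "self-reported"
    INDEPENDENTLY_REPRODUCED = "independently-reproduced"


def params(**kwargs: object) -> Params:
    """Hypothetical helper: normalize retrieval params so key order doesn't matter."""
    return tuple(sorted(kwargs.items()))


@dataclass(frozen=True)
class BenchmarkResult:
    # Illustrative schema only, not the catalog's real data format.
    framework: str
    benchmark: str
    score: float
    backbone_llm: str
    embedder: str
    retrieval_params: Params
    source_url: str          # every number links to where it came from
    provenance: Provenance   # self-reported vs independently reproduced

    def config_key(self) -> Tuple[str, str, str, Params]:
        # Scores are only comparable when every element of this key matches.
        return (self.benchmark, self.backbone_llm, self.embedder, self.retrieval_params)


def comparable_groups(results: List[BenchmarkResult]) -> Dict[tuple, List[BenchmarkResult]]:
    """Bucket results by config so only same-config scores sit side by side."""
    groups: Dict[tuple, List[BenchmarkResult]] = {}
    for r in results:
        groups.setdefault(r.config_key(), []).append(r)
    # Ranking by score is meaningful within a bucket, not across buckets.
    for bucket in groups.values():
        bucket.sort(key=lambda r: r.score, reverse=True)
    return groups
```

With this keying, two results on the same benchmark but run on different backbones land in separate buckets, so neither is ranked against the other.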

Honest about benchmark rot

LoCoMo and LongMemEval were built for 32K-token windows. Today a model that just stuffs everything into context can score well without remembering anything. Each benchmark page shows that full-context baseline, so you can see how much the memory layer actually adds.
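One way to read that baseline, as a minimal sketch: subtract the full-context score from the memory system's score on the same benchmark. The `Run` record and `memory_lift` function are hypothetical, and the sketch assumes both runs use the same backbone LLM, since otherwise the gap would mix model quality with memory quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    # Hypothetical record of one benchmark run.
    benchmark: str
    backbone_llm: str
    score: float  # benchmark accuracy, in percent


def memory_lift(with_memory: Run, full_context: Run) -> float:
    """Points the memory layer adds over stuffing the whole history into context.

    Positive: the memory layer beats the full-context baseline.
    Zero or negative: the long-context model alone does as well or better.
    """
    if with_memory.benchmark != full_context.benchmark:
        raise ValueError("baseline must run on the same benchmark")
    if with_memory.backbone_llm != full_context.backbone_llm:
        # Otherwise the gap mixes model quality with memory quality.
        raise ValueError("baseline must use the same backbone LLM")
    return with_memory.score - full_context.score
```

A lift near zero is the case this section warns about: the long-context model alone explains the score.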

Self-updating, with a human gate

Claude skills research the web, normalize what they find, and prepare changes against the git-tracked data. A person reviews and commits. The catalog mostly maintains itself, but nobody rubber-stamps a number into it unread.

What earns a place here

Memory Atlas catalogs memory layers: the part you add to an agent so it remembers across sessions. Not every tool that touches memory qualifies. Four rules decide whether something gets a card.

  1. It stores and retrieves long-term context across sessions.

    A plain vector database, a RAG how-to, or a weekend demo doesn't count. The card has to describe a real memory layer.

  2. You can adopt it independently.

    This is the rule that does the most work. The memory has to be usable inside an agent, harness, IDE, or chat client you already run, wired in over MCP, a REST API, an SDK, a CLI, or a framework adapter. If a product's memory only works inside that product, you can't bring it to your own stack, so it isn't catalogued here. The test is one question: can you use this memory in an agent you didn't build around it? That is why full agent runtimes like Letta and Hermes don't get a card. Their memory is part of the runtime; you adopt the whole agent or none of it. We keep an agent-runtime family page that explains this line rather than pretending the category isn't there.

  3. There's an open-source path.

    Every framework has a public repository. A hosted or paid tier on top of an open-source core is welcome. A closed-source, commercial-only product with no self-hostable path is not, however good it is.

  4. A paper is optional.

    Research projects and shipped products get the same treatment. We link a paper when one exists and never hold a card back for lacking one.

Start reading

39 frameworks catalogued, 35 sourced benchmark results, every claim with a link.