Agent Memory Architecture Patterns: RAG vs Vault vs Hybrid

Three approaches to persistent memory for AI agents, and how each trades off privacy, portability, and retrieval quality.

The Memory Problem Nobody Solved

Every AI agent framework has the same gap. The agent runs, it does useful work, it builds up context about the user and the task — and then the session ends and all of it evaporates. The next conversation starts from zero.

The industry's collective answer has been "just use RAG." Retrieval-augmented generation. Chunk your documents, embed them, store the vectors, search at query time. It works. For documents.

But agent memory isn't documents. An agent doesn't just need to search through files. It needs to remember that the user prefers TypeScript over Python. That the last deployment broke because of a timezone bug. That the team decided to use Postgres instead of MySQL three weeks ago, and why. These aren't chunks of a PDF — they're experiences, decisions, and learned behaviors.

The architecture you choose for storing and retrieving this kind of memory shapes everything about how your agent actually performs. Let's look at the three dominant patterns and where each one works.


Pattern 1: RAG (Retrieval-Augmented Generation)

The standard approach. Documents and conversation history get split into chunks, run through an embedding model, and stored in a vector database. At query time, the agent's prompt or question gets embedded, nearest neighbors are retrieved, and the results are injected into the context window.

How it works:

User query → Embed → Vector search → Top-K chunks → Inject into prompt → LLM generates
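Here is the pipeline above as a toy Python sketch. It assumes an in-memory list in place of a vector database and a hashed bag-of-words `embed()` in place of a real embedding model. That means it captures the data flow, not retrieval quality. Every name here is hypothetical rather than any listed tool's API.

```python
import math
import re
import zlib

DIM = 1024  # toy vector width; real embedding models produce learned dense vectors


def embed(text: str) -> list[float]:
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # both inputs are unit length


class InMemoryVectorStore:
    """Brute-force nearest-neighbor search; a real vector DB would use an ANN index."""

    def __init__(self) -> None:
        self.rows: list[tuple[list[float], str]] = []

    def add(self, chunk: str) -> None:
        self.rows.append((embed(chunk), chunk))

    def top_k(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(store: InMemoryVectorStore, query: str, k: int = 3) -> str:
    """User query -> embed -> vector search -> top-K chunks -> inject into prompt."""
    context = "\n".join(f"- {chunk}" for chunk in store.top_k(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


store = InMemoryVectorStore()
for chunk in [
    "Restart the worker after changing the timezone setting.",
    "Billing runs on the first of each month.",
    "Invoices are emailed as PDF attachments.",
]:
    store.add(chunk)
print(build_prompt(store, "How do I fix the timezone setting?", k=1))
```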

Tools: Pinecone, Weaviate, Chroma, pgvector, Qdrant, and dozens of others.

Where it shines:

  • Large document corpora (knowledge bases, support docs, codebases)
  • When the "memory" is really a search index over external content
  • Workloads where the source material is relatively static
  • Teams that already have embeddings infrastructure

Where it breaks down:

RAG treats everything as a document retrieval problem. That works until your agent needs to remember things that aren't documents.

Consider: an agent has a conversation where the user says "I hate when you suggest Python — I only use TypeScript." In a RAG system, you'd need to extract that preference, create a synthetic document for it, embed it, and hope the vector search surfaces it when relevant. Maybe it does. Maybe the embedding of "What language should I use for this script?" isn't close enough to "I hate Python" in vector space. The semantic relationship is there, but it's indirect.

RAG also has a noise problem that compounds over time. If you embed conversation history, the vector store fills up with thousands of low-signal exchanges. Retrieval quality degrades because the signal-to-noise ratio drops. You end up building pruning pipelines, relevance scoring layers, and re-ranking systems on top of the vector search. The simple architecture gets complicated fast.
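One widely used re-ranking layer is maximal marginal relevance (MMR). It trades relevance against redundancy so that a pile of near-identical exchanges can't fill every top-K slot. The following is a minimal sketch. It assumes the vectors are already computed, and the names (`mmr`, `lambda_`) are illustrative rather than a specific library's API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query: list[float], candidates: list[list[float]], k: int, lambda_: float = 0.7) -> list[int]:
    """Pick k candidate indices by maximal marginal relevance.

    Each step takes the candidate maximizing
        lambda_ * sim(query, c) - (1 - lambda_) * max(sim(c, s) for s already selected),
    so a chunk that repeats one already chosen is penalized.
    """
    selected: list[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
        return lambda_ * relevance - (1 - lambda_) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Candidate 1 nearly duplicates candidate 0; MMR swaps the duplicate for candidate 2.
query = [1.0, 1.0, 0.0]
chunks = [[1.0, 0.0, 0.0], [1.0, 0.05, 0.0], [0.0, 1.0, 0.2]]
print(mmr(query, chunks, k=2))               # [1, 2]
print(mmr(query, chunks, k=2, lambda_=1.0))  # [1, 0]: pure similarity keeps the near-duplicate
```

Setting `lambda_` to 1.0 reduces MMR to plain similarity ranking. Lower values push harder toward diverse results.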

And there's the privacy question. Your user's conversations and preferences live in a vector database, often hosted by a third party, in a form the database operator can read. Whatever encryption at rest exists is usually managed by the provider, not by your user. The vectors themselves can be inverted to approximate the original text. For agents that handle sensitive information (personal finances, medical data, business strategy), this is a real concern, not a theoretical one.


Pattern 2: Memory Vault (Structured, Encrypted)

A different premise: agent memory isn't search, it's storage. Memories are discrete, typed records — a preference, a decision, a learned behavior, a fact — stored in an encrypted vault that the agent controls. Retrieval uses a combination of semantic search, type filtering, and temporal context rather than pure vector similarity.
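Here is a minimal sketch of that record shape and a read path. The read path filters by type before ranking by similarity, and it uses recency as a tie-breaker. The names (`MemoryType`, `Memory`, `retrieve`) and the four types are hypothetical, not Imprint's or any other product's schema. Embeddings are assumed to arrive precomputed and unit length.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum


class MemoryType(Enum):
    PREFERENCE = "preference"
    DECISION = "decision"
    BEHAVIOR = "behavior"
    FACT = "fact"


@dataclass
class Memory:
    kind: MemoryType
    text: str
    embedding: list[float]  # assumed unit length
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def similarity(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(memories: list[Memory], query_vec: list[float], kinds: set[MemoryType], k: int = 5) -> list[Memory]:
    """Type filter first, then rank by semantic similarity; newer memories win ties."""
    candidates = [m for m in memories if m.kind in kinds]
    return sorted(
        candidates,
        key=lambda m: (similarity(query_vec, m.embedding), m.created_at),
        reverse=True,
    )[:k]


now = datetime.now(timezone.utc)
vault = [
    Memory(MemoryType.PREFERENCE, "Prefers TypeScript over Python", [1.0, 0.0], now - timedelta(days=2)),
    Memory(MemoryType.DECISION, "Chose Postgres over MySQL", [0.6, 0.8], now - timedelta(days=21)),
]
for m in retrieve(vault, [1.0, 0.0], {MemoryType.PREFERENCE}):
    print(m.kind.value, m.text)
```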

How it works:

Agent observes → Classify memory type → Encrypt client-side → Store in vault → 
Retrieve by type + semantic relevance + recency → Decrypt → Inject into context
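The same flow as a Python sketch, focused on where the encryption boundary sits. The server stores an opaque blob plus a type label and never sees keys or plaintext. Everything here is hypothetical, including `VaultServer`, `VaultClient`, and the injected `classify` function. The cipher is passed in rather than implemented, so the sketch runs on the standard library alone.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class Cipher(Protocol):
    """Client-held authenticated encryption; keys never reach the server."""

    def encrypt(self, plaintext: bytes) -> bytes: ...
    def decrypt(self, ciphertext: bytes) -> bytes: ...


@dataclass
class StoredRecord:
    """Everything the server sees: an opaque blob plus the type label used for filtering."""
    record_id: int
    kind: str
    blob: bytes


class VaultServer:
    """Hypothetical storage backend; it never handles keys or plaintext memory text."""

    def __init__(self) -> None:
        self.records: list[StoredRecord] = []

    def put(self, kind: str, blob: bytes) -> int:
        self.records.append(StoredRecord(len(self.records), kind, blob))
        return self.records[-1].record_id

    def by_kind(self, kinds: set[str]) -> list[StoredRecord]:
        return [r for r in self.records if r.kind in kinds]


class VaultClient:
    def __init__(self, server: VaultServer, cipher: Cipher, classify: Callable[[str], Optional[str]]) -> None:
        self.server, self.cipher, self.classify = server, cipher, classify

    def observe(self, text: str) -> Optional[int]:
        """Observe -> classify -> encrypt client-side -> store."""
        kind = self.classify(text)
        if kind is None:
            return None  # not worth remembering
        blob = self.cipher.encrypt(json.dumps({"text": text}).encode("utf-8"))
        return self.server.put(kind, blob)

    def recall(self, kinds: set[str]) -> list[str]:
        """Retrieve by type on the server -> decrypt on the client.

        Semantic and recency ranking would run here, after decryption.
        """
        return [json.loads(self.cipher.decrypt(r.blob))["text"] for r in self.server.by_kind(kinds)]
```

`Cipher` is a structural type, so any object with `encrypt` and `decrypt` methods fits. In a real deployment, it would wrap an authenticated cipher such as `AESGCM` from the `cryptography` package, with a fresh random nonce per record stored next to the ciphertext. This sketch also bakes in a tradeoff: the type label stays in plaintext so the server can filter on it, which leaks some metadata.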

Tools: Imprint, Mem0, Zep (partial), custom implementations.

Where it shines:

  • Personal agents that learn about individual users over time
  • Privacy-sensitive workloads where encryption isn't optional
  • Cross-session continuity (the agent picks up where it left off)
  • Multi-provider setups where the agent might run on Claude today and GPT tomorrow
  • Memories that are behavioral, not documentary (preferences, patterns, decisions)

Where it breaks down:

Vaults are designed for agent-generated memories, not bulk document storage. If you need to search through 50,000 pages of documentation, a vault isn't the right tool. It's built for hundreds to low thousands of curated, high-signal memories — not corpus-scale retrieval.

The structured approach also means something has to classify and curate the memories. The agent itself, usually. That's an extra step in the pipeline, and the quality of memory storage depends on how well the agent decides what's worth remembering and how to categorize it. A bad classification means the memory exists but never gets retrieved when it's needed.
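Here is a sketch of that classification step. It uses hypothetical keyword rules; in practice, the agent would usually ask the LLM to assign the label. The sketch reproduces the failure mode just described. A preference phrased without the expected keywords gets filed as a fact, so it is stored but invisible to a query for preferences.

```python
import re
from typing import Optional

# Hypothetical rules, checked in order. Whatever assigns the label (rules or an LLM),
# the label decides which queries can find the memory later.
RULES = [
    ("preference", re.compile(r"\b(prefer\w*|hate|love|always use|never use)\b", re.IGNORECASE)),
    ("decision", re.compile(r"\b(decided|chose|going with|instead of)\b", re.IGNORECASE)),
    ("fact", re.compile(r"\b(is|are|was|were)\b", re.IGNORECASE)),
]


def classify(observation: str) -> Optional[str]:
    """Return a memory type, or None if the observation isn't worth storing."""
    for kind, pattern in RULES:
        if pattern.search(observation):
            return kind
    return None


def store(vault: dict[str, list[str]], observation: str) -> Optional[str]:
    kind = classify(observation)
    if kind is not None:
        vault.setdefault(kind, []).append(observation)
    return kind


vault: dict[str, list[str]] = {}
store(vault, "I hate when you suggest Python")            # filed as "preference"
store(vault, "Python is a pain, TypeScript only please")  # also a preference, but filed as "fact"
print(vault.get("preference"))  # only the first message; a preference query never sees the second
```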

There's also the cold-start problem. A new vault is empty. The agent has to build up its memory over time through interaction. RAG can be pre-loaded with an entire knowledge base on day one. A vault starts blank.


Pattern 3: Hybrid (RAG + Vault)

Use both. RAG handles document retrieval and knowledge base search. The vault handles personal memory, preferences, and learned behaviors. At query time, both systems contribute to the context window with their respective strengths.

How it works:

User query → 
  ├── Vector search (documents, knowledge base) → Top-K chunks
  └── Vault retrieval (preferences, decisions, history) → Relevant memories
→ Merge + rank → Inject into context → LLM generates
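The "Merge + rank" step is tricky because the two systems' scores usually aren't comparable. One side produces cosine similarity, while the vault produces whatever relevance score it uses. One option, assumed here rather than prescribed by either pattern, is reciprocal rank fusion (RRF), which uses only each item's rank. An item's score is the sum of weight / (k + rank) across the lists it appears in. k = 60 is the constant commonly used with RRF, and the per-source `weights` knob is an illustrative extension.

```python
from typing import Optional


def reciprocal_rank_fusion(
    ranked: dict[str, list[str]],
    weights: Optional[dict[str, float]] = None,
    k: int = 60,
    limit: int = 5,
) -> list[str]:
    """Merge best-first result lists from several retrievers by rank alone.

    Each item scores sum(weight[source] / (k + rank)) over the lists it appears in,
    so raw scores from different systems never need to be on the same scale.
    """
    weights = weights or {}
    scores: dict[str, float] = {}
    for source, items in ranked.items():
        weight = weights.get(source, 1.0)
        for rank, item in enumerate(items, start=1):
            scores[item] = scores.get(item, 0.0) + weight / (k + rank)
    # sorted() is stable (also with reverse=True), so equal scores keep first-seen order
    return sorted(scores, key=lambda item: scores[item], reverse=True)[:limit]


docs = ["doc:reset-password", "doc:2fa-setup", "doc:billing-faq"]  # from the vector store
memories = ["mem:promised-callback", "mem:workaround-failed"]        # from the vault
print(reciprocal_rank_fusion({"docs": docs, "vault": memories}))
print(reciprocal_rank_fusion({"docs": docs, "vault": memories}, weights={"vault": 1.5}))
```

With equal weights, the two sources interleave by rank. Raising the vault's weight lets personal memory outrank documentation without calibrating the underlying scores against each other.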

Where it shines:

This is where most serious agent deployments end up. The agent needs access to documentation (RAG) and also needs to remember who it's talking to and what it's learned (vault). Neither system alone covers both needs well.

A customer support agent, for example, needs RAG to search the product documentation and troubleshooting guides. But it also needs to remember that this particular customer has called three times about the same issue, that they're frustrated, that they were promised a callback, and that the last agent suggested a workaround that didn't work. The documentation is RAG territory. The customer relationship is vault territory.

Where it breaks down:

Complexity. You're running two retrieval systems, merging their results, and managing the relevance ranking between two fundamentally different types of content. The architectural overhead is real. For a prototype or a simple agent, this is over-engineering.

There's also a risk of retrieval collision — the RAG system and the vault both return content about the same topic, but they disagree. The documentation says one thing; the agent's memory of what actually worked says another. You need a resolution strategy, which usually means more prompt engineering or a separate ranking model.
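Here is one possible resolution strategy. It rests on two assumptions that real systems don't get for free: both retrievers tag results with a shared topic key, and both carry a last-updated date. Rather than silently dropping one side, it groups snippets by topic and flags topics that both systems returned, listing them newest first. That way, the conflict is visible in the prompt. `Snippet` and `resolve` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Snippet:
    source: str  # "docs" or "vault"
    topic: str   # hypothetical shared topic key; real systems have to derive this
    text: str
    updated: date


def resolve(snippets: list[Snippet]) -> list[str]:
    """Group retrieved snippets by topic and flag topics both systems returned.

    Nothing is silently dropped: the prompt sees both sides, newest first, so the
    model (or a later ranking step) can decide which to trust.
    """
    by_topic: dict[str, list[Snippet]] = {}
    for s in snippets:
        by_topic.setdefault(s.topic, []).append(s)
    lines: list[str] = []
    for topic, group in by_topic.items():
        group.sort(key=lambda s: s.updated, reverse=True)
        header = f"[{topic}]"
        if {s.source for s in group} == {"docs", "vault"}:
            header += " docs and memory both cover this; newest first"
        lines.append(header)
        lines.extend(f"  ({s.source}, {s.updated.isoformat()}) {s.text}" for s in group)
    return lines


for line in resolve([
    Snippet("docs", "vpn-error", "Restart the VPN client to clear the error.", date(2023, 1, 10)),
    Snippet("vault", "vpn-error", "Restarting did not help this customer; reinstalling did.", date(2024, 3, 2)),
    Snippet("docs", "billing", "Invoices go out on the 1st.", date(2023, 5, 1)),
]):
    print(line)
```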


The Decision Framework

Picking the right pattern depends on what your agent actually does, not on what's architecturally fashionable.

Use RAG when:

  • Your agent's primary job is answering questions from a document corpus
  • The information is relatively static (updated weekly/monthly, not per-conversation)
  • You're building a chatbot over existing content, not a learning agent
  • Privacy requirements are standard (not handling PII or sensitive personal data)

Use a Vault when:

  • Your agent maintains long-term relationships with individual users
  • Cross-session continuity is a core feature, not a nice-to-have
  • The agent needs to learn and adapt its behavior over time
  • Privacy and data sovereignty matter (healthcare, finance, personal assistants)
  • You're building across multiple LLM providers and need portable memory

Use Hybrid when:

  • Your agent needs both document knowledge and personal memory
  • You're building a production system with real users who come back repeatedly
  • You can afford the architectural complexity (you have a team, not just a weekend project)

What We Got Wrong Initially

When we started building Imprint, we tried to make the vault do everything. Document storage, conversation history, semantic search over large corpora — all encrypted, all structured. It was architecturally elegant and practically unusable. Retrieval over thousands of encrypted records was slow. The type system was too rigid for unstructured document content. We were building a worse vector database with an encryption layer bolted on.

The breakthrough was accepting that vault memory and document retrieval are fundamentally different problems. A vault stores what the agent learned. RAG searches what the agent knows about. Those sound similar, but the access patterns, storage characteristics, and privacy requirements are different enough that one system can't serve both well.

Now Imprint does one thing: encrypted, structured, portable agent memory. If you need RAG, use a vector database. If you need both, use both. The vault handles the personal, private, behavioral memory that RAG was never designed for.


Implementation Considerations

A few things that don't fit neatly into the pattern comparison but matter when you're actually building:

Memory curation is the hard part. Regardless of pattern, the quality of what goes in determines the quality of what comes out. An agent that stores every conversation turn will drown in noise. An agent that stores nothing will never learn. The curation logic — what to remember, how to categorize it, when to update or retire old memories — is where the real engineering challenge lives.
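Here is a minimal sketch of the update-and-retire half of curation. It assumes each memory carries a hypothetical `key` (such as `"pref:language"`), so a newer statement supersedes an older one instead of piling up beside it. It also assumes a time-to-live after which memories that were never reconfirmed are retired. The 180-day TTL is illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CuratedMemory:
    key: str
    value: str
    last_confirmed: datetime
    superseded_by: Optional[str] = None  # value that replaced this one, kept for audit


class CuratedStore:
    def __init__(self, ttl: timedelta) -> None:
        self.ttl = ttl
        self.active: dict[str, CuratedMemory] = {}
        self.history: list[CuratedMemory] = []

    def upsert(self, key: str, value: str, now: datetime) -> None:
        current = self.active.get(key)
        if current is not None and current.value == value:
            current.last_confirmed = now  # reinforcement: same memory seen again
            return
        if current is not None:
            current.superseded_by = value  # update: retire the old value, keep history
            self.history.append(current)
        self.active[key] = CuratedMemory(key, value, now)

    def retire_stale(self, now: datetime) -> list[str]:
        """Move memories not confirmed within the TTL out of the active set."""
        stale = [k for k, m in self.active.items() if now - m.last_confirmed > self.ttl]
        for k in stale:
            self.history.append(self.active.pop(k))
        return stale


t0 = datetime(2024, 1, 1)
store = CuratedStore(ttl=timedelta(days=180))
store.upsert("pref:language", "TypeScript", t0)
store.upsert("pref:language", "Rust", t0 + timedelta(days=60))  # supersedes TypeScript
store.upsert("db:choice", "Postgres", t0)
print(store.retire_stale(t0 + timedelta(days=200)))  # ['db:choice']
print(store.active["pref:language"].value)           # Rust
```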

Encryption changes the retrieval game. If memories are encrypted client-side, with keys the server never sees, you can't do server-side vector search over them. Imprint addresses this with client-side embedding and encrypted indexes, but the approach has real tradeoffs in query latency and index size. If privacy isn't a hard requirement, unencrypted server-side vector search will generally be faster.

Portability matters more than you think. If your agent's memory is locked into one provider's embedding format, switching providers means re-embedding everything or maintaining parallel indexes. Vaults with provider-agnostic storage avoid this, but it's a constraint you should evaluate early.
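Here is one way to keep memory portable, sketched as an assumption rather than a description of how any particular vault works. The memory text is the source of truth, and vectors act as a cache keyed by embedding model. A provider switch then becomes a backfill pass. `ensure_embeddings` is hypothetical, and the two "models" are placeholders for real embedding calls.

```python
from dataclasses import dataclass, field
from typing import Callable

EmbedFn = Callable[[str], list[float]]


@dataclass
class PortableMemory:
    text: str  # source of truth; vectors can always be rebuilt from it
    vectors: dict[str, list[float]] = field(default_factory=dict)  # model id -> embedding


def ensure_embeddings(memories: list[PortableMemory], model_id: str, embed_fn: EmbedFn) -> int:
    """Backfill vectors for model_id; returns how many memories needed embedding."""
    missing = [m for m in memories if model_id not in m.vectors]
    for m in missing:
        m.vectors[model_id] = embed_fn(m.text)
    return len(missing)


def fake_model_a(text: str) -> list[float]:  # placeholder for a real embedding call
    return [float(len(text)), 0.0]


def fake_model_b(text: str) -> list[float]:  # placeholder for another provider's model
    return [0.0, float(len(text))]


memories = [PortableMemory("Prefers TypeScript"), PortableMemory("Chose Postgres over MySQL")]
print(ensure_embeddings(memories, "model-a", fake_model_a))  # 2: first pass embeds everything
print(ensure_embeddings(memories, "model-b", fake_model_b))  # 2: provider switch, backfill from text
print(ensure_embeddings(memories, "model-b", fake_model_b))  # 0: already current
```

Similarity is only meaningful between vectors from the same model. A query therefore has to be embedded with the same `model_id` whose vectors it searches.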

Temporal context is underrated. When did the agent learn something? Is it still relevant? A preference from six months ago might be outdated. A decision from yesterday is probably current. Both RAG and vault systems need some notion of temporal relevance, and most implementations handle this poorly or not at all.
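A common way to add temporal relevance is to multiply similarity by an exponential decay, so a memory's weight halves every `half_life_days`. Here is a minimal sketch. The 30-day default is an illustrative assumption, and in practice the half-life might reasonably differ by memory type.

```python
from datetime import datetime, timedelta


def decayed_score(similarity: float, learned_at: datetime, now: datetime, half_life_days: float = 30.0) -> float:
    """similarity * 0.5 ** (age / half_life): the weight halves every half_life_days."""
    age_days = max(0.0, (now - learned_at).total_seconds() / 86400)
    return similarity * 0.5 ** (age_days / half_life_days)


now = datetime(2024, 6, 1)
# A six-month-old, highly similar memory vs. yesterday's, moderately similar one.
print(decayed_score(0.9, now - timedelta(days=180), now))  # about 0.014
print(decayed_score(0.5, now - timedelta(days=1), now))    # about 0.49
```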


Where This Is Going

The current split between RAG and vault memory feels temporary. The retrieval and curation layers will likely converge as the tooling matures. What probably persists is the underlying distinction: there's memory the agent searches (knowledge) and memory the agent embodies (experience). Different storage, different retrieval, different privacy requirements.

For now, pick the pattern that matches your agent's actual workload. Don't use RAG for personal memory because it's familiar. Don't use a vault for document search because encryption sounds impressive. Match the tool to the problem.

If your agent needs to learn about individual users, retain context across sessions, and keep that data private, Imprint was built for exactly that. The SDK quickstart takes about fifteen minutes, and the free tier gives you room to experiment.


Imprint is an encrypted memory vault for AI agents. Client-side encryption, semantic retrieval, Merkle-verified integrity. Your agent's memories stay yours.
