LLMs are stateless. Every request starts from zero. Your agent doesn't "remember" yesterday — it reads about yesterday from files you designed. Building a memory system that actually works is half engineering, half information architecture.
## The Simplest Thing That Works: Markdown Files
We started with the most boring technology possible: markdown files in a directory. No vector database, no graph store, no fancy retrieval-augmented generation pipeline. Just files the agent can read and write.
Here's the structure I landed on:
- MEMORY.md — Long-term curated memory. The distilled essence of everything important.
- memory/active-tasks.md — What's in progress right now. The agent reads this first on every restart.
- memory/lessons.md — Every mistake, documented once, never repeated.
- memory/projects.md — Current state of every project.
- memory/self-review.md — Self-critique log from periodic reviews.
Why markdown? Because it's human-readable, human-editable, and diffs beautifully in git. I can review what the agent "remembers" the same way we review code.
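To make the layout concrete: a session bootstrap can simply concatenate these files into the agent's context, most urgent first. The sketch below is illustrative, not the actual implementation — `load_memory` and the load ordering are my assumptions, though the file names match the layout above:

```python
from pathlib import Path

# Files loaded at the start of every session, most urgent first.
# active-tasks.md comes before MEMORY.md so in-progress work is never missed.
BOOTSTRAP_FILES = [
    "memory/active-tasks.md",
    "MEMORY.md",
    "memory/lessons.md",
]

def load_memory(root: Path) -> str:
    """Concatenate memory files into one context block for the system prompt."""
    sections = []
    for rel in BOOTSTRAP_FILES:
        path = root / rel
        if path.exists():
            sections.append(f"## {rel}\n{path.read_text()}")
    return "\n\n".join(sections)
```

Because everything is plain files, this loader needs no database client and degrades gracefully: a missing file is simply skipped.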
## Adding Semantic Search
Flat files work great when you know what you're looking for. They break down when the agent needs to answer "have I seen this before?" across hundreds of entries. That's where embeddings come in.
We added an embedding-based semantic search layer on top of the markdown files. When the agent needs to recall something, it doesn't grep through every file — it searches by meaning. "What do I know about audio pipelines?" finds relevant entries even if they don't contain those exact words.
The embedding model is lightweight (OpenAI's text-embedding-3-small), and the index rebuilds automatically. The source of truth is still the markdown files — the embeddings are just an index.
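Stripped of the embedding call, the retrieval step is nearest-neighbor search over chunk vectors. A minimal sketch, assuming the vectors were produced elsewhere (in practice by text-embedding-3-small; the toy 2-D vectors and the `search` helper are illustrative, not the real pipeline):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(index: list[tuple[str, list[float]]],
           query_vec: list[float], top_k: int = 3) -> list[str]:
    """index: (chunk_text, vector) pairs built from the markdown files.
    Returns the top_k chunks closest in meaning to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(item[1], query_vec),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

Because the markdown stays the source of truth, the index is disposable: if it drifts or corrupts, you re-chunk the files and re-embed.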
## Session Transcripts as Ground Truth
Every conversation is stored as a JSONL transcript. These are the raw, unedited record of what happened. The agent doesn't read these directly during normal operation (too large, too noisy), but they're searchable through the semantic index.
This means the agent can recall specific conversations: "What did I decide about the API rate limiting last week?" It searches the transcripts, finds the relevant exchange, and pulls the context. No manual logging required.
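A rough sketch of the transcript layer, assuming one JSONL file per session and a keyword fallback for searching (`append_turn` and `grep_transcripts` are hypothetical names; real recall goes through the semantic index described above):

```python
import json
from pathlib import Path

def append_turn(transcript: Path, role: str, content: str) -> None:
    """Append one conversation turn as a JSON line. Transcripts are
    append-only: the raw record is never edited after the fact."""
    with transcript.open("a") as f:
        f.write(json.dumps({"role": role, "content": content}) + "\n")

def grep_transcripts(transcript_dir: Path, needle: str) -> list[tuple[str, dict]]:
    """Keyword fallback: scan every session transcript for a phrase.
    Returns (filename, turn) pairs for each matching line."""
    hits = []
    for path in sorted(transcript_dir.glob("*.jsonl")):
        for line in path.read_text().splitlines():
            turn = json.loads(line)
            if needle.lower() in turn["content"].lower():
                hits.append((path.name, turn))
    return hits
```

JSONL earns its keep here: one JSON object per line means appends are atomic enough in practice, and a half-written final line can be dropped without corrupting the rest of the file.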
## The Write-It-Down Rule
The single most important rule in my memory system: if it matters, write it to a file. The agent doesn't get to keep "mental notes." There are no mental notes — there's only what's on disk.
This sounds obvious, but it's surprisingly easy to get wrong. The agent acknowledges a preference in conversation ("Got it, you prefer Tailscale over WireGuard") but doesn't write it down. Next session? Gone. The rule is: every decision, every preference, every credential — committed to the appropriate file immediately.
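In code, the rule reduces to one tiny habit: append to disk the moment something is stated. A sketch, assuming dated bullet entries (the `record` helper and its format are my assumptions, not the post's actual tooling):

```python
from datetime import date
from pathlib import Path

def record(memory_file: Path, entry: str) -> None:
    """Append a dated entry the moment a decision or preference is stated.
    If it is not on disk, it does not exist next session."""
    with memory_file.open("a") as f:
        f.write(f"- {date.today().isoformat()}: {entry}\n")
```

The point is that this call happens *in the same turn* as the acknowledgment, not at some later summarization step that may never run.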
## Memory Maintenance
Memory without maintenance is hoarding. We run periodic cleanup through the sleep cycle and self-review processes:
- Completed tasks move out of active-tasks.md
- Stale project entries get updated or archived
- MEMORY.md gets pruned — outdated info removed, similar entries merged
- Lessons that no longer apply get retired
The goal isn't total recall — it's relevant recall. An agent that remembers everything is just as useless as one that remembers nothing, because it can't tell what matters.
## What We Learned
The biggest insight: memory architecture matters more than memory technology. A well-organized set of markdown files with clear conventions beats a sophisticated vector database with no structure. The agent needs to know where to write things just as much as it needs the ability to search for them.
Start simple. Add complexity only when the simple thing fails. And always, always make memory human-auditable.