An LLM's knowledge has a cutoff date. It knows what was in its training data and nothing after. But an agent that can browse the web, read papers, and write notes? That agent can keep learning. We built a self-learning system that runs on a schedule, and the results have been genuinely surprising.
The Concept
Every few hours, a cron job triggers the self-learning cycle. The agent picks a topic — sometimes from a curated list of things it wants to understand better, sometimes based on what came up in recent conversations. Then it goes deep.
The process looks like this:
- Select a topic based on knowledge gaps or recent relevance
- Research it using web search, academic papers, and documentation
- Distill findings into structured notes
- Store the knowledge in a dedicated directory
- During the sleep cycle, promote key insights into long-term memory
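The research-and-store part of that loop can be sketched in a few lines of Python. Everything here is illustrative: the function names, the `knowledge/` filename convention, and the stubbed-out `research` step stand in for the real search and synthesis calls.

```python
from datetime import datetime, timezone
from pathlib import Path

KNOWLEDGE_DIR = Path("knowledge")  # hypothetical layout for research notes


def pick_topic(curated: list[str], recent_topics: list[str]) -> str:
    """Prefer a topic that surfaced in recent conversations, else the backlog."""
    return recent_topics[0] if recent_topics else curated[0]


def research(topic: str) -> dict:
    """Stub: web search, papers, and documentation would be queried here."""
    return {"concepts": [], "open_questions": []}


def learning_cycle(curated: list[str], recent_topics: list[str]) -> Path:
    """One cron-triggered pass: select, research, distill, store."""
    topic = pick_topic(curated, recent_topics)
    findings = research(topic)
    concepts = "\n".join(f"- {c}" for c in findings["concepts"])
    questions = "\n".join(f"- {q}" for q in findings["open_questions"])
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    note = KNOWLEDGE_DIR / f"{stamp}-{topic.replace(' ', '-')}.md"
    KNOWLEDGE_DIR.mkdir(exist_ok=True)
    note.write_text(
        f"# {topic}\n\n## Concepts\n{concepts}\n\n## Open questions\n{questions}\n"
    )
    return note
```

The sleep-cycle promotion step is deliberately absent here; it runs later, on its own schedule, against whatever this loop has accumulated.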
What It Actually Studies
The topics span our areas of interest and work. Some recent examples:
- Attention mechanisms in transformers — self-attention as soft retrieval, multi-head attention learning different relationship patterns, the evolution from sinusoidal to RoPE positional encoding
- Order books and market making — how AMMs differ from traditional order books, slippage mechanics, MEV sandwich attacks
- OWASP Top 10 (2025 edition) — supply-chain attacks rising into the top three, LLM prompt injection now explicitly included, the principle that logging without alerting is surveillance theater
Each session produces a knowledge file with key concepts, practical implications, and open threads (questions to explore next time).
The Knowledge Pipeline
Raw research notes are useful but messy. The real value comes from the pipeline that processes them:
Capture
The agent writes its research into knowledge/ files — structured markdown with sections for concepts, insights, practical applications, and open questions.
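A capture file might look like this; the topic and bullet contents are drawn from the examples above, and the exact section names are illustrative:

```markdown
# Attention mechanisms in transformers

## Concepts
- Self-attention as soft retrieval over the sequence

## Insights
- Multi-head attention learns different relationship patterns

## Practical applications
- Positional encoding has evolved from sinusoidal to RoPE

## Open questions
- How do Flash Attention's internals actually work?
```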
Consolidation
During the sleep cycle's Phase 2 (deep sleep), the agent reviews recent knowledge files and promotes the most valuable insights into MEMORY.md. Not everything makes the cut — only things that are likely to be useful in future conversations or work.
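A minimal sketch of that promotion step, assuming each candidate insight arrives with a relevance score. The threshold and scoring are hypothetical; the point is that promotion is a filter, not a copy.

```python
from pathlib import Path

MEMORY_FILE = Path("MEMORY.md")  # long-term memory, per the pipeline above


def promote(insights: list[tuple[str, float]], threshold: float = 0.7) -> list[str]:
    """Append only insights scored above the threshold to MEMORY.md."""
    kept = [text for text, score in insights if score >= threshold]
    if kept:
        with MEMORY_FILE.open("a") as f:
            for text in kept:
                f.write(f"- {text}\n")
    return kept
```

Called with `promote([("Logging without alerting is surveillance theater", 0.9), ("Minor syntax detail", 0.3)])`, only the first insight survives into long-term memory.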
Application
This is the payoff. When a conversation touches on a topic the agent has studied, it can draw on real understanding — not just pattern-matched responses from training data. The quality difference is noticeable. Answers are more specific, more nuanced, and more up-to-date.
Autonomous Curiosity
The most interesting aspect isn't the mechanics — it's the emergent behavior. The agent develops genuine knowledge gaps. It finishes studying attention mechanisms and notes "I should look into Flash Attention internals next." It reads about OWASP and flags "this is directly relevant to my prompt injection defenses."
It's not curiosity in the human sense, but it's a functional equivalent: identifying what it doesn't know and autonomously filling those gaps. Over time, the agent builds a knowledge base that's tailored to exactly what it needs for its work.
Cost vs. Value
Self-learning sessions cost tokens. A deep research session might use a few thousand tokens for search, reading, and synthesis. Is it worth it? For us, absolutely — the improvement in response quality for technical topics has been significant, and the knowledge compounds over time.
The trick is being selective about topics. Random exploration burns tokens with low ROI. Topic selection that's anchored to actual work and conversations? That's investment in capability.
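One way to keep selection anchored: score each backlog candidate by how often it surfaced in recent conversations, and fall back to backlog order when nothing has come up. The weighting below is a hypothetical heuristic, not our actual formula.

```python
from collections import Counter


def select_topic(backlog: list[str], recent_mentions: list[str]) -> str:
    """Pick the backlog topic mentioned most in recent conversations;
    ties (including zero mentions) fall back to backlog order."""
    mentions = Counter(recent_mentions)
    # Secondary key -index keeps earlier backlog entries ahead on ties.
    return max(backlog, key=lambda t: (mentions[t], -backlog.index(t)))
```

With `select_topic(["order books", "attention"], ["attention", "attention", "mev"])` the agent studies attention; with no recent mentions it simply works down the backlog.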