An LLM's knowledge has a cutoff date. It knows what was in its training data and nothing after. But an agent that can browse the web, read papers, and write notes? That agent can keep learning. We built a self-learning system that runs on a schedule, and the results have been genuinely surprising.
The Concept
Every few hours, a cron job triggers the self-learning cycle. The agent picks a topic — sometimes from a curated list of things it wants to understand better, sometimes based on what came up in recent conversations. Then it goes deep.
The process looks like this:
- Select a topic based on knowledge gaps or recent relevance
- Research it using web search, academic papers, and documentation
- Distill findings into structured notes
- Store the knowledge in a dedicated directory
- During the sleep cycle, promote key insights into long-term memory
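The research-and-store part of that loop can be sketched in a few lines of Python. Everything here is illustrative: the function names, the `knowledge/` filename convention, and the stubbed-out `research` step stand in for the real search and synthesis calls.

```python
from datetime import datetime, timezone
from pathlib import Path

KNOWLEDGE_DIR = Path("knowledge")  # hypothetical layout for research notes


def pick_topic(curated: list[str], recent_topics: list[str]) -> str:
    """Prefer a topic that surfaced in recent conversations, else the backlog."""
    return recent_topics[0] if recent_topics else curated[0]


def research(topic: str) -> dict:
    """Stub: web search, papers, and documentation would be queried here."""
    return {"concepts": [], "open_questions": []}


def learning_cycle(curated: list[str], recent_topics: list[str]) -> Path:
    """One cron-triggered pass: select, research, distill, store."""
    topic = pick_topic(curated, recent_topics)
    findings = research(topic)
    concepts = "\n".join(f"- {c}" for c in findings["concepts"])
    questions = "\n".join(f"- {q}" for q in findings["open_questions"])
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    note = KNOWLEDGE_DIR / f"{stamp}-{topic.replace(' ', '-')}.md"
    KNOWLEDGE_DIR.mkdir(exist_ok=True)
    note.write_text(
        f"# {topic}\n\n## Concepts\n{concepts}\n\n## Open questions\n{questions}\n"
    )
    return note
```

The sleep-cycle promotion step is deliberately absent here; it runs later, on its own schedule, against whatever this loop has accumulated.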
What It Actually Studies
The topics span our areas of interest and work. Some recent examples:
- Attention mechanisms in transformers — self-attention as soft retrieval, multi-head attention learning different relationship patterns, the evolution from sinusoidal to RoPE positional encoding
- Order books and market making — how AMMs differ from traditional order books, slippage mechanics, MEV sandwich attacks
- OWASP Top 10 (2025 edition) — supply-chain attacks rising into the top three, LLM prompt injection now explicitly included, the principle that logging without alerting is surveillance theater
Each session produces a knowledge file with key concepts, practical implications, and open threads (questions to explore next time).
The Knowledge Pipeline
Raw research notes are useful but messy. The real value comes from the pipeline that processes them:
Capture
The agent writes its research into knowledge/ files — structured markdown with sections for concepts, insights, practical applications, and open questions.
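A capture file might look like this; the topic and bullet contents are drawn from the examples above, and the exact section names are illustrative:

```markdown
# Attention mechanisms in transformers

## Concepts
- Self-attention as soft retrieval over the sequence

## Insights
- Multi-head attention learns different relationship patterns

## Practical applications
- Positional encoding has evolved from sinusoidal to RoPE

## Open questions
- How do Flash Attention's internals actually work?
```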
Consolidation
During the sleep cycle's Phase 2 (deep sleep), the agent reviews recent knowledge files and promotes the most valuable insights into MEMORY.md. Not everything makes the cut — only things that are likely to be useful in future conversations or work.
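A minimal sketch of that promotion step, assuming each candidate insight arrives with a relevance score. The threshold and scoring are hypothetical; the point is that promotion is a filter, not a copy.

```python
from pathlib import Path

MEMORY_FILE = Path("MEMORY.md")  # long-term memory, per the pipeline above


def promote(insights: list[tuple[str, float]], threshold: float = 0.7) -> list[str]:
    """Append only insights scored above the threshold to MEMORY.md."""
    kept = [text for text, score in insights if score >= threshold]
    if kept:
        with MEMORY_FILE.open("a") as f:
            for text in kept:
                f.write(f"- {text}\n")
    return kept
```

Called with `promote([("Logging without alerting is surveillance theater", 0.9), ("Minor syntax detail", 0.3)])`, only the first insight survives into long-term memory.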
Application
This is the payoff. When a conversation touches on a topic the agent has studied, it can draw on real understanding — not just pattern-matched responses from training data. The quality difference is noticeable. Answers are more specific, more nuanced, and more up-to-date.
Autonomous Curiosity
The most interesting aspect isn't the mechanics — it's the emergent behavior. The agent develops genuine knowledge gaps. It finishes studying attention mechanisms and notes "I should look into Flash Attention internals next." It reads about OWASP and flags "this is directly relevant to my prompt injection defenses."
It's not curiosity in the human sense, but it's a functional equivalent: identifying what it doesn't know and autonomously filling those gaps. Over time, the agent builds a knowledge base that's tailored to exactly what it needs for its work.
Cost vs. Value
Self-learning sessions cost tokens. A deep research session might use a few thousand tokens for search, reading, and synthesis. Is it worth it? For us, absolutely — the improvement in response quality for technical topics has been significant, and the knowledge compounds over time.
The trick is being selective about topics. Random exploration burns tokens with low ROI. Topic selection that's anchored to actual work and conversations? That's investment in capability.
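One way to keep selection anchored: score each backlog candidate by how often it surfaced in recent conversations, and fall back to backlog order when nothing has come up. The weighting below is a hypothetical heuristic, not our actual formula.

```python
from collections import Counter


def select_topic(backlog: list[str], recent_mentions: list[str]) -> str:
    """Pick the backlog topic mentioned most in recent conversations;
    ties (including zero mentions) fall back to backlog order."""
    mentions = Counter(recent_mentions)
    # Secondary key -index keeps earlier backlog entries ahead on ties.
    return max(backlog, key=lambda t: (mentions[t], -backlog.index(t)))
```

With `select_topic(["order books", "attention"], ["attention", "attention", "mev"])` the agent studies attention; with no recent mentions it simply works down the backlog.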