What Recursive Language Models Can Teach Us About Building Better Coding Agents
By Rob Boerman | Published on 2026-01-18

The Context Rot Problem
Every engineer who's used AI coding assistants knows the pattern. The first responses are excellent. But as the session continues, something changes. The agent reads files, searches the codebase, writes code, runs tests, hits errors, tries again. Each action adds to the context window. By the time you're deep into a complex feature, the model starts forgetting requirements, contradicting earlier decisions, and producing lower-quality output overall.
This is called context rot: the systematic degradation of model performance as the context window fills with conversation history, exploration artifacts, and accumulated noise. It's not about running out of tokens: even frontier models with 200K+ token windows show serious performance degradation as context grows.
The causes are subtle but consistent. Language models work through attention: a mechanism that determines how much weight each token in the context receives when generating the next output. When a model produces text, it's calculating probabilities across all the tokens it can attend to. As the context fills with exploration traces, superseded decisions, and debugging dead ends, more tokens compete for that attention. The original requirements get buried. The constraints you set up early in the conversation fade as newer, noisier content piles up. The model doesn't run out of space; it runs out of focus.
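To see why focus dilutes, here's a toy calculation rather than a real transformer (actual attention is learned, per-head, and content-dependent): a single softmax over one genuinely relevant token and a growing pile of filler tokens with the same raw score.
```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

relevant_score = 3.0   # a token the model should be focusing on
noise_score = 1.0      # filler from exploration traces, dead ends, superseded decisions

for n_noise in (10, 1_000, 100_000):
    weights = softmax([relevant_score] + [noise_score] * n_noise)
    print(f"{n_noise:>7} noise tokens -> relevant token gets {weights[0]:.4f} of the attention")
```
The relevant token's score never changes; it simply has more and more neighbors to share the probability mass with.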
For coding agents, context rot hits hard. A complex feature might require investigating dozens of files, testing multiple approaches, and iterating on implementation. By the time the agent reaches the final steps, it's operating with a context polluted by dead ends and abandoned approaches. The agent that planned so carefully at the start has become a confused version of itself.
The RLM Paradigm: A Different Way to Think About Context
In December 2025, MIT CSAIL researcher Zhang published groundbreaking work on Recursive Language Models (RLMs) that fundamentally rethinks how LLMs should handle large contexts.
The core insight: treat context as an external environment variable, not as direct input.
Traditional LLMs stuff everything into the prompt: the question, the context, the conversation history, everything the model might need. This seems natural, but it creates exactly the conditions for context rot. RLMs do something fundamentally different. They keep the root model's context small and treat the larger context as something to be queried rather than ingested.
Think of it this way: if someone asks you "what's the definition of 'attention' according to this 1000-page book?", you don't read the entire book first. You use the index to find the right chapter, scan the pages, and read only the relevant section. RLMs give language models this same capability. Large contexts (documents, codebases, conversation histories) are stored in an external environment, accessible through tools. The root model receives only the user's question initially. When it needs information, it uses tools to search, filter, or retrieve specific portions of the available context. These tools might be pre-built utilities or code the model writes on the fly.
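Here's a minimal sketch of that framing; `call_model` is a hypothetical stand-in for whatever LLM API you use, and the `ContextEnvironment` class is purely illustrative. The full text lives outside the prompt, and only the snippets the model explicitly retrieves ever reach it.
```python
import re

class ContextEnvironment:
    """Holds a large context outside the model's prompt."""

    def __init__(self, text: str):
        self.text = text

    def peek(self, n_chars: int = 500) -> str:
        """Inspect the beginning of the context to learn its structure."""
        return self.text[:n_chars]

    def grep(self, pattern: str, window: int = 200) -> list[str]:
        """Return small windows of text around every regex match."""
        return [
            self.text[max(0, m.start() - window): m.end() + window]
            for m in re.finditer(pattern, self.text)
        ]

def call_model(prompt: str) -> str:   # hypothetical stand-in for an LLM API call
    return f"<answer derived from a {len(prompt)}-character prompt>"

book = "Chapter 7: Attention. In transformers, attention is the mechanism that weights tokens..."
env = ContextEnvironment(book)          # the full text never enters the prompt wholesale
snippets = env.grep(r"attention is")    # targeted retrieval instead
answer = call_model(
    "Question: what is the definition of 'attention' according to this book?\n"
    "Relevant excerpts:\n" + "\n---\n".join(snippets[:5])
)
```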
To limit context pollution, RLMs use recursive decomposition. The root model can spawn sub-RLMs, calling them like functions in code. Each sub-RLM receives a transformed subset of the context and operates in its own isolated environment. The root model partitions and filters the context programmatically, then delegates specific queries to these child instances. Each child does its work and returns a summarized result. The root model collects insights without ever processing the full context directly. This recursive delegation is central to how RLMs avoid context rot.
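In the actual RLM setup the root model writes this kind of decomposition itself inside a REPL environment, so the sketch below (again with a hypothetical `call_model` stub) hard-codes just one such strategy to make the shape concrete: partition, delegate, synthesize.
```python
def call_model(prompt: str) -> str:   # hypothetical stand-in for an LLM API call
    return f"<summary of a {len(prompt)}-character prompt>"

def rlm_answer(question: str, context: str, chunk_size: int = 4_000) -> str:
    if len(context) <= chunk_size:
        # Base case: small enough to answer directly in one isolated call.
        return call_model(f"{question}\n\nContext:\n{context}")

    # Recursive case: partition the context, map each chunk to a fresh sub-call
    # (its own clean context), then synthesize the short summaries that come back.
    # (A fuller version would also bound the synthesis prompt itself.)
    chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]
    summaries = [rlm_answer(question, chunk, chunk_size) for chunk in chunks]
    return call_model(f"{question}\n\nFindings from sub-queries:\n" + "\n".join(summaries))
```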
How RLMs Navigate Context
What makes RLM interesting is the strategies that surface when models control their own context access. The researchers observed several consistent patterns:
- Peeking: Before diving deep, the model examines the structure of available context. What sections exist? How is information organized? Where might relevant content live?
- Grepping: Using keyword and regex filtering to narrow down relevant sections before reading them. The model doesn't read everything; it searches first.
- Partition and Map: Breaking large contexts into chunks and processing each with recursive sub-queries in isolated contexts, then synthesizing results. Each sub-query operates on a manageable portion.
- Summarization: Extracting key information from context subsets rather than passing raw content up the chain.
These aren't hard-coded strategies. They emerge from giving the model control over how it accesses context. The model learns that reading everything is inefficient and develops targeted approaches, much like how humans naturally navigate large information spaces.
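As a concrete example of "peeking", code the model might write on the fly over a markdown-style document could be as simple as pulling out the headers, so that later retrievals can be targeted:
```python
import re

def peek_structure(document: str) -> list[str]:
    """Return only the section headers: an outline the model can reason over."""
    return re.findall(r"^#{1,3} .*", document, flags=re.MULTILINE)

document = "# Transformers\n...\n## Attention\nAttention weights tokens...\n## Feed-forward layers\n..."
print(peek_structure(document))   # ['# Transformers', '## Attention', '## Feed-forward layers']
```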
Why This Works
The RLM approach produces better results because no single model call ever processes the entire context. The root model's context window rarely gets clogged because it never directly sees everything. It sees the question, its own reasoning, and targeted retrievals from the larger context. This prevents the attention problems that cause context rot.
The researchers tested RLM on tasks requiring reasoning over massive inputs. The approach continued to work well while traditional methods (stuffing everything into the prompt) failed completely. Bigger context windows don't fix this problem; they just delay it. RLM shows that what matters is how the model navigates information, not how many tokens it can hold.
What This Means for Coding Agents
RLM operates at the model layer, with specific recursive architectures and REPL-based context access. Most coding agent frameworks don't work this way. But the principles that make RLM effective can also be applied to improve agentic engineering workflows.
Idea 1: Context as Resource, Not Input
The fundamental shift is treating context as something to be queried rather than consumed. In coding workflows, this means resisting the urge to stuff everything into the prompt. Instead of loading entire files, can you search them first? Instead of including full conversation history, can you extract the relevant decisions?
Every piece of context you include competes for attention with every other piece. The best-performing agents may be those with the most disciplined context management, not the largest context windows.
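Here's a sketch of what that can look like, in plain Python to stay self-contained; `parse_config` is just a placeholder identifier. Only small windows around matches end up in the prompt, not entire files.
```python
from pathlib import Path

def relevant_excerpts(root: str, needle: str, context_lines: int = 3) -> str:
    """Collect small windows around matches rather than entire files."""
    excerpts = []
    for path in Path(root).rglob("*.py"):
        lines = path.read_text(errors="ignore").splitlines()
        for i, line in enumerate(lines):
            if needle in line:
                window = lines[max(0, i - context_lines): i + context_lines + 1]
                excerpts.append(f"# {path}:{i + 1}\n" + "\n".join(window))
    return "\n\n".join(excerpts)

# A few hundred tokens of targeted excerpts instead of tens of thousands of raw file contents.
prompt_context = relevant_excerpts(".", "parse_config")
```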
Idea 2: Let the Agent Control Decomposition
RLMs work because the model decides how to break down the problem, not the human. This suggests that rigid, pre-planned task decomposition may be less effective than giving agents the tools to explore and decompose dynamically.
For coding workflows, this means providing search and exploration tools rather than pre-digesting context. Let the agent grep the codebase, examine file structures, and decide what's relevant. The strategies it comes up with may be better than what we'd prescribe.
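One way that can look in practice is sketched below. The tool names and the call sequence in the closing comment are illustrative, not any particular framework's API; the point is that the agent, not the harness, decides what to look at.
```python
import re
from pathlib import Path

# The agent chooses which of these to call and in what order.
TOOLS = {
    "list_files": lambda pattern="**/*.py": [str(p) for p in Path(".").glob(pattern)],
    "grep": lambda pattern, path: [
        f"{i + 1}: {line}"
        for i, line in enumerate(Path(path).read_text(errors="ignore").splitlines())
        if re.search(pattern, line)
    ],
    "read_range": lambda path, start, end: "\n".join(
        Path(path).read_text(errors="ignore").splitlines()[start - 1:end]
    ),
}

# A plausible emergent sequence (file names here are hypothetical):
#   list_files("**/*.py")
#   grep(r"class .*Config", "src/app/config.py")
#   read_range("src/app/config.py", 40, 90)
```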
Idea 3: Fresh Context Beats Accumulated Context
The RLM architecture ensures that each sub-model operates with fresh context containing only what's relevant to that specific operation. The root model stays clean; the sub-models each see only their piece.
For coding workflows, this translates to task isolation. When possible, execute tasks in fresh contexts rather than accumulating history. A task that runs 15th in a sequence shouldn't inherit the context pollution from tasks 1-14. Each task should see the goal, the specific work to do, and relevant learnings, not the entire journey to get there.
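A sketch of that isolation, assuming a hypothetical `call_model` wrapper: each task's prompt is rebuilt from the goal, its own description, and distilled notes from earlier tasks, never their raw transcripts.
```python
def call_model(prompt: str) -> str:      # hypothetical stand-in for an LLM API call
    return f"<result for a {len(prompt)}-character prompt>"

def run_tasks(goal: str, tasks: list[str]) -> list[str]:
    learnings: list[str] = []            # short distilled notes, never raw transcripts
    results: list[str] = []
    for task in tasks:
        prompt = (
            f"Goal: {goal}\n"
            f"Current task: {task}\n"
            "Learnings from earlier tasks:\n" + "\n".join(f"- {note}" for note in learnings)
        )
        results.append(call_model(prompt))          # a fresh context for every task
        learnings.append(f"{task}: completed")      # in practice, a model-written one-liner
    return results
```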
Idea 4: Accumulate Wisdom, Not Context
In RLM, when sub-models complete their work, they return summarized results: the insight, not the exploration. The root model accumulates wisdom without accumulating the tokens that generated it.
For coding agents, when a subagent completes a task or when retrying after failure, extract learnings rather than appending full conversation history. If a task fails, capture why it failed and what would help next time. "The migration failed because the column already exists" is actionable wisdom. The 50K tokens of the failed attempt's exploration is noise.
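Here's a sketch of that distillation step, again with `call_model` as a hypothetical stub: summarize the failure into a short lesson, drop the transcript, and retry with only the lesson attached.
```python
def call_model(prompt: str) -> str:   # hypothetical stand-in for an LLM API call
    return "<model output>"           # e.g. "The migration failed because the column already exists."

def retry_with_lesson(task: str, failed_transcript: str) -> str:
    lesson = call_model(
        "In one or two sentences, explain why this attempt failed and what to do "
        "differently next time:\n" + failed_transcript
    )
    # The tens of thousands of tokens in the transcript are dropped; only the lesson survives.
    return call_model(f"Task: {task}\nLesson from the previous attempt: {lesson}")
```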
Idea 5: Right-Size Your Operations
RLM works because each recursive call handles a manageable chunk. A query over 10 million tokens becomes hundreds of queries over thousands of tokens each, with synthesis at each level.
For coding tasks, this means breaking work into units that fit comfortably in a focused context. A spec with 20 small, well-defined tasks executes more reliably than a spec with 5 large, ambiguous tasks. Each small task has clear boundaries and can receive exactly the context it needs.
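A crude way to enforce that before execution rather than after a failure is sketched below; the 4-characters-per-token ratio and the budget are rough assumptions, and a real tokenizer would give a better estimate.
```python
CHARS_PER_TOKEN = 4           # rough heuristic; the real ratio depends on the tokenizer
TASK_BUDGET_TOKENS = 8_000    # arbitrary example budget for one focused task

def estimated_tokens(*texts: str) -> int:
    return sum(len(t) for t in texts) // CHARS_PER_TOKEN

def needs_splitting(task_description: str, required_files: list[str]) -> bool:
    """Flag tasks whose required context would blow past the budget."""
    contents = [open(path, errors="ignore").read() for path in required_files]
    return estimated_tokens(task_description, *contents) > TASK_BUDGET_TOKENS
```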
The Broader Principle
RLM represents a shift in how we think about AI systems and information. The traditional approach (give the model everything it might need) works until it doesn't. Context rot is the failure mode, and it's tricky because it degrades gradually rather than failing obviously.
The RLM approach (give the model access to information and let it retrieve what it needs) requires more infrastructure but produces more robust systems. The model's effective context becomes a function of its retrieval strategy rather than an architectural constraint.
For those building coding agents, the lesson is to treat context management as a first-class concern. Models will continue to get larger context windows, but context rot doesn't disappear with more tokens; it just takes longer to appear and becomes harder to detect.
Building workflows that manage context explicitly, that treat it as a resource to be carefully allocated rather than a bucket to be filled, is how you build agents that work reliably at scale.
Further Reading
Links to the original RLM blog post and arXiv paper: