
Recursive Language Models (RLMs): A Brief Overview

AI LLMs Architecture

Recursive Language Models (RLMs) change how AI systems handle very large contexts. Trying to fit millions of tokens into a single model's context window often degrades performance, an effect known as "Context Rot". An RLM instead uses a recursive inference strategy: it decomposes the context and interacts with it programmatically.

The Core Problem: Context Rot

Traditional LLMs suffer from "Context Rot": accuracy declines as the context window fills, even well before the technical limit is reached. This is a quality problem, not just a capacity one. As the conversation history or input data grows, models reason less reliably and lose track of details.

RLM Architecture Overview

Figure: An RLM interacts with a REPL environment to manage massive context, recursively sub-querying itself or other LMs to efficiently parse information. (Source: alexzhang13.github.io)

How RLMs Work

An RLM is a thin wrapper around a language model that allows it to interact with a computational environment to manage information.

1. Programmatic vs. Tokenized Context

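RLMs distinguish between two types of context. The tokenized context is what the model actually reads in its context window: the query, the code it writes, and the outputs returned by running that code. The programmatic context is the full input held as data in the environment, which the model can reach only by writing code against it.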
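The following is a rough illustration of this split. It keeps a large document in a Python variable (the programmatic context). It then builds a root prompt that contains only the query, some metadata and a short preview (the tokenized context). The prompt layout, the hypothetical `build_root_prompt` helper and the 4-characters-per-token estimate are assumptions made for illustration, not the rlm package's behavior.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English text.
    return max(1, len(text) // 4)


def build_root_prompt(query: str, context: str, preview_chars: int = 200) -> str:
    """Hypothetical helper: only the query, metadata and a short preview enter the prompt."""
    preview = context[:preview_chars]
    return (
        f"Query: {query}\n"
        f"A variable `context` holds {len(context):,} characters "
        f"(~{approx_tokens(context):,} tokens). First {len(preview)} characters:\n"
        f"{preview}"
    )


# Programmatic context: the full document stays in a Python variable.
context = "lorem ipsum " * 500_000  # 6 million characters of filler text

# Tokenized context: what the root model would actually read.
prompt = build_root_prompt("Summarize the document.", context)
print(approx_tokens(context), approx_tokens(prompt))
```

The prompt stays small no matter how large `context` grows. That is the point of keeping the bulk of the input outside the model's window.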

2. The REPL Environment

The RLM operates within a REPL (Read-Eval-Print Loop) environment, typically Python. The long context is loaded as a variable in this environment. The "Root LLM" does not see the entire context; instead, it writes code to explore it.
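The sketch below shows what such a REPL could look like. It assumes nothing about the rlm package's internals. A hypothetical `ReplEnvironment` class keeps `context` in a persistent namespace and runs code strings with `exec`. It returns only the captured, truncated output, which is what would be appended to the Root LLM's prompt. The code strings stand in for what a Root LLM might write. Note that `exec` is unsandboxed here, so a real system would need isolation.

```python
import contextlib
import io


class ReplEnvironment:
    """Hypothetical sketch of an RLM-style REPL, not the rlm package's implementation."""

    def __init__(self, context: str, max_output_chars: int = 1_000) -> None:
        # The long context is a variable in the namespace; the root LLM never reads it whole.
        self.namespace: dict = {"context": context}
        self.max_output_chars = max_output_chars

    def execute(self, code: str) -> str:
        """Run code written by the root LLM and return its (truncated) printed output."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)  # unsandboxed: a real system needs isolation
        except Exception as exc:
            # Report errors back to the model as text instead of crashing the loop.
            buffer.write(f"{type(exc).__name__}: {exc}")
        # Only this output is fed back into the root LLM's prompt.
        return buffer.getvalue()[: self.max_output_chars]


# Code strings a root LLM might emit on successive turns; variables persist between calls.
env = ReplEnvironment(context="line one\nline two\nneedle: 42\n" * 1_000)
print(env.execute("lines = context.splitlines()\nprint(len(lines))"))
print(env.execute("print([l for l in lines if l.startswith('needle')][:1])"))
print(env.execute("print(undefined_name)"))
```

Truncating the output keeps a careless `print(context)` from flooding the root model's window. Returning errors as text lets the model correct its own code on the next turn.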

3. Recursive Decomposition

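The Root LLM can call itself or other LLM instances (sub-calls) from within the REPL. This enables a "divide and conquer" approach: the Root LLM splits the context into manageable pieces, sends each piece to a sub-call, and combines the results.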
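The sketch below shows one fixed divide-and-conquer strategy. It assumes hypothetical `answer` and `merge` callables that stand in for LLM sub-calls:

- If the text fits a size budget, it is answered directly.
- Otherwise it is split near a line boundary and each half is handled recursively.
- The partial answers are then merged.

In an actual RLM the Root LLM decides how to decompose the context by writing code. This is one illustrative strategy rather than the method itself. Deterministic stubs replace the model so the example runs offline.

```python
from typing import Callable

# Hypothetical stand-ins for LLM sub-calls.
AnswerFn = Callable[[str, str], str]        # (question, chunk) -> answer
MergeFn = Callable[[str, list[str]], str]   # (question, partial answers) -> answer


def split_in_half(text: str) -> tuple[str, str]:
    """Split near the middle, preferring a newline so records are not cut in two."""
    mid = len(text) // 2
    cut = text.rfind("\n", 0, mid) + 1 or mid  # fall back to mid if no newline is found
    return text[:cut], text[cut:]


def recursive_query(question: str, text: str, answer: AnswerFn,
                    merge: MergeFn, max_chars: int = 1_000) -> str:
    if len(text) <= max_chars:
        return answer(question, text)  # base case: small enough for a single sub-call
    left, right = split_in_half(text)
    partials = [recursive_query(question, part, answer, merge, max_chars)
                for part in (left, right)]
    return merge(question, partials)  # combine the sub-answers


def count_errors(question: str, text: str) -> str:
    # Stub for a sub-LLM reading one chunk; a real call would send question and text to a model.
    return str(text.count("ERROR"))


def sum_counts(question: str, answers: list[str]) -> str:
    # Stub for a sub-LLM merging partial answers.
    return str(sum(int(a) for a in answers))


log = "INFO ok\nERROR disk full\n" * 5_000
print(recursive_query("How many ERROR lines?", log, count_errors, sum_counts))
```

Replacing the stubs with real model calls would turn each leaf into a sub-query and each merge into an aggregation call. The `max_chars` budget controls how many sub-calls are made.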

Key Benefits

Getting Started with RLMs

You can easily spin up a new environment and run RLMs in Python.

1. Set up your environment

# Create a new Python 3.14 virtual environment
uv venv .venv --python 3.14

# Activate it
source .venv/bin/activate

# Install the RLM package (uv installs are fast)
uv pip install rlms
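Before running anything, a quick sanity check can confirm which interpreter the activated environment uses and whether the package is importable. This is a generic sketch that uses only the standard library. It assumes that the package installed as `rlms` is imported as `rlm`, as in the example in step 2.

```python
import importlib.util
import sys


def check_environment(module: str = "rlm") -> dict[str, object]:
    """Report the interpreter version and whether `module` can be imported (without importing it)."""
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        # Inside a venv, sys.prefix points at the venv while sys.base_prefix points at the base install.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        f"{module}_installed": importlib.util.find_spec(module) is not None,
    }


print(check_environment())
```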

2. Run your first RLM

from rlm import RLM

rlm = RLM(
    backend="gemini",
    backend_kwargs={"model_name": "gemini-2.0-flash"},
    verbose=True,  # For printing to console with rich, disabled by default.
)

prompt = (
    "Find the nearest city to Stockholm, tell me how far away it is in km, "
    "and estimate the travel time there by car."
)
print(rlm.completion(prompt).response)

The Future: Agent Discovery

Beyond solving context limits, RLMs act as Agent Discovery Mechanisms. By observing the "traces" of how an RLM solves a complex problem—what strategies it tries, how it chunks data, and which sub-queries it makes—developers can identify repeating patterns. These patterns can then be "hard-coded" into optimized, low-latency agent architectures, effectively using RLMs to "invent" the best agent for a specific task.
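Here is a hedged sketch of mining traces for repeated strategies. It assumes each trace has already been reduced to a list of action labels. This is a hypothetical format: real RLM traces contain the code the model wrote and its outputs. The sketch counts how many traces contain each short sequence of actions. Sequences shared across many traces are candidates for hard-coding into a dedicated agent.

```python
from collections import Counter


def frequent_patterns(traces: list[list[str]], n: int = 3,
                      min_traces: int = 2) -> list[tuple[tuple[str, ...], int]]:
    """Return action n-grams found in at least `min_traces` traces, most widespread first."""
    support: Counter = Counter()
    for trace in traces:
        # Count each n-gram at most once per trace so one repetitive run does not dominate.
        grams = {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}
        support.update(grams)
    frequent = [(gram, count) for gram, count in support.items() if count >= min_traces]
    # Sort by count (descending), then alphabetically, for a deterministic order.
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


# Hypothetical traces, each reduced to a sequence of action labels.
traces = [
    ["peek", "split_lines", "sub_query", "aggregate", "answer"],
    ["peek", "regex_filter", "sub_query", "aggregate", "answer"],
    ["peek", "split_lines", "sub_query", "aggregate", "verify", "answer"],
]
for gram, count in frequent_patterns(traces):
    print(count, " -> ".join(gram))
```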

Current Limitations

References