Context Engineering
Without custom model training, the quality of an LLM's output depends almost entirely on the quality of its inputs. The table below distinguishes the two kinds of input, prompt and context; the numbered sections after it cover key steps for keeping context high-quality.
| Aspect | Prompt | Context |
|---|---|---|
| Definition | The Instruction / Task | The Background / Knowledge |
| Role | Telling the model what to do | Telling the model what it knows |
| Persistence | Dynamic and query-specific | Static or long-term information |
| Analogy | The Exam Question | The Textbook |
| Form | Conversational | Programmatic |
1. Context Window
We should consider 5 subtle points to optimize the context window:
- Correctness: Provide accurate, verified info to prevent errors.
- Completeness: Include every essential detail to avoid missing context.
- Size: Remove irrelevant data to minimize token usage.
- What not to do: Define exactly what changes should be avoided.
- Trajectory: List the sequence of steps taken to clarify the current path.
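The five points above can be sketched as a small data structure with a size check. This is a minimal illustration, not a standard API: `ContextBundle`, `estimate_tokens`, and `render` are hypothetical names, and the 4-characters-per-token ratio is a rough assumption that a real tokenizer should replace.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBundle:
    """Hypothetical container mirroring the five points above."""
    facts: list[str]                                     # verified, essential details
    avoid: list[str] = field(default_factory=list)       # what not to do
    trajectory: list[str] = field(default_factory=list)  # steps taken so far


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token.
    return max(1, len(text) // 4)


def render(bundle: ContextBundle, max_tokens: int) -> str:
    """Render the bundle as text, raising if it exceeds the size budget."""
    sections = [
        ("Facts", bundle.facts),
        ("Do not", bundle.avoid),
        ("Steps so far", bundle.trajectory),
    ]
    lines = []
    for title, items in sections:
        if items:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
    text = "\n".join(lines)
    if estimate_tokens(text) > max_tokens:
        raise ValueError("context exceeds budget; remove irrelevant items")
    return text
```

Correctness and completeness still depend on what a person puts into `facts`; the code only enforces structure and size.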
2. Hierarchy of Failure
Context Rot: Model output quality degrades as the context window fills up with increasing token counts.
Note: A localized coding bug only affects a single line of code. However, a flawed plan can propagate into hundreds of incorrect lines. Worst of all, poor research—such as failing to understand how the system actually works—can corrupt thousands of lines.
- One bad LOC (Line of Code) = one bad LOC
- One bad LOR (Line of Research) = 100 wrong LOP
- One bad LOP (Line of Plan) = 100 wrong LOS
- One bad LOS (Line of Spec) = 1K wrong LOC
Flow: (1) Research → (2) Plan → (3) Spec → (4) Code
3. One Big AGENTS.md / CLAUDE.md Fails Because:
- Context is a scarce resource: Monolithic files consume valuable tokens, leaving less room for active codebase files and conversation history while increasing latency and costs.
- Too much guidance becomes non-guidance: Overloading the model with rules dilutes attention, causing it to ignore, drop, or get confused by competing instructions.
- It rots instantly: Since codebases and requirements change rapidly, large files quickly become outdated, so the model ends up following stale guidelines.
- It is hard to verify: Giant rule lists are difficult for developers to audit and troubleshoot when the model fails to follow instructions.
Solution: A short AGENTS.md (roughly 100 lines) is injected into context and serves primarily as a map, with pointers to deeper sources of truth elsewhere.
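One way to keep such a file honest is a small check that it stays short and that its pointers still resolve. The sketch below assumes pointers are written as markdown links and that the caller supplies the set of paths that exist; `check_agents_md` and its parameters are hypothetical, not part of any tool.

```python
import re

# Markdown links of the form [label](target)
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def check_agents_md(text: str, known_paths: set[str], max_lines: int = 100) -> list[str]:
    """Return a list of problems found in an AGENTS.md body (empty means OK)."""
    problems = []
    line_count = len(text.splitlines())
    if line_count > max_lines:
        problems.append(f"too long: {line_count} lines (limit {max_lines})")
    for target in LINK_PATTERN.findall(text):
        if target.startswith(("http://", "https://")):
            continue  # external links are not checked in this sketch
        if target not in known_paths:
            problems.append(f"dangling pointer: {target}")
    return problems
```

Running a check like this in CI would catch a map that has grown too long or points at files that were moved or deleted.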
4. Pause and Resume a Session
Manage agent sessions actively to keep context windows optimal:
- Fresh Sessions: When a session gets bloated or stuck, save status to a progress file (e.g., progress.md), terminate it, and start a new session reading that summary.
- Context Compaction: Periodically compress history (removing verbose terminal logs and file contents) manually or via built-in commands.
- Compaction Prompt: "Update progress.md with our current status. State the final goal, planned approach, completed steps, and current failure."
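The four items named in the compaction prompt map naturally onto a fixed template. The following is a minimal sketch of what the resulting progress.md might contain; the `Progress` class, `render_progress`, and the section headings are illustrative assumptions, not a format any agent requires.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    goal: str
    approach: str
    completed: list[str]
    current_failure: str


def render_progress(p: Progress) -> str:
    """Build the progress.md body a fresh session would read first."""
    done = "\n".join(f"- {step}" for step in p.completed) or "- (none yet)"
    return (
        f"# Progress\n\n"
        f"## Final goal\n{p.goal}\n\n"
        f"## Planned approach\n{p.approach}\n\n"
        f"## Completed steps\n{done}\n\n"
        f"## Current failure\n{p.current_failure}\n"
    )
```

A fixed template makes it easy to spot a summary that omits the current failure, which is the detail a new session needs most.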
When Context Fails
In short: garbage in → garbage out. Context tends to fail in four ways:
1. Context Poisoning
- Occurs when an LLM's own mistake or hallucination is fed back into the context, causing it to repeat the error continuously.
- A typical example is when the model generates buggy code, then references its own flawed output as the source of truth in later steps.
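One mitigation is to gate model output before it re-enters the context. The sketch below uses a syntax check from Python's standard `ast` module as the gate; this is a deliberately weak stand-in (a real gate would more likely run tests), and `append_if_verified` is a hypothetical helper.

```python
import ast


def passes_syntax_check(code: str) -> bool:
    """Weak gate: True if the code parses as Python. Says nothing about correctness."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def append_if_verified(history: list[str], generated_code: str) -> bool:
    """Add model-generated code to the history only if it passes the gate."""
    if passes_syntax_check(generated_code):
        history.append(generated_code)
        return True
    # Record the failure briefly rather than the flawed output itself.
    history.append("[previous attempt rejected: did not parse]")
    return False
```

The point of the design is that rejected output never becomes something later steps can treat as a source of truth.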
2. Context Distraction
- Occurs when the context grows so long that the model over-focuses on it and neglects what it learned during training.
- Instead of applying its trained knowledge to solve the problem, the model gets stuck repeating material from the long context window.
- Avoid excessively long contexts unless the task is strictly summarization or simple information retrieval.
- Although some modern models support 1M+ tokens, it is best to keep the active context under roughly 200K tokens.
- Read more: Long-Context RAG Performance in LLMs
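Keeping the active context under a budget can be sketched as dropping the oldest messages first. This is an illustrative approach under stated assumptions: `trim_to_budget` is a hypothetical helper, and token counts are approximated at about 4 characters per token rather than measured with a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token.
    return max(1, len(text) // 4)


def trim_to_budget(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose estimated total fits within budget."""
    kept = []
    used = 0
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping old messages outright loses information, so in practice this would usually be paired with a summary of what was removed.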
3. Context Confusion
- Occurs when irrelevant or unnecessary information is included in the context, leading to degraded response quality.
- Because models try to process everything they are given, irrelevant data or unused tools dilute their focus.
- The Berkeley Function-Calling Leaderboard demonstrates how model performance drops as the number of available tools increases.
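A simple way to limit the tools a model sees is to select only those relevant to the current request. The sketch below ranks tools by word overlap between the query and each tool description; the scoring is a naive assumption chosen for brevity (embedding similarity is a common alternative), and `select_tools` is a hypothetical name.

```python
def select_tools(query: str, tools: dict[str, str], limit: int = 3) -> list[str]:
    """Return up to `limit` tool names whose descriptions share words with the query."""
    query_words = set(query.lower().split())
    scored = []
    for name, description in tools.items():
        overlap = len(query_words & set(description.lower().split()))
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]
```

Tools with no overlap are excluded entirely, so an unrelated tool never costs tokens or attention.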
4. Context Clash
- Happens when newly introduced instructions, tools, or data contradict existing information in the history.
- The extra information isn't just useless; it directly conflicts with other instructions, causing the AI to get confused.
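If instructions are tracked by topic, clashes can be detected and resolved before they reach the model. The following sketch assumes each instruction is a (topic, rule) pair and that the most recent rule should win; both the representation and the `resolve_instructions` helper are assumptions made for illustration.

```python
def resolve_instructions(
    instructions: list[tuple[str, str]],
) -> tuple[dict[str, str], list[str]]:
    """Collapse (topic, rule) pairs so each topic has one rule; the latest wins.

    Returns the resolved rules and a list of human-readable conflict notes.
    """
    resolved: dict[str, str] = {}
    conflicts: list[str] = []
    for topic, rule in instructions:
        previous = resolved.get(topic)
        if previous is not None and previous != rule:
            conflicts.append(f"{topic}: '{previous}' replaced by '{rule}'")
        resolved[topic] = rule
    return resolved, conflicts
```

Only the resolved rules would go into the context; the conflict notes are for the developer to review.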
How to Fix Context Issues
Understanding failure modes is only half the battle. When context gets cluttered, poisoned, or conflicted, active mitigation is required. For a full breakdown of strategies, configurations, and REPL commands to keep your context optimized, read our companion guide: