Context Rot Is Two Different Failures
Context rot is not one disease. Long-context distraction and poisoned-context inertia need different product and engineering responses.
You have probably seen this pattern.
An agent starts on a bug. The first attempt is wrong. You say it is wrong. It tries again, but somehow the second attempt is still built on the first bad assumption. After a few rounds the whole session feels cursed.
Then you open a new chat, explain the same problem cleanly, and it fixes the bug in one pass.
People call this context rot.
I think that name hides two different failures.
Failure one: distraction
The first failure is easy to understand.
The context is too long. The signal is diluted. The model has to attend to too many tokens, so the important part gets less effective attention.
This is the long-context problem.
It is why a giant window does not automatically mean a giant brain. A model can technically accept a lot of tokens and still become worse at using them. The task is not only storage. It is retrieval, prioritization, and reasoning over the right subset.
For this kind of failure, the usual tools make sense:
- summarize;
- prune;
- retrieve only relevant material;
- move noisy work into sub-agents;
- keep instructions and tool definitions stable;
- compress old turns when they are no longer needed.
This is where context engineering is already useful. You reduce noise and keep the working set high-signal.
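One way to picture "keep the working set high-signal" is a budgeted selection step. The sketch below is only an illustration: `ContextItem`, the `relevance` score, the `pinned` flag, and the word-count token estimate are all assumptions made for the example, not part of any particular agent framework.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    text: str
    relevance: float      # hypothetical score in [0, 1], e.g. from a retriever
    pinned: bool = False  # instructions and tool definitions stay stable


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def select_working_set(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Keep pinned items, then fill the remaining budget by relevance."""
    # Pinned items are always kept, even if they alone exceed the budget.
    kept = {i for i, item in enumerate(items) if item.pinned}
    used = sum(estimate_tokens(items[i].text) for i in kept)

    # Consider the remaining items from most to least relevant.
    candidates = sorted(
        (i for i in range(len(items)) if i not in kept),
        key=lambda i: items[i].relevance,
        reverse=True,
    )
    for i in candidates:
        cost = estimate_tokens(items[i].text)
        if used + cost <= budget:
            kept.add(i)
            used += cost

    # Return items in their original order so the thread stays chronological.
    return [items[i] for i in sorted(kept)]
```

With a small budget, a long low-relevance log is dropped while the stable instructions and the relevant evidence survive. Note that nothing here checks whether the surviving items are correct, only whether they are relevant and fit.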
But this is not the whole story.
Failure two: poisoning
The second failure is more annoying.
The context is not necessarily huge. It may only be a normal coding session. But an early wrong assumption enters the thread and becomes part of the world.
The model tries to satisfy it.
Then your correction enters the thread too.
Now the model has the original bug, the failed fix, your rejection, new error messages, and maybe a second failed fix. These things are not cleanly separated into “wrong path” and “valid evidence.” They all sit in the same context as material the model has to reconcile.
This is poisoning.
The problem is not length. The problem is that the state is dirty.
That is why a compact summary can fail here. If the summary preserves the wrong assumption because it looks important, the poison survives compression. You made the context shorter, but not cleaner.
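A toy sketch of that failure, under stated assumptions: `Note`, its `mentions` count, and the `rejected` flag are hypothetical, and "salience" is reduced to how often a claim was mentioned. Real summarizers are far more complex, but the shape of the problem is the same.

```python
from dataclasses import dataclass


@dataclass
class Note:
    claim: str
    mentions: int           # how often the thread referred to this claim
    rejected: bool = False  # set when the user invalidates the claim


def compress_by_salience(notes: list[Note], keep: int) -> list[str]:
    """Naive compression: keep the most-mentioned claims."""
    ranked = sorted(notes, key=lambda n: n.mentions, reverse=True)
    return [n.claim for n in ranked[:keep]]


def compress_clean(notes: list[Note], keep: int) -> list[str]:
    """Drop rejected claims first, then compress what is left."""
    return compress_by_salience([n for n in notes if not n.rejected], keep)


notes = [
    Note("cache is stale", mentions=5, rejected=True),  # the bad assumption
    Note("test fails on empty input", mentions=2),
    Note("parser returns None", mentions=1),
]

# The wrong assumption was discussed the most, so it looks important.
naive = compress_by_salience(notes, keep=2)
clean = compress_clean(notes, keep=2)
```

Here `naive` still contains "cache is stale" because the thread argued about it the most, while `clean` does not. Both outputs are equally short; only one is clean.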
This matches my own experience with coding agents. When the session goes bad, asking the same agent to “think harder” often makes it worse. It tries to harmonize contradictory constraints instead of deleting the bad branch.
At that point, a new session is not superstition. It is a state reset.
Product implications
If these are two different failures, the product should treat them differently.
For distraction, you need context budget controls.
Show what entered the context. Keep retrieval explainable. Let users inspect summaries. Use sub-agents to isolate noisy work. Attach evidence instead of dumping whole logs into the main thread.
For poisoning, you need rollback and branch control.
The user needs a way to say:
“This direction is invalid. Do not preserve it as a premise.”
That is different from:
“Try again.”
Most chat interfaces blur those two commands. “Try again” often keeps the old path alive. The model interprets the correction as a local adjustment, not a global invalidation.
Coding agents need stronger primitives:
- checkpoint before risky attempts;
- branch a hypothesis instead of mutating the main thread;
- mark an assumption as rejected;
- restart from clean evidence;
- preserve logs as external artifacts, not active beliefs;
- make the final summary say which paths were abandoned.
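A minimal sketch of how a few of these primitives might fit together, assuming a session that tracks evidence, active assumptions, and abandoned paths as separate lists. `SessionState`, `AgentSession`, and every method name below are hypothetical, invented for illustration rather than taken from an existing tool.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SessionState:
    evidence: list[str] = field(default_factory=list)     # observed facts
    assumptions: list[str] = field(default_factory=list)  # active premises
    abandoned: list[str] = field(default_factory=list)    # rejected paths


class AgentSession:
    def __init__(self) -> None:
        self.state = SessionState()
        self._checkpoints: dict[str, SessionState] = {}

    def checkpoint(self, name: str) -> None:
        # Deep copy so later mutations cannot leak into the snapshot.
        self._checkpoints[name] = copy.deepcopy(self.state)

    def assume(self, assumption: str) -> None:
        self.state.assumptions.append(assumption)

    def observe(self, fact: str) -> None:
        self.state.evidence.append(fact)

    def reject_and_rollback(self, name: str, assumption: str) -> None:
        """Restore the checkpoint and record the assumption as abandoned."""
        restored = copy.deepcopy(self._checkpoints[name])
        # Carry over earlier abandoned paths so they are not retried.
        for path in self.state.abandoned + [assumption]:
            if path not in restored.abandoned:
                restored.abandoned.append(path)
        self.state = restored

    def summary(self) -> str:
        # The summary names abandoned paths explicitly.
        return (
            f"evidence={self.state.evidence}; "
            f"assumptions={self.state.assumptions}; "
            f"abandoned={self.state.abandoned}"
        )


session = AgentSession()
session.observe("test_parse fails")
session.checkpoint("before-fix")
session.assume("cache is stale")
session.observe("clearing cache did not help")
session.reject_and_rollback("before-fix", "cache is stale")
```

After the rollback, the bad assumption is no longer an active premise, and anything observed while pursuing it is gone from the active state. That is a deliberate choice in this sketch: evidence gathered on a rejected branch would live in external logs, not in the working context. The only trace left in the session is the entry in `abandoned`, which tells the next attempt what not to retry.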
This is less glamorous than a bigger model window, but it is much closer to the real failure mode.
My current rule
When an agent gets worse over time, I ask:
Is it distracted, or is it poisoned?
If it is distracted, I reduce and isolate context.
If it is poisoned, I reset or roll back.
The mistake is treating both as the same problem. A shorter poisoned context is still poisoned. A clean but overloaded context is still overloaded.
Context rot is a useful name for the symptom.
It is not a useful diagnosis until we split it in two.