Three Tiers, Two Collapses

Most agents claim they have memory. Almost none of them do. What they have is recency bias dressed up with file names. A folder called "memory" and a prayer that the rest figures itself out.

I know this because I built it wrong twice.

I run eight agents on a Mac Mini. Intelligence briefings, operations monitoring, QA, accountability reviews. They run on schedules, report to Telegram, and share a filesystem through Synology Drive. The architecture is not complicated. The memory problem nearly killed it.

When I first set it up, each agent got session context. About 50KB of notes per run, pulled from an Obsidian vault. Immediate recall. Dies when the session ends. I thought this was enough. Daily notes, daily context, daily decisions. Good context management solves the problem.

It does not.

The first collapse happened on a Tuesday morning. My operations agent flagged something as urgent that had already been resolved three days earlier. It wasn't hallucinating. The information was real. But it was noise, not signal, and the agent couldn't tell the difference. It had everything it had seen. It had no judgment about what mattered.

I spent that evening building Tier 2. A curated long-term memory layer. One markdown file per agent (MEMORY.md), populated only with things that change how you think or act. Not logs. Not transcripts. Decisions, patterns, lessons, relationships. Semantic search before every run. Takes 30 seconds to load, saves a week of confusion.

The curation rule was simple: if it doesn't change a future decision, it doesn't get stored. That single filter cut the noise by about 70%. Suddenly Tier 2 was useful. Not because it remembered more, but because it remembered less.

Then I added Tier 3. Hot memory. Active priorities, current state, things that matter right now. Small. Reviewed every session. Pruned weekly. This is where the agents actually live, day to day. What are we working on. What changed since yesterday. What's blocked.

For about two weeks, everything worked. Three tiers, clean separation, good results. Then the second collapse.

This one was harder to see. Tier 2 and Tier 3 both existed. Both had good data. But the handoff between them was broken. Hot memory got stale because nobody pruned it. An agent would check its active priorities and find items from three weeks ago sitting alongside things from this morning, weighted equally. Long-term memory got ignored because nothing felt urgent enough to query it. The tiers were there. They just didn't talk to each other.

The fix was not elegant. Explicit handoff rules. When a Tier 3 item becomes permanent (a decision that won't change, a lesson that keeps proving true), it moves to Tier 2. When a Tier 2 item stops mattering, it gets deleted. Not archived. Deleted. The weekly prune is the most important part of the whole system and it's the part that feels least like engineering.

I've been running this for a few months now. The briefings are sharper. The QA agent catches things it used to miss. The accountability agent references patterns from six weeks ago without me having to remind it. None of this is because the models got smarter. It's because the memory got quieter.

The instinct with memory systems is to optimise for recall. Store everything, search well, surface the right thing at the right time. That instinct is wrong. The problem is never that agents can't remember enough. The problem is that they remember everything with equal weight. Every observation, every log entry, every passing thought, all stored at the same priority. That's not memory. That's hoarding.

The useful thing about human memory is not what it keeps. It's what it throws away. The forgetting is the feature. When I stopped trying to make my agents remember more and started making them remember less, everything improved.

I don't think I've solved this. The handoff rules are manual and brittle. The weekly prune takes longer than it should. The whole system depends on curation, which is just another way of saying it depends on me making good judgment calls about what matters. Which is exactly the problem I was trying to automate.

But four months in, eight agents deep, I'll take a system that's 80% automated and 20% judgment over one that's 100% automated and 0% useful.

HARRY YATES

Harry
Yates

Three Tiers, Two Collapses