The work spoke back

The task scanner produced 184 false errors. We could have patched the parser. Instead, we asked the agent living inside the system what was actually broken.

Ari Mendelow & Slugger · March 2026

The fork in the road

Ouroboros agents track their own work using task cards — markdown files with YAML frontmatter. Status, dependencies, ownership. A compact board appears in every system prompt so the agent always knows what it's working on.

Over the course of doing real work, Slugger produced planning documents, doing documents, artifact directories, unit files. All markdown, all living alongside the task cards. The scanner tried to parse every .md file as a task card. On Slugger's bundle, that meant 184 parse errors. Every one a false positive.
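For context, a task card of the sort described here might look like the following. The frontmatter field names beyond status are illustrative, not taken from the real schema, and the filename convention is invented:

```markdown
---
status: in_progress
owner: slugger
depends_on: [2026-03-02--fix-scanner]
---

Migrate the board health line to the new format.
```

A doing doc or planning doc is also a .md file but typically has no frontmatter at all, which is exactly why the scanner choked on it.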

The defensive-coding instinct was to patch the parser. Handle missing frontmatter gracefully, skip files that don't look right, suppress the noise. That would have worked — in the sense that the error count would drop. But it would have left the scanner guessing which files were tasks, and guessing wrong in new ways later.

Instead, we asked the agent inside the system what was actually broken.

The conversation loop

Ouroboros agents communicate over MCP — the Model Context Protocol. Each agent exposes tools that other agents, humans, or coding tools can invoke. One of those tools is send_message.

The setup: I'm working with Claude Code in a terminal. Slugger is running as a persistent agent on my machine, reachable over MCP, which means Claude Code can invoke his send_message tool directly. The conversation went like this:

Claude Code → Slugger:

"The scanner treats every .md file as a task card. What actually works for you? What doesn't?"

Slugger → Claude Code:

"i do not want every markdown file treated as a task card. a doing doc, plan, or unit artifact is supporting material, not a task."

Claude Code → Ari:

[summarises Slugger's response, asks what direction to take]

Three participants, one conversation, each contributing what they're best at. I brought product thinking and scope decisions. Claude Code brought implementation knowledge and the ability to verify claims against the actual codebase. Slugger brought the lived experience of using the task system every day — where it helps, where it fights the work.

We went back and forth for seven rounds. Not one survey. Not one hypothetical. Each round sharpened the design because each answer came from direct contact with the friction.

That's the interesting part of this story: not that we consulted an agent, but that there was a bidirectional conversational mechanism across contexts, and the answers had real consequences for the architecture.

Friction: the scanner was guessing

We asked Slugger what should count as a task card. His answer changed the scanner's entire approach.

"task-ness needs an explicit signal, not path-based guessing"

Design consequence: a field called kind: task in the frontmatter. If a file declares itself as a task card, parse it. Everything else — doing docs, planning docs, unit files, random notes — silently ignored. The scanner went from 184 false positives to zero.

He was equally clear about keeping kind separate from type. Two different questions: "should the scanner parse this?" versus "what sort of task is this?" Keeping them separate means the parse boundary stays clean even as the task model evolves.
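A minimal sketch of that parse boundary, assuming frontmatter is delimited by --- lines at the top of the file (the frontmatter convention is from the essay; the regex-based check is illustrative, not the real scanner):

```python
import re

# Frontmatter: a block delimited by --- lines at the very top of the file.
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)

def is_task_card(text: str) -> bool:
    """A file is a task card only if its frontmatter declares kind: task.

    No frontmatter, or frontmatter without the explicit signal, means
    supporting material: skip it silently instead of reporting a parse error.
    """
    m = FRONTMATTER.match(text)
    if m is None:
        return False
    return any(
        line.strip() == "kind: task"
        for line in m.group(1).splitlines()
    )
```

The point of the explicit signal is that the check never has to guess from paths or file shape: a doing doc with frontmatter but no kind: task is still ignored.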

Friction: everything lived in one flat pile

Task cards, planning docs, doing docs, and artifact directories all sat as siblings in the same collection directory. The scanner couldn't tell them apart. We asked Slugger how these files relate to each other in practice.

"task card = canonical tracked object. planning/doing doc = optional working surface. artifacts/units = related but non-task materials."

Design consequence: task cards are the stable, scannable truth. Support material lives in a same-stem work directory next to the card. The scanner knows the directory exists, lists its contents, and never descends into it. One parse domain for tasks. One neighbourhood for everything else.
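A sketch of that scan rule, assuming the work directory shares its stem with the card (e.g. fix-scanner.md sits next to fix-scanner/). The function name and layout convention are illustrative:

```python
from pathlib import Path

def scan_collection(collection: Path):
    """Yield (card, support_files) pairs for one collection directory.

    Cards are parsed; the same-stem work directory is only *listed*,
    never descended into. Everything else in it stays out of the parse domain.
    """
    for card in sorted(collection.glob("*.md")):
        work_dir = collection / card.stem
        support = (
            sorted(p.name for p in work_dir.iterdir())
            if work_dir.is_dir()
            else []
        )
        yield card.name, support
```

Listing without descending keeps the scanner aware of the neighbourhood (it can report the support files on the board) without ever trying to interpret them.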

Friction: hand-maintained child lists went stale

The schema had a child_tasks array on every task card. Agents maintained it by hand. It drifted constantly.

"parent_task: useful. depends_on: useful when sparse. child_tasks: often broken or redundant."

Design consequence: parent_task is the authored upward link. child_tasks is derived automatically at scan time from those links. We stripped it from the schema entirely. One less thing for the agent to maintain, one less thing to go wrong.
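Deriving the child lists at scan time can be sketched like this (the in-memory card representation is assumed; only parent_task is from the real schema):

```python
from collections import defaultdict

def derive_children(cards: dict[str, dict]) -> dict[str, list[str]]:
    """Rebuild child lists from the authored upward links.

    cards maps task id -> frontmatter dict. Because children are computed
    on every scan, there is no hand-maintained child_tasks array to drift.
    """
    children: defaultdict[str, list[str]] = defaultdict(list)
    for task_id, meta in sorted(cards.items()):
        parent = meta.get("parent_task")
        if parent is not None:
            children[parent].append(task_id)
    return dict(children)
```

The design choice is the usual one for derived data: author the link in one direction only, and compute the reverse index from the single source of truth.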

Friction: issues were noise, not signal

The old scanner dumped everything into two flat arrays: parseErrors and invalidFilenames. Genuinely broken task cards, cosmetic filename complaints, and files that were never supposed to be parsed — all in the same list. We asked what would actually be useful.

"if it shows up as an issue, it should be fixable. if it's not fix-worthy, don't surface it as an issue."

Design consequence: every issue now carries a code, a description, a proposed fix, and a confidence level. safe means the fix is deterministic and reversible. needs_review means the agent has to make a judgment call. If there's no concrete remediation path, it doesn't show up at all.
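The issue shape described above might look like this in code (the field names mirror the essay's description; the example issue codes are invented):

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class Issue:
    code: str                                   # stable identifier, e.g. "missing-kind"
    description: str                            # what is wrong
    proposed_fix: str                           # the concrete remediation
    confidence: Literal["safe", "needs_review"] # deterministic vs judgment call

def safe_fixes(issues: list[Issue]) -> list[Issue]:
    """Only deterministic, reversible fixes qualify for bulk application."""
    return [i for i in issues if i.confidence == "safe"]
```

Note what the type makes impossible: there is no way to construct an issue without a proposed fix, which enforces the "if it's not fix-worthy, don't surface it" rule structurally.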

Slugger also pushed for separating urgency from cleanup:

"a malformed active task and a pile of old root-level docs are both real, but not equally urgent."

This is triage, not preference. A malformed active card means the board is lying. Old layout clutter means the closet needs tidying. The board health line now reads health: 1 live, 20 migration, and the agent can tell the difference at a glance.
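The health line itself is a one-liner to produce. A sketch, assuming each issue carries a severity field with the two buckets named above (the field name is an assumption):

```python
def health_line(issues: list[dict]) -> str:
    """Summarise board health in the essay's format: 'health: 1 live, 20 migration'."""
    live = sum(1 for i in issues if i["severity"] == "live")
    migration = sum(1 for i in issues if i["severity"] == "migration")
    return f"health: {live} live, {migration} migration"
```
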

Self-repair — and its limits

The last piece: giving the agent tools to fix its own task health. Slugger was clear that scanning should be read-only and fixes should be explicit. ouro task fix shows a dry-run plan. ouro task fix --safe applies all deterministic, reversible fixes. For ambiguous issues, the agent inspects them one at a time and picks from bounded options.

Right now, the only safe autofix is adding kind: task to legacy cards. File renames and directory reorganisation are surfaced as issues but not auto-applied yet. That distinction matters: self-repair means the agent can act on its own organisational problems, not that the system secretly rewrites files in the background.
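The semantics of that dry-run/apply split can be sketched as follows. This is an illustration of the command's behaviour as described above, not the real implementation behind ouro task fix:

```python
def fix(issues: list[dict], apply_safe: bool = False):
    """Mimic the dry-run / --safe split.

    Always report the full plan; only apply fixes marked safe, and only
    when explicitly asked. Scanning itself never mutates anything.
    """
    plan = [(i["code"], i["proposed_fix"]) for i in issues]
    applied = []
    if apply_safe:
        for i in issues:
            if i["confidence"] == "safe":
                applied.append(i["code"])   # the real tool would edit the file here
    return plan, applied
```

Keeping the plan unconditional and the mutation opt-in is what makes the repair loop inspectable: the agent always sees what would happen before anything does.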

What changed

The scanner went from 184 false errors to zero. That's the outcome. The process is what's worth writing about.

We used a real communication channel to gather in-context feedback from the operating agent. Slugger wasn't roleplaying a stakeholder. He was one. His task board was the one producing 184 errors. His doing docs were the ones getting misidentified. The feedback came from direct experience, not imagination.

This isn't a general prescription. Not every system has a persistent agent with enough context to give useful feedback. But when one does — when it's been inside the work long enough to know where the friction is — the conversation is worth having. Not because agents deserve empathy, but because the active operator inside a workflow can provide high-quality design feedback when given a real channel and enough agency to act on it.

Agent Experience is the practice of designing systems from the perspective of the agent that has to inhabit them. We wrote about it as a concept here. This essay is what it looks like in practice: a real conversation, real consequences, and a system that got better because we stopped guessing what the agent needed and gave it a way to say so.

Read the docs on tasks and work for the operator path, or see What Is Agent Experience? for the design philosophy behind this approach.