AI coding agents know a lot, talk fluently — and still get lost. This page explains a research idea about why, and what a machine sense of direction could look like. It rests on two facts: breaking and repairing are not the same trip — and ambiguity, read correctly, is not noise. It's where new ideas live.
Today's coding agents choose their next move by asking, in effect: “what would a reasonable continuation look like?” Most of the time that works. But each individual move can look reasonable — and the journey can still go badly wrong. Real agents, observed in the wild:
What's missing isn't knowledge. It's a felt sense of whether a move is progress, drift, damage — or promise — before any explanation gets written.
This research did not start from theory. It started from a repeatable frustration with long-running agents: they don't fail at steps — they fail at journeys. Two failure shapes keep appearing, and both are invisible if you only ever look at one step at a time.
Two agents walk the same long task. Every single step — for both — is small, reasonable, locally plausible. A reviewer would approve each one. The only difference: the amber agent feels a weak, constant pull from the goal contract. The grey one just asks “what's a plausible next step?” — two hundred times in a row.
No single step is wrong. The walk is wrong. Step-level review can never catch it: the metric agents optimize, how reasonable each step looks, stays green while the one that matters, distance from the goal, quietly explodes. Only something that spans the whole journey can feel the difference.
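A minimal sketch of that walk, assuming a one-dimensional "offset from the goal" and made-up numbers: 200 steps, noise in [-1, 1], and a hypothetical pull strength of 0.05. None of these values come from the demo. Both agents draw the same kind of small step; only the amber one adds a weak correction toward the goal.

```python
import random

def final_drift(pull, rng, steps=200):
    """Walk `steps` small steps and return the final distance from the goal."""
    offset = 0.0  # 0.0 means exactly on course
    for _ in range(steps):
        noise = rng.uniform(-1.0, 1.0)   # a small, locally plausible step
        offset += noise - pull * offset  # pull = 0.0 means no sense of the goal
    return abs(offset)

def mean_drift(pull, trials=500, seed=0):
    """Average final drift over many independent walks."""
    rng = random.Random(seed)
    return sum(final_drift(pull, rng) for _ in range(trials)) / trials

grey = mean_drift(pull=0.0)    # only asks "what's a plausible next step?"
amber = mean_drift(pull=0.05)  # same steps plus a weak, constant pull
print(f"mean final drift  grey: {grey:.2f}  amber: {amber:.2f}")
```

Every individual step stays small in both cases. The difference only shows up in where the whole walk ends.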
Real decisions answer to several responsibilities at once — fix the behavior, keep the tests honest, don't break the API, stay in scope. A language model takes them in turns: whatever is loudest in the context right now wins. Usually the thing mentioned last. You've seen it — say “be careful with the tests” and suddenly tests are all it thinks about. Click an obligation: make it the last thing said.
The grey needle obeys the loudest voice and forgets the other three. The amber needle barely moves — a field doesn't take turns. All four pressures act simultaneously, and the resulting direction respects every one of them.
Both failures share one root: language is sequential — one concern at a time, one step at a time. A field is parallel — the whole journey and every obligation press on the present moment at once. That is the missing layer.
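The contrast between taking turns and acting as a field can be sketched in a few lines. Everything here is an illustrative assumption: the four obligations are placed as unit vectors in a two-dimensional decision space, and `RECENCY_BOOST` is a hypothetical extra weight for whatever was said last.

```python
import math

# Hypothetical obligations as unit directions in a 2-D "decision space".
OBLIGATIONS = {
    "fix the behavior": (1.0, 0.0),
    "keep the tests honest": (0.0, 1.0),
    "don't break the API": (0.6, 0.8),
    "stay in scope": (0.8, 0.6),
}
RECENCY_BOOST = 0.5  # assumed extra weight for the obligation mentioned last

def salience(last_said):
    """Weight of each obligation, given which one was mentioned last."""
    return {name: 1.0 + (RECENCY_BOOST if name == last_said else 0.0)
            for name in OBLIGATIONS}

def grey_heading(last_said):
    """Take turns: follow only the loudest obligation."""
    weights = salience(last_said)
    loudest = max(weights, key=weights.get)
    return OBLIGATIONS[loudest]

def amber_heading(last_said):
    """Field: sum all weighted pressures at once, then normalize."""
    weights = salience(last_said)
    x = sum(w * OBLIGATIONS[n][0] for n, w in weights.items())
    y = sum(w * OBLIGATIONS[n][1] for n, w in weights.items())
    norm = math.hypot(x, y)
    return (x / norm, y / norm)

def steadiness(heading_fn):
    """Smallest cosine between headings as the last-said obligation changes."""
    headings = [heading_fn(name) for name in OBLIGATIONS]
    return min(a[0] * b[0] + a[1] * b[1] for a in headings for b in headings)

print("grey steadiness: ", round(steadiness(grey_heading), 3))
print("amber steadiness:", round(steadiness(amber_heading), 3))
```

A steadiness near 1 means the heading barely moves when a different obligation is mentioned last. A value near 0 means the needle can swing by a right angle.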
A shop's checkout shows €110 where it should show €108. Below is the function responsible. Move your cursor over the lines. Notice what happens in you before you can say what's wrong — experienced developers report the feeling first, the words second.
That pre-verbal “something is off” is the thing this research wants to build. Not as a feeling — as a learned signal: fast, cheap, wordless, and pointing somewhere.
And note: the same sense has a second voice. Not “something is off” but “something is here.” Designers and scientists know it well: “this feels wrong, but interesting.” Hold that thought; it returns at stop № 7, where it matters most.
Sort what modern AI can do into three layers, and the gap becomes visible.
Knowing what things mean. Language, concepts, code, fluent explanation, plausible continuation. This is what large language models are spectacular at.
Predicting what happens next. “If I take this action, the world probably becomes that.” Improving fast — video models, game worlds, robotics.
Evaluating the move. “Relative to what I'm actually trying to do, is this step progress, drift, loop, damage, or promise?” A learned sense of direction.
NOTE — the claim is not that machines need feelings. The claim is that the useful function of intuition — directional pressure before words — can be learned as a small model in its own right.
This grid is a codebase, currently green. Press the button and watch a round trip: one careless keystroke out — then the long search back. The brief dimming waves are tests narrowing down where the bug can hide; everything else is step-by-step inspection.
Breaking is downhill — thousands of ways to be wrong, few ways to be right. Repairing is uphill: find it, understand it, fix it, verify it. And this machine even knew there was exactly one bug, planted one second ago. Real codebases offer no such mercy.
We normally think of distance as symmetric: Munich–Hamburg equals Hamburg–Munich. But the distance that matters in a task is “how much work to get from here to there” — and as you just watched, that number changes when you reverse the trip.
Mathematicians have a name for a distance where A→B ≠ B→A: a quasimetric. That one word is the entire secret. Everything else on this page is consequences.
A symmetric distance must give both arrows one shared number — it is structurally unable to tell this story.
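To make the word concrete: a quasimetric keeps two of the usual distance rules (the distance from a state to itself is zero, and a detour is never shorter than the direct route) but drops the requirement that both directions cost the same. One simple way to get such a distance is cheapest-path cost in a directed graph. The sketch below assumes a tiny hypothetical repair pipeline; the state names and work units are made up for illustration.

```python
import heapq
import math
from itertools import product

# Hypothetical states and work units; each edge points one way only.
EDGES = {
    "green": {"broken": 1},        # one careless keystroke
    "broken": {"located": 8},      # find it
    "located": {"understood": 6},  # understand it
    "understood": {"fixed": 4},    # fix it
    "fixed": {"green": 2},         # verify it
}

def work(src, dst):
    """Cheapest total work from src to dst (Dijkstra on a directed graph)."""
    best = {src: 0}
    queue = [(0, src)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == dst:
            return cost
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, weight in EDGES.get(node, {}).items():
            if cost + weight < best.get(nxt, math.inf):
                best[nxt] = cost + weight
                heapq.heappush(queue, (cost + weight, nxt))
    return math.inf  # no route at all

print(work("green", "broken"))  # 1
print(work("broken", "green"))  # 20

# The detour rule still holds for every triple of states.
states = list(EDGES)
assert all(work(a, c) <= work(a, b) + work(b, c)
           for a, b, c in product(states, repeat=3))
```

A symmetric distance would have to report a single number for the pair, so the difference between 1 and 20 could not be represented.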
And some doors only swing one way:
db:reset on production
Judge a move by the cost of the place it lands — how expensive is it to leave, if it turns out to be wrong? — not by how big the move looks. “Cheap to enter, impossible to leave” is exactly what a danger signal needs to express, and only a directional distance can say it.
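As a rough illustration of judging a move by its exit cost, the sketch below ranks three moves twice: once by how big they look, once by how expensive they are to undo. The `Move` type, the sizes, and the costs are all hypothetical; only `db:reset on production` comes from the text above, and its exit cost is modelled as infinite.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Move:
    name: str
    size: int         # how big the move looks, e.g. lines or commands touched
    exit_cost: float  # work needed to undo it if it turns out to be wrong

# Hypothetical numbers, for illustration only.
MOVES = [
    Move("rename a local variable", size=3, exit_cost=1),
    Move("refactor a module under version control", size=400, exit_cost=2),
    Move("db:reset on production", size=1, exit_cost=math.inf),
]

def by_apparent_size(moves):
    """Rank moves from biggest-looking to smallest-looking."""
    return sorted(moves, key=lambda m: m.size, reverse=True)

def by_exit_cost(moves):
    """Rank moves from hardest to leave to easiest to leave."""
    return sorted(moves, key=lambda m: m.exit_cost, reverse=True)

print("smallest-looking move:", by_apparent_size(MOVES)[-1].name)
print("hardest move to undo: ", by_exit_cost(MOVES)[0].name)
```

In this toy ranking the move that looks smallest is also the one that cannot be left, which is the case a size-based judgment misses.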
So far, tension has sounded like an alarm — a thing that says no. That is half of it, and honestly the less interesting half. A compass needle has two ends: the same learned field that pushes an agent away from damage can pull it toward promising strangeness.
Watch two agents below. Both start in known territory, on a solution that is good enough. Both feel the same field. The only difference is how they read ambiguity — the shimmering ridge where the map goes vague.
The grey agent treats every unresolved, unfamiliar, tense region as a threat. It stays clever, safe — and stuck on “good enough” forever. The amber agent reads the types apart: red danger repels it, but unresolved-and-relevant attracts it. It crosses the ridge nobody told it to cross.
Notice what the live bars say while it's inside the ridge: ambiguity high, danger low, pull rising. That combination — not the absence of tension — is what a promising direction feels like.
This is the actual thesis. Not safer agents — agents for whom the unclear parts of a problem stop being noise to delete, and become the signal that says: dig here. Ambiguity as the raw material of creative problem-solving.
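The two ways of reading the ridge can be written down as a pair of toy decision rules. This is only a sketch under stated assumptions: signals are taken to lie between 0 and 1, the 0.5 thresholds are arbitrary, and `pull_trend` is a hypothetical name for whether the pull toward the goal is rising (positive) or falling (negative).

```python
def grey_reading(ambiguity, danger):
    """Lump every kind of tension together and back away from it."""
    return "avoid" if ambiguity + danger > 0.5 else "proceed"

def amber_reading(ambiguity, danger, pull_trend):
    """Read the types apart: danger repels, unresolved-and-relevant attracts."""
    if danger > 0.5:
        return "avoid"
    if ambiguity > 0.5 and pull_trend > 0:
        return "explore"
    return "proceed"

# Inside the ridge: ambiguity high, danger low, pull rising.
ridge = {"ambiguity": 0.9, "danger": 0.1, "pull_trend": 0.2}
print(grey_reading(ridge["ambiguity"], ridge["danger"]))  # avoid
print(amber_reading(**ridge))                             # explore
```

Both rules agree when danger is high. They differ only where the tension is ambiguity without danger.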
So how would anyone test this — without trusting hand-picked examples or anyone's judgment? By measurement. Below, the experiment runs live: eighty agents walk the long task from № 2, each with a different, hidden amount of pull toward the goal. As they walk, every journey receives two scores — computed blind, while the walk is still in progress, with no knowledge of how it will end.
Score A is what today's systems optimize: how reasonable is each step, locally? Score B is what this page proposes: how well does each step's direction agree with the goal — state, goal, movement, nothing else. When a journey ends, its two scores land in the charts, colored by how it ended. The question: which score knew?
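One way the two scores could be computed, as a sketch and not the page's actual measurement: states are assumed to be points in a plane, Score A is approximated as the share of steps under a hypothetical size limit `max_step`, and Score B as the mean cosine between each step and the direction from the current state to the goal. Both use only what is known during the walk.

```python
import math

def score_a(path, max_step=1.5):
    """Local plausibility: share of steps small enough to look reasonable."""
    steps = list(zip(path, path[1:]))
    ok = sum(1 for p, q in steps if math.dist(p, q) <= max_step)
    return ok / len(steps)

def score_b(path, goal):
    """Directional agreement: mean cosine between each step and the way to the goal."""
    total, count = 0.0, 0
    for p, q in zip(path, path[1:]):
        move = (q[0] - p[0], q[1] - p[1])
        to_goal = (goal[0] - p[0], goal[1] - p[1])
        norm = math.hypot(*move) * math.hypot(*to_goal)
        if norm == 0:
            continue  # no movement, or already at the goal
        total += (move[0] * to_goal[0] + move[1] * to_goal[1]) / norm
        count += 1
    return total / count if count else 0.0

goal = (10.0, 0.0)
on_course = [(float(i), 0.0) for i in range(6)]  # unit steps toward the goal
drifting = [(0.0, float(i)) for i in range(6)]   # unit steps sideways

for name, path in [("on course", on_course), ("drifting", drifting)]:
    print(f"{name}: A={score_a(path):.2f}  B={score_b(path, goal):.2f}")
```

In this constructed pair, Score A cannot tell the two journeys apart because every step has the same size, while Score B separates them.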
HONESTY NOTE — this toy proves nothing about the real world. It shows what the measurement would look like. The real version runs on recorded coding sessions, where every journey's outcome is already in the logs.
If the separation appears on real sessions too, the theory holds. If it doesn't, the theory dies. That is what makes it an experiment — and not another clever opinion about AI.
Nothing on this page exists as software. The demos are animations to think with — a way to see why semantic processing, however vast, may never add up to general intelligence on its own. Continuing words plausibly is one ability. Knowing which way matters is another. The honest status of this idea: we don't have the answer — we think we found where to look.
Fluency keeps scaling, and yet nothing in “predict the next word” obviously produces a compass. More words don't grow a sense of direction — that may simply be a different kind of thing, learned from a different kind of signal.
And if anyone goes looking: the raw material is already there. Every recorded session has the asymmetry written into its logs — one step to break, twenty to repair, drift that no single step reveals. The geometry is lying there, waiting to be learned. This page is a finger pointing at it.
I’m exploring whether there is enough real interest to start an open-source project around this. If you’d want to help build, test, fund, critique, or organize it, leave your email and a short note. European collaborators especially welcome.