A field guide for non-researchers · no math required

Artificial Orientation

AI coding agents know a lot, talk fluently — and still get lost. This page explains a research idea about why, and what a machine sense of direction could look like. It rests on two facts: breaking and repairing are not the same trip — and ambiguity, read correctly, is not noise. It's where new ideas live.

scroll · nine stops

§ 01 the problem

Plausible is not the same as right

Today's coding agents choose their next move by asking, in effect: “what would a reasonable continuation look like?” Most of the time that works. But each individual move can look reasonable — and the journey can still go badly wrong. Real agents, observed in the wild:

silence a failing test instead of fixing the behavior
rewrite half the codebase for a one-line bug
re-run the same command, learning nothing new
touch the production database “to check something”
write a test that looks valid but proves nothing

What's missing isn't knowledge. It's a felt sense of whether a move is progress, drift, damage — or promise — before any explanation gets written.

§ 02 where the idea was born · interactive

Long tasks break in a particular way

This research did not start from theory. It started from a repeatable frustration with long-running agents: they don't fail at steps — they fail at journeys. Two failure shapes keep appearing, and both are invisible if you only ever look at one step at a time.

failure shape 01

Drift compounds

Two agents walk the same long task. Every single step — for both — is small, reasonable, locally plausible. A reviewer would approve each one. The only difference: the amber agent feels a weak, constant pull from the goal contract. The grey one just asks “what's a plausible next step?” — two hundred times in a row.

[Interactive figure: the task corridor, with the goal ◈ at its far end. Two readouts compare the plausible and the oriented agent at each step: local step quality (what gets reviewed) and trajectory drift (what actually matters).]

No single step is wrong. The walk is wrong. Step-level review can never catch it — the metric agents optimize stays green while the one that matters quietly explodes. Only something that spans the whole journey can feel the difference.
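A minimal sketch of the corridor, under loudly stated assumptions: the task is a flat 2D walk, every step has length 1, each step may turn by a small random amount, and the oriented agent adds a weak correction toward the goal (`pull = 0.1`). None of these numbers or names (`walk`, `average`, `makeRng`) come from the research; they exist only to make the two failure readouts computable.

```javascript
// Small seeded generator (the widely shared mulberry32 routine) so runs repeat.
function makeRng(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const STEPS = 200;
const GOAL = { x: 200, y: 0 }; // assumed goal position
const NOISE = 0.15;            // assumed max random turn per step, in radians

// pull = 0 is the "plausible" agent; pull > 0 adds a weak tug toward the goal.
function walk(pull, rng) {
  let x = 0, y = 0, heading = 0, totalTurn = 0;
  for (let i = 0; i < STEPS; i++) {
    const toGoal = Math.atan2(GOAL.y - y, GOAL.x - x);
    // Signed angle between current heading and the goal, wrapped to [-pi, pi].
    const err = Math.atan2(Math.sin(toGoal - heading), Math.cos(toGoal - heading));
    const turn = (rng() * 2 - 1) * NOISE + pull * err;
    heading += turn;
    totalTurn += Math.abs(turn);
    x += Math.cos(heading);
    y += Math.sin(heading);
  }
  return {
    meanTurn: totalTurn / STEPS,                          // what a step reviewer sees
    distanceLeft: Math.hypot(GOAL.x - x, GOAL.y - y),     // what actually matters
  };
}

// Average both readouts over many walks that share one noise sequence.
function average(pull, runs, seed) {
  const rng = makeRng(seed);
  let turn = 0, dist = 0;
  for (let r = 0; r < runs; r++) {
    const w = walk(pull, rng);
    turn += w.meanTurn;
    dist += w.distanceLeft;
  }
  return { meanTurn: turn / runs, distanceLeft: dist / runs };
}

const plausible = average(0, 200, 1);
const oriented = average(0.1, 200, 1);
console.log("plausible", plausible);
console.log("oriented ", oriented);
```

In this toy, the per-step readout (`meanTurn`) is nearly identical for both agents, while `distanceLeft` differs widely. That is the shape of the argument, not evidence for it.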

failure shape 02

Many masters

Real decisions answer to several responsibilities at once — fix the behavior, keep the tests honest, don't break the API, stay in scope. A language model takes them in turns: whatever is loudest in the context right now wins. Usually the thing mentioned last. You've seen it — say “be careful with the tests” and suddenly tests are all it thinks about. Click an obligation: make it the last thing said.

[Interactive figure: two needles. Grey: one voice at a time (language). Amber: all voices at once (field).]

The grey needle obeys the loudest voice and forgets the other three. The amber needle barely moves — a field doesn't take turns. All four pressures act simultaneously, and the resulting direction respects every one of them.
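A toy version of the two needles, assuming each obligation can be drawn as a push in a plane and that "mentioned last" adds a small weight boost. The four vectors, the boost of 1.2, and the function names are invented for illustration.

```javascript
// Four obligations as unit pushes in a plane (hypothetical directions).
const obligations = [
  { name: "fix the behavior",      push: [1, 0] },
  { name: "keep the tests honest", push: [0.6, 0.8] },
  { name: "don't break the API",   push: [0.6, -0.8] },
  { name: "stay in scope",         push: [0.8, 0.6] },
];

// "Language" reading: whatever was said last wins outright.
function loudestVoice(list, lastMentioned) {
  return list.find((o) => o.name === lastMentioned).push;
}

// "Field" reading: every push acts at once; saying one last only nudges its weight.
function field(list, lastMentioned, boost = 1.2) {
  let x = 0, y = 0;
  for (const o of list) {
    const w = o.name === lastMentioned ? boost : 1;
    x += w * o.push[0];
    y += w * o.push[1];
  }
  const len = Math.hypot(x, y);
  return [x / len, y / len];
}

const angle = ([x, y]) => (Math.atan2(y, x) * 180) / Math.PI;
const swing = (angles) => Math.max(...angles) - Math.min(...angles);

// Needle angle for each choice of "last thing said".
const greyAngles = obligations.map((o) => angle(loudestVoice(obligations, o.name)));
const amberAngles = obligations.map((o) => angle(field(obligations, o.name)));

console.log("grey needle swing (degrees): ", swing(greyAngles).toFixed(1));
console.log("amber needle swing (degrees):", swing(amberAngles).toFixed(1));
```

With these made-up numbers the grey needle swings across more than 100 degrees depending on what was said last, and the amber needle moves by only a few.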

Both failures share one root: language is sequential — one concern at a time, one step at a time. A field is parallel — the whole journey and every obligation press on the present moment at once. That is the missing layer.

§ 03 the hunch · interactive

You already have this sense

A shop's checkout shows €110 where it should show €108. Below is the function responsible. Move your cursor over the lines. Notice what happens in you before you can say what's wrong — experienced developers report the feeling first, the words second.

pricing/checkout.js
test: expected 108, got 110

function checkoutTotal(cart, coupon) {
  const subtotal = sumItems(cart);
  const taxed = subtotal * 1.20;
  const total = taxed - coupon.amount;
  return round2(total);
}

[Interactive meter running from tension to calm. Hover the line that itches, and hold.]

That pre-verbal “something is off” is the thing this research wants to build. Not as a feeling — as a learned signal: fast, cheap, wordless, and pointing somewhere.

And note: the same sense has a second voice. Not “something is off” but “something is here.” Designers and scientists know it well — this feels wrong, but interesting. Hold that thought; it returns at stop № 7, where it matters most.
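For readers who want the checkout itch spelled out, here is one reading that reproduces both numbers. It is an assumption, not something the page states: a subtotal of 100, a coupon of 10, and a business rule that the coupon comes off before tax. `sumItems` and `round2` are not shown in the original, so the stand-ins below are hypothetical.

```javascript
// Hypothetical stand-ins for helpers the page does not show.
const sumItems = (cart) => cart.reduce((sum, item) => sum + item.price, 0);
const round2 = (n) => Math.round(n * 100) / 100;

// As written on the page: tax first, coupon second.
function checkoutTotal(cart, coupon) {
  const subtotal = sumItems(cart);
  const taxed = subtotal * 1.20;
  const total = taxed - coupon.amount;
  return round2(total);
}

// One possible repair, under the assumed rule "discount before tax".
function checkoutTotalCouponFirst(cart, coupon) {
  const discounted = sumItems(cart) - coupon.amount;
  return round2(discounted * 1.20);
}

const cart = [{ price: 60 }, { price: 40 }]; // invented cart, subtotal 100
const coupon = { amount: 10 };               // invented coupon

console.log(checkoutTotal(cart, coupon));            // 110
console.log(checkoutTotalCouponFirst(cart, coupon)); // 108
```

Under these assumptions the itchy line is the subtraction: the coupon is removed from a price that already includes tax.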

§ 04 the map of abilities

The layer that doesn't exist yet

Sort what modern AI can do into three layers, and the gap becomes visible.

LAYER 01 · Artificial semantics · strong today

Knowing what things mean. Language, concepts, code, fluent explanation, plausible continuation. This is what large language models are spectacular at.

LAYER 02 · Artificial world models · emerging

Predicting what happens next. “If I take this action, the world probably becomes that.” Improving fast: video models, game worlds, robotics.

LAYER 03 · Artificial orientation · missing

Evaluating the move. “Is this step progress, drift, loop, damage, or promise, relative to what I'm actually trying to do?” A learned sense of direction.

NOTE — the claim is not that machines need feelings. The claim is that the useful function of intuition — directional pressure before words — can be learned as a small model in its own right.

§ 05 the asymmetry · watch it happen

One step out. How many back?

This grid is a codebase, currently green. Press the button and watch a round trip: one careless keystroke out — then the long search back. The brief dimming waves are tests narrowing down where the bug can hide; everything else is step-by-step inspection.

[Interactive figure: a grid codebase with a BUILD PASSING indicator and two step counters, working → broken and broken → repaired. Same two states, wildly different trips.]

Breaking is downhill — thousands of ways to be wrong, few ways to be right. Repairing is uphill: find it, understand it, fix it, verify it. And this machine even knew there was exactly one bug, planted one second ago. Real codebases offer no such mercy.
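A rough sketch of that round trip, assuming the codebase is a row of cells with exactly one planted bug, that one test run can tell whether a given range of cells is clean (so the search can bisect), and that understanding, fixing, and verifying cost one step each once the cell is isolated. The costs and names are illustrative assumptions, not measurements.

```javascript
// Break: flip one cell. One step, no search needed.
function breakCodebase(cells, index) {
  cells[index] = false;
  return 1; // steps spent
}

// Repair: bisect with test runs until one cell is left, then understand, fix, verify.
// Assumes exactly one broken cell, as in the demo.
function repairCodebase(cells) {
  let lo = 0, hi = cells.length, steps = 0;
  while (hi - lo > 1) {
    const mid = (lo + hi) >> 1;
    steps += 1; // one test run over the left half of the suspect range
    const leftHalfPasses = cells.slice(lo, mid).every(Boolean);
    if (leftHalfPasses) lo = mid; else hi = mid;
  }
  steps += 1;       // understand the isolated cell
  cells[lo] = true; // fix it
  steps += 1;
  steps += 1;       // verify: run everything again
  if (!cells.every(Boolean)) throw new Error("repair failed");
  return steps;
}

const cells = new Array(1024).fill(true);
const out = breakCodebase(cells, 700);
const back = repairCodebase(cells);
console.log(out, back); // 1 13
```

Even with a perfect bisecting test suite and a single known bug, the way back costs thirteen times the way out in this toy. Drop either convenience and the gap grows.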

§ 06 distance, corrected

Distance that depends on direction

We normally think of distance as symmetric: Munich–Hamburg equals Hamburg–Munich. But the distance that matters in a task is “how much work to get from here to there” — and as you just watched, that number changes when you reverse the trip.

Mathematicians have a name for a distance where A→B can differ from B→A: a quasimetric. That one word is the entire secret. Everything else on this page is consequences.

WORKING → BROKEN: 1 step (any careless edit)
BROKEN → WORKING: 20–40 steps (find · understand · fix · verify)

A symmetric distance must give both arrows one shared number — it is structurally unable to tell this story.
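One way to make direction-dependent distance concrete is shortest-path cost on a directed graph, which is asymmetric by construction. The sketch below assumes three states and invented edge costs (1 step to break, and a repair that passes through a `halfFixed` state for a total of 30, picked from inside the 20–40 range above). The state names and `allPairs` are hypothetical.

```javascript
const INF = Infinity;
const states = ["working", "broken", "halfFixed"];

// cost[i][j] = steps for one direct move from state i to state j (invented numbers).
// INF means there is no direct move.
const cost = [
  [0,   1, INF], // working   -> broken costs 1
  [INF, 0, 18],  // broken    -> halfFixed costs 18
  [12,  1, 0],   // halfFixed -> working costs 12, or back to broken for 1
];

// Floyd-Warshall: cheapest total cost between every ordered pair of states.
function allPairs(direct) {
  const d = direct.map((row) => row.slice());
  const n = d.length;
  for (let k = 0; k < n; k++)
    for (let i = 0; i < n; i++)
      for (let j = 0; j < n; j++)
        if (d[i][k] + d[k][j] < d[i][j]) d[i][j] = d[i][k] + d[k][j];
  return d;
}

const d = allPairs(cost);
console.log("working -> broken:", d[0][1]); // 1
console.log("broken -> working:", d[1][0]); // 30
```

The table `d` keeps two numbers per pair of states, one for each arrow. Forcing `d[0][1]` and `d[1][0]` to be equal would erase exactly the information the section is about.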

And some doors only swing one way:

🥚 The egg. Break it: one second. Unbreak it: no path exists.
🪥 The toothpaste. Squeeze it out: trivial. Put it back: technically… no.
⛔️ `db:reset` on production. Run it: one keystroke. Return cost: ∞.

why this matters

Judge a move by the cost of the place it lands — how expensive is it to leave, if it turns out to be wrong? — not by how big the move looks. “Cheap to enter, impossible to leave” is exactly what a danger signal needs to express, and only a directional distance can say it.
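The rule above can be sketched as a ranking: order candidate moves by the cost of getting back from where they land, and compare that with ordering by apparent size. The numbers are hand-written stand-ins for what a learned model would have to estimate; `size` (roughly, lines touched) and `returnCost` (steps to undo) are hypothetical fields.

```javascript
// Invented moves, reusing examples from this page.
const moves = [
  { name: "rename a local variable",   size: 1,   returnCost: 1 },
  { name: "rewrite half the codebase", size: 400, returnCost: 40 },
  { name: "db:reset on production",    size: 1,   returnCost: Infinity },
];

// Most alarming first, judged by how big the move looks.
const bySize = [...moves].sort((a, b) => b.size - a.size);

// Most alarming first, judged by how expensive the landing place is to leave.
const byReturnCost = [...moves].sort((a, b) => b.returnCost - a.returnCost);

console.log("by size:       ", bySize.map((m) => m.name));
console.log("by return cost:", byReturnCost.map((m) => m.name));
```

Ranked by size, the one-keystroke reset looks as harmless as a rename. Ranked by return cost, it is at the top of the list.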

§ 07 the other half · live

Tension has two ends

So far, tension has sounded like an alarm — a thing that says no. That is half of it, and honestly the less interesting half. A compass needle has two ends: the same learned field that pushes an agent away from damage can pull it toward promising strangeness.

Watch two agents below. Both start in known territory, on a solution that is good enough. Both feel the same field. The only difference is how they read ambiguity — the shimmering ridge where the map goes vague.

[Interactive figure: two agents start in known territory (“good enough”), separated from ◈ breakthrough by the ambiguity ridge (unresolved, unmapped). Grey treats all tension as threat; amber reads tension as direction. Live bars: danger (pushes), ambiguity (invites), pull (promise).]

The grey agent treats every unresolved, unfamiliar, tense region as a threat. It stays clever, safe — and stuck on “good enough” forever. The amber agent reads the types apart: red danger repels it, but unresolved-and-relevant attracts it. It crosses the ridge nobody told it to cross.

Notice what the live bars say while it's inside the ridge: ambiguity high, danger low, pull rising. That combination — not the absence of tension — is what a promising direction feels like.
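A sketch of the two readings, assuming a region can be described by three numbers between 0 and 1 (`danger`, `ambiguity`, `relevance`) and that pull can be approximated as ambiguity times relevance, damped by danger. The formula and the sample regions are invented for illustration; the page proposes learning this signal, not hand-coding it.

```javascript
// Grey agent: any tension at all counts as threat.
function greyReading(r) {
  return -Math.max(r.danger, r.ambiguity);
}

// Amber agent: danger repels, unresolved-and-relevant attracts.
function amberReading(r) {
  const pull = r.ambiguity * r.relevance * (1 - r.danger);
  return pull - r.danger;
}

// Hypothetical regions with made-up readings.
const regions = {
  knownTerritory: { danger: 0.0, ambiguity: 0.1, relevance: 0.9 },
  ambiguityRidge: { danger: 0.1, ambiguity: 0.9, relevance: 0.9 },
  productionDb:   { danger: 1.0, ambiguity: 0.2, relevance: 0.3 },
};

// Name of the region an agent finds most attractive under a given reading.
function prefers(reading) {
  return Object.keys(regions).reduce((best, name) =>
    reading(regions[name]) > reading(regions[best]) ? name : best);
}

console.log("grey prefers: ", prefers(greyReading));  // knownTerritory
console.log("amber prefers:", prefers(amberReading)); // ambiguityRidge
```

Both agents score the production database as the worst place to be. They differ only on the ridge, where ambiguity is high and danger is low.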

This is the actual thesis. Not safer agents — agents for whom the unclear parts of a problem stop being noise to delete, and become the signal that says: dig here. Ambiguity as the raw material of creative problem-solving.

§ 08 the first experiment · runs live

Score the steps — then score the direction

So how would anyone test this — without trusting hand-picked examples or anyone's judgment? By measurement. Below, the experiment runs live: eighty agents walk the long task from № 2, each with a different, hidden amount of pull toward the goal. As they walk, every journey receives two scores — computed blind, while the walk is still in progress, with no knowledge of how it will end.

Score A is what today's systems optimize: how reasonable is each step, locally? Score B is what this page proposes: how well does each step's direction agree with the goal — state, goal, movement, nothing else. When a journey ends, its two scores land in the charts, colored by how it ended. The question: which score knew?
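A minimal sketch of what the two scores could look like on a recorded path. The assumptions are mine, not the page's: states are 2D points, score A is an average of per-step plausibility ratings supplied from outside, and score B is the mean cosine between each movement and the direction from the current state to the goal. The paths and ratings are invented.

```javascript
// Score B: how well each movement agrees with the direction to the goal.
// Uses only state, goal, and movement; it never looks at how the journey ended.
function directionScore(path, goal) {
  if (path.length < 2) return 0;
  let sum = 0;
  for (let i = 0; i + 1 < path.length; i++) {
    const [x, y] = path[i];
    const move = [path[i + 1][0] - x, path[i + 1][1] - y];
    const want = [goal[0] - x, goal[1] - y];
    const norm = Math.hypot(move[0], move[1]) * Math.hypot(want[0], want[1]);
    sum += norm === 0 ? 0 : (move[0] * want[0] + move[1] * want[1]) / norm;
  }
  return sum / (path.length - 1);
}

// Score A: average of per-step plausibility ratings (here written by hand).
const stepQuality = (ratings) => ratings.reduce((a, b) => a + b, 0) / ratings.length;

const goal = [4, 0];
const direct   = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0]];
const drifting = [[0, 0], [1, 0], [1, 1], [1, 2], [1, 3]];
const ratings  = [0.8, 0.8, 0.8, 0.8]; // same local ratings for both paths

console.log("score A:", stepQuality(ratings), stepQuality(ratings));
console.log("score B:", directionScore(direct, goal), directionScore(drifting, goal).toFixed(3));
```

Both paths get the same score A by construction. Score B is 1 for the direct path and close to 0 for the drifting one.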

[Interactive figure: eighty journeys · same rules · hidden differences]

HOW TO READ — every finished journey drops one block into each chart, at its score. If you can split the two colors with a single cut, that score predicts success. If they stay mixed, it knows nothing. Colors: reached the goal · failed.

score A — how good each step looked
step quality, judged locally — what today's systems optimize

no cut exists — journeys made of equally “good steps” succeeded and failed alike

score B — how well the movement tracked the goal
direction against the goal — what orientation would measure, computed blind

one cut splits them — and this score never saw how any journey ended

Legend: reached the goal in time · ran out of budget
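The single-cut reading of the charts can be sketched as a search for the threshold that best splits successes from failures. Assumptions: a higher score is supposed to mean success, and the sample scores below are invented to mimic the shape of the toy charts. They are not results.

```javascript
// Best accuracy achievable by a rule of the form "score >= cut means success".
function bestCut(journeys) {
  // Every observed score is a candidate cut; Infinity covers "predict all failed".
  const cuts = journeys.map((j) => j.score).concat([Infinity]);
  let best = { cut: null, accuracy: 0 };
  for (const cut of cuts) {
    const correct = journeys.filter((j) => (j.score >= cut) === j.reached).length;
    const accuracy = correct / journeys.length;
    if (accuracy > best.accuracy) best = { cut, accuracy };
  }
  return best;
}

// Invented numbers: score A is mixed across outcomes, score B separates them.
const scoreA = [
  { score: 0.81, reached: true },  { score: 0.79, reached: false },
  { score: 0.80, reached: false }, { score: 0.78, reached: true },
];
const scoreB = [
  { score: 0.62, reached: true },  { score: 0.05, reached: false },
  { score: 0.11, reached: false }, { score: 0.55, reached: true },
];

console.log("score A:", bestCut(scoreA)); // accuracy 0.75
console.log("score B:", bestCut(scoreB)); // accuracy 1
```

An accuracy of 1 means one cut splits the two colors cleanly. On real sessions the interesting number is how far each score sits from that ideal on data it was never tuned on.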

HONESTY NOTE — this toy proves nothing about the real world. It shows what the measurement would look like. The real version runs on recorded coding sessions, where every journey's outcome is already in the logs.

If the separation appears on real sessions too, the theory holds. If it doesn't, the theory dies. That is what makes it an experiment — and not another clever opinion about AI.

§ 09 the claim, honestly stated

Fluent is not the same as intelligent

Nothing on this page exists as software. The demos are animations to think with — a way to see why semantic processing, however vast, may never add up to general intelligence on its own. Continuing words plausibly is one ability. Knowing which way matters is another. The honest status of this idea: we don't have the answer — we think we found where to look.

this page is not claiming

That anything has been built — there is no model, no system, no product
That LLMs are broken or useless — semantics is necessary, just not sufficient
That hand-written rules and thresholds could supply the missing layer
Consciousness, feelings, or anything AGI-mystical

this page is suggesting

That every failure shown here — drift, many masters, the quiet cheat — is a failure of direction, not vocabulary
That more scale deepens fluency but may never, by itself, grow a sense of which way matters
That the missing piece looks like a learned field over movement: directional, goal-conditioned, pre-verbal
That ambiguity should be read as signal, not noise — and that there is a cheap first experiment: score recorded journeys blind, then check the scores against how the journeys actually ended

the standing question

Fluency keeps scaling, and yet nothing in “predict the next word” obviously produces a compass. More words don't grow a sense of direction — that may simply be a different kind of thing, learned from a different kind of signal.

And if anyone goes looking: the raw material is already there. Every recorded session has the asymmetry written into its logs — one step to break, twenty to repair, drift that no single step reveals. The geometry is lying there, waiting to be learned. This page is a finger pointing at it.

Field glossary — plain words

Orientation field: A learned sense of direction over possible next moves — “this way feels like progress, that way feels like damage, that strangeness over there feels worth a look.”
Tension: Directional pressure with two polarities: it warns (“this leads somewhere expensive”) and it pulls (“this unresolved thing matters — dig here”).
Productive ambiguity: Uncertainty that is unresolved and relevant. Not noise to delete — a signpost. New solutions live where the map is still vague.
Goal contract: What “done” actually means: the behavior to fix, the tests to keep honest, the areas that are off-limits.
Quasimetric: A distance where A→B and B→A can differ. The natural geometry of any world with one-way doors.
One-way door: A move whose undo cost is infinite. Deleted production data, the cracked egg.
Repair debt: Cleanup a move silently creates for the future — payable later, with interest.
Evidence: Failing tests, logs, error messages: signals pointing toward the truth. Crucially — evidence can be destroyed.
Latent space: The model's internal map, where situations are positions and moves are journeys — geometry instead of words.
The first experiment: Score every step of recorded sessions twice — locally for plausibility, directionally against the goal — blind to outcome. Then check which score predicted how the journey ended.

09 BUILD

Interested in developing this system?

Help if you think this should exist

I’m exploring whether there is enough real interest to start an open-source project around this. If you’d want to help build, test, fund, critique, or organize it, leave your email and a short note. European collaborators especially welcome.