Insight · Constraints
The word list was never the product.
We set out to build a spelling app. What we actually built was an argument about how people learn, and the spelling was just the place we happened to test it. The durable thing was never the app. It was the understanding we found by building it, and one good decision inside it that just reached its expiry date.
The asset was the understanding, not the artifact.
For a long time I believed the product was a spelling app. It had a database of words, a gentle way of correcting mistakes, a schedule that brought old words back at the right moment, and a small screen a child could sit in front of for 5 minutes before school. It worked, in the modest sense that a child came away knowing a word they had not known the day before. And it was the wrong thing to have been proud of.
The part worth keeping was never in the database. What we had built, without quite admitting it, was a written-down theory of how a child learns to spell. Every rule in the product was a sentence in that theory. Track the first attempt, not whether they finished. A small daily dose, not a Monday pile of 18. A hint after the second wrong try, the answer after the third. None of those are interface decisions. They are claims about how a young mind takes in a pattern, and how easily it takes in the wrong one.
This is the same move the protocol itself came from. The duo protocol was never designed up front. It was read back out of a gym app's git history: the patterns that kept recurring, named in hindsight. This ran the same way. The app was the instrument we used to find the theory. I spent six months mistaking the instrument for the discovery.
The artifact is scaffolding. The understanding is the asset. A scaffold is a thing you are meant to be able to take down once the building stands on its own.
So this is an essay about telling those two apart. It is also about a change in the surrounding conditions that means the theory no longer has to live inside the app we found it in.
A boundary, first.
A theory of learning is only honest if you say where it stops applying, so let me draw the line before I lean on it.
Everything here is about alphabetic, letter-based languages, the kind where writing encodes sound. In an alphabetic system, spelling is the work of recovering a mapping from the sounds in a word to the letters that represent them, along patterns that mostly hold and occasionally betray you. Sound it out. Notice the cluster that always trips people up. Ask why the consonant doubles here. Those moves are possible because there is a sound-to-letter system underneath them, however cheerfully English breaks its own rules.
None of it carries over to writing systems where the written unit stands for a meaning rather than a sound. There, learning to write is a different task in kind: thousands of distinct visual forms held in memory, along with their internal structure and their history. You cannot sound your way to a character you have never seen. That is a real and serious subject, and it is not this one.
I draw the boundary for honesty, but it also turns out to matter for the rest of the argument. Because an alphabetic system is governed by rules, you can manufacture a fresh, valid piece of practice from those rules and from a particular child's particular weakness. A memorised glyph gives you nothing to generate from. The fence around the theory is the same fence that makes the second half of this essay possible.
The constraints are the curriculum.
Here is the thing it took me the whole build to be able to say plainly. The pedagogy is not a collection of teaching tricks. It is a short list of human limits, taken seriously, and then designed around.
Working memory is small. A child can hold only a handful of genuinely new things at once. A school sends home 18 words on a Monday because 18 is the tradition, not because anyone can carry 18. Most of those words never make the journey from the front of the mind into anything durable, and they are gone by midweek. So the right response is not to make 18 words more entertaining. It is to do 5 a day, every day, and let the habit carry them. Daily practice is the product. The small dose is what taking the limit seriously looks like.
Memory fades, and it fades on a curve. A word met again just as it is about to slip away is strengthened far more efficiently than a word crammed the night before a test. Cramming feels productive and mostly is not. Spacing the practice out feels slower and mostly is not. So old words return on their own, brought back by the schedule rather than by a parent's nagging.
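To make the idea of a schedule concrete, here is a minimal sketch of one way a word could be brought back at widening intervals. It is not the app's actual scheduler: the box model, the interval values in `INTERVALS_DAYS`, and the `reschedule` function are all assumptions chosen for illustration.

```python
from datetime import date, timedelta

# Hypothetical review intervals, in days, one per "box".
# The real schedule is not given in this essay; these values are placeholders.
INTERVALS_DAYS = [1, 2, 4, 8, 16]


def reschedule(box: int, first_attempt_correct: bool, today: date) -> tuple[int, date]:
    """Return the word's new box and the date it should come back.

    A correct first attempt moves the word one box up, so it returns later.
    A wrong first attempt sends it back to box 0, so it returns tomorrow.
    """
    if first_attempt_correct:
        new_box = min(box + 1, len(INTERVALS_DAYS) - 1)
    else:
        new_box = 0
    return new_box, today + timedelta(days=INTERVALS_DAYS[new_box])
```

In this sketch the only input that moves a word is the first attempt, which matches the one number the rest of the essay cares about. The parent never appears in the function signature.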
The deepest limit is the one that gives the whole project its spine. Practice consolidates whatever you practise, and that includes the error. Each time a child writes a spelling, right or wrong, the act lays down and strengthens a trace. Our own principles put it more plainly than I can. Wrong spellings hard-wire in the brain through repetition, and a hard-wired error is very hard to undo.
Anti-pattern
Rehearsing the error until it is fluent.
A child who writes a word wrong eight times out of ten has not practised the word. They have practised the mistake, and a rehearsed mistake turns up at the exact moment they reach for the right one. The fix is not more repetition. It is catching the error before the second wrong try becomes a third.
That single fact is the entire reason the correction loop exists. A hint after the second wrong try. After the third, the correct answer, shown with encouragement, before the error has a chance to set. From the outside it reads as good manners. It is closer to damage control carried out at the level of memory. The mission was always one sentence. Never let a learner practise the wrong pattern. Everything else is that sentence worked out in detail.
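The correction loop is small enough to write down as a single decision. The sketch below is illustrative only: the `Feedback` names and the `feedback_for` function are hypothetical, and what happens after the first wrong try (a plain retry with no help) is my assumption, since the rule above only fixes the second and third.

```python
from enum import Enum


class Feedback(Enum):
    CORRECT = "correct"
    TRY_AGAIN = "try_again"      # assumed behaviour after the first wrong try
    HINT = "hint"                # after the second wrong try
    SHOW_ANSWER = "show_answer"  # after the third wrong try


def feedback_for(target: str, attempt: str, wrong_so_far: int) -> Feedback:
    """Decide what the child sees after one attempt at one word.

    wrong_so_far counts wrong tries on this word before the current attempt.
    """
    if attempt.strip().lower() == target.lower():
        return Feedback.CORRECT
    wrong_now = wrong_so_far + 1
    if wrong_now >= 3:
        # Stop the loop here so the error is not rehearsed a fourth time.
        return Feedback.SHOW_ANSWER
    if wrong_now == 2:
        return Feedback.HINT
    return Feedback.TRY_AGAIN
```

The point of the shape is that there is no branch in which a child can keep writing the wrong spelling indefinitely. The third wrong try always ends with the correct answer on the screen.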
Designing with the limits, not only around them.
Respecting a limit is half the move. The better half is turning the limit into the mechanism.
The forgetting curve is not only a problem to mitigate. It is the timetable. The same decay that loses a word by Wednesday tells you exactly when bringing it back will do the most good. Done properly, spaced practice does not fight forgetting. It rides it.
The effort of recalling a spelling, producing it from nothing rather than reading it off a card, is itself what burns it in. So the interaction asks the child to generate the word, not to recognise it among options. A little difficulty is doing useful work, right up to the point where it tips into repeated failure and starts consolidating errors again. The correction policy is where we drew that line.
Even a young child's short attention turns from an obstacle into a design. 5 minutes is not a reluctant compromise squeezed out of a distractible kid. It sits comfortably with how a child's attention and overnight consolidation actually work. The dose is small because small is what fits.
This is also why the product refuses the usual furniture of children's apps, the streaks, the points, the badges, the leaderboards. Those optimise for engagement, and engagement is a different thing from learning, sometimes an opposed thing. A child can run a magnificent streak and carry a head full of half-learned words. We measure the first attempt and almost nothing else, because the first attempt is the one number that tracks the promise we actually made.
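The gap between first-attempt accuracy and mere completion is easy to see in a few lines. This is a sketch under assumptions: the `Attempt` record and both function names are hypothetical, not the app's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    word: str
    attempt_number: int  # 1 for the first try at this word in a session
    correct: bool


def first_attempt_accuracy(attempts: list[Attempt]) -> float:
    """Fraction of words the child got right on the very first try."""
    firsts = [a for a in attempts if a.attempt_number == 1]
    if not firsts:
        return 0.0
    return sum(a.correct for a in firsts) / len(firsts)


def completion_rate(attempts: list[Attempt]) -> float:
    """Fraction of words the child eventually got right, on any try."""
    words = {a.word for a in attempts}
    if not words:
        return 0.0
    finished = {a.word for a in attempts if a.correct}
    return len(finished) / len(words)
```

A session in which a child misses "friend" once, then gets it, and spells "because" correctly straight away completes both words but gets only one of the two right on the first attempt. The completion number flatters; the first-attempt number does not.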
The decision that expired.
So we had a genuine theory, worked out in some detail. Then we poured it into a fixed mould, and for good reasons.
The shipped app kept its pedagogy in a database. A table of words, sentences written by hand, the tricky clusters marked in advance, the same content handed to every family that opened the door. What adaptivity it had lived entirely in the scheduling. The engine decided when to bring a word back. It never decided what to teach next, and it never wrote a new piece of practice for the specific way a specific child fails. The kid who swaps two letters in a common cluster and the kid who keeps dropping a doubled consonant got the same list and the same hints, because there was only ever one list.
And the one place we most wanted real intelligence, the question of whether this is working and what to do next, we deliberately pushed outside the product. We wrote it down as a principle. Keep the cleverness out of the app, hand the parent their child's data, and let them take it to whatever capable assistant they already had. We even wrote down why. No ongoing costs. A model consulted on every word, for every child, every day, was a meter that would run forever, and we could not justify it. The same documents that named the pedagogy also listed a built-in tutor among the things we would not build, for two reasons. It was expensive, and left to itself it was pedagogically dubious.
Given the conditions at the time, that was the right call, and I would make it again in those conditions. But it is worth naming what it cost. The pedagogy was real, and it was frozen. We knew in principle how to teach the individual child in front of us. We could only afford to ship everyone the average of all of them.
Good decisions carry an expiry date, written into the circumstances that produced them. When the circumstances move, the honest thing is to read the decision again, not to defend the old answer out of loyalty.
The reactor.
Two of those conditions have now moved at once, and in the same direction.
Models have become good enough to generate practice that is actually fit for purpose. A sentence that uses a target word naturally rather than awkwardly. A hint pitched at a 7-year-old instead of at an adult. A sensible choice of what to put in front of the child next, given everything that came before. And the shape of the cost has changed. The kind of moment-to-moment generation that used to be billed by the call is increasingly folded into a subscription people already hold. The meter that made a model in the loop of every word indefensible has quietly been absorbed into a flat rate.
Put those together and the model can move from the role we gave it, the analyst you consult after the fact, to a new one. The thing in the loop, generating as the child works. The export becomes a conversation. Instead of retrieving the next fixed word from a table written months ago, the app can compose the right next piece of practice for this learner. Aimed at the mistake they actually keep making, dressed in a sentence about something they actually care about, set at the difficulty their last several attempts imply. The fixed list stops being the foundation and becomes one option among the many the system could now produce.
This is the protocol's oldest habit arriving in a new place. Generated beats maintained. If a thing can be rebuilt from a source on demand, rebuild it rather than freezing a copy and maintaining it by hand. The word list was the frozen copy. The pedagogy plus a capable model is the source. The same 5 words for everyone becomes the right next thing for this child.
The fence, not the filling.
There is a seductive misreading of all this, and it fails in exactly the way the pedagogy was built to prevent. The misreading is that models are good now, so point one at a child and let it teach spelling.
A model with no constraint optimises for what looks like success in the moment. Encouragement, momentum, the warm feeling that progress is being made. That is precisely the engagement theatre the pedagogy was built to refuse. Left to its own devices it will happily let a child practise a wrong pattern, because the exchange feels like it is going well. It does not feel the working-memory limit pressing, so it will cheerfully overload. It holds no opinion about the first attempt versus mere completion, because to a fluent generator everything completes.
The theory of learning is what you wrap around that capability so it generates within bounds. Never let a learner practise the wrong pattern stops being a slogan on a wall and becomes a hard edge on what the model may do. It can invent as freely as it likes, but only inside the fence the pedagogy draws. 5 things, not 18. Correct before quick. Surface a word as it fades, not before. Step in before a second wrong attempt becomes a third.
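One way to picture that fence is as the code that decides what the model is allowed to work on before it generates anything. The sketch below is an assumption-heavy illustration, not a description of a shipped system: `WordState`, `GenerationBrief`, and `build_brief` are hypothetical names, and the model call itself is deliberately left out.

```python
from dataclasses import dataclass
from datetime import date

MAX_ITEMS_PER_SESSION = 5  # "5 things, not 18"


@dataclass(frozen=True)
class WordState:
    word: str
    due: date  # the date the schedule says this word should return


@dataclass(frozen=True)
class GenerationBrief:
    """Everything a generator would be handed; it chooses nothing outside this."""
    words: tuple[str, ...]
    hint_after_wrong_tries: int = 2
    answer_after_wrong_tries: int = 3


def build_brief(states: list[WordState], today: date) -> GenerationBrief:
    """Pick the session's words by the pedagogy's rules, not the model's taste."""
    # Surface a word as it fades, not before: only words that are due.
    due_now = [s for s in states if s.due <= today]
    # Most overdue first, then cap the dose.
    due_now.sort(key=lambda s: s.due)
    chosen = due_now[:MAX_ITEMS_PER_SESSION]
    return GenerationBrief(words=tuple(s.word for s in chosen))
```

In this arrangement the generator is free to invent sentences and hints, but it never decides how many words a child sees or whether a word is ready to come back. Those limits are set before it is asked for anything.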
This is the protocol's instinct, stated as plainly as it ever gets. You build the fence at the input rather than relying on a net to catch the mess at the output. The pedagogy is the fence. And the arrival of something powerful enough to genuinely need fencing is exactly what makes the fence worth building well.
The through-line
The asset was never the word list, and it was never even the app. The app was the scaffold we climbed to see the pedagogy clearly. None of this is a new mission. It is the same one we started with. Never let a learner practise the wrong pattern. For six months we could only keep that promise on average, across one fixed list, for everyone at once. What changed is that we can finally keep it one child at a time. The lesson was never the words. It was what the words made us understand about the mind that was learning them.