The Quiet Conflict at the Heart of Translation Technology
By Volkan Güvenç, Founder, Alafranga Language Solutions



There is a tension running through the localization industry that almost nobody names out loud. It sits underneath every CAT tool demo, every "AI-powered" platform launch, every webinar about the future of translation. And it comes down to a single architectural fact: the way our tools are built and the way large language models actually work are pulling in opposite directions.


I have spent more than two decades inside this industry, starting with a Turkish translation office in Istanbul in 2002 and now running translation programmes across forty-plus languages from London. I have watched translation memory go from a competitive advantage to an industry assumption. And I think we are now at the start of a shift that most of the sector is not ready to talk about honestly.

What the segment was built for
Every mainstream CAT tool, whether Trados, MemoQ, Phrase, or Smartcat, is built on the same foundation: the segment. The tool takes a document and breaks it into units, usually sentences, and treats each one as a row to be translated, matched, and stored.


This structure was not arbitrary. It was the right design for the technology of its time. Translation memory needs discrete units to match against. A sentence is a clean, reusable building block. Reviewers can work segment by segment. Project managers can measure progress in segments. Two translators can split a file along segment lines. The entire commercial and operational logic of modern translation production grew up around this single unit.

For thirty years, this worked.

What the segment breaks
Then large language models arrived, and they do not think in sentences. They think in context.

An LLM produces its best work when it can see the paragraph, the section, the surrounding discourse: when it understands that a warning label three pages earlier sets the tone for a safety instruction here, that a pronoun refers to a device named two sentences ago, that a term chosen at the start of a manual must hold to the end. Isolate a sentence and hand it to an LLM alone, and you have switched off the very faculty that makes it valuable.


This is the conflict. The segment, which made translation memory possible, is now the thing standing between us and what these models can do. We built our entire infrastructure around a unit that the new technology would rather dissolve.

It is the same resistance the car industry met moving from the combustion engine to the electric motor. The existing system works, the investment is enormous, and the people who built their careers on it have no reason to want it replaced. Translation memory is not just a database but a thirty-year sunk cost, both technical and psychological. The industry does not want to give it up. And it should not have to. The real question is not whether to abandon translation memory, but how to feed it to a context-hungry model without breaking either one.

How the bridge tools are solving it and where they differ

Right now we are in a transition phase, and three tools illustrate three different bets on how to cross it. None of them has fully solved the problem. All of them are compromises. That is not a criticism — a bridge is supposed to be a compromise between two shores.

Phrase Next GenMT bet on retrieval. Instead of translating one segment at a time, it groups segments into text blocks and translates them together, so the model sees context within the block. Then it uses RAG (retrieval-augmented generation) to pull the most relevant translation memory matches and glossary terms into the prompt at the moment of translation, as live examples for the model to follow. The crucial point: the model itself never changes. It stays fixed, and the relevant knowledge is fetched and injected each time. Nothing is forgotten, because nothing is overwritten.
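To make the retrieval idea concrete, here is a minimal sketch of the general pattern, not of Phrase's actual implementation. The translation memory, the glossary, and the function names `top_matches` and `build_prompt` are all hypothetical, and plain string similarity from Python's standard library stands in for whatever retrieval a production system would use.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: (source, target) pairs.
TM = [
    ("Press the power button.", "Güç düğmesine basın."),
    ("Store in a dry place.", "Kuru bir yerde saklayın."),
]
# Hypothetical glossary: source term -> approved target term.
GLOSSARY = {"power button": "güç düğmesi"}


def top_matches(block, tm, k=2, threshold=0.5):
    """Return up to k TM entries similar to the block, best first."""
    scored = []
    for source, target in tm:
        # Character-level similarity in [0, 1]; a stand-in for real retrieval.
        score = SequenceMatcher(None, block.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(block, tm, glossary):
    """Assemble a prompt; the model stays fixed, only the prompt changes."""
    lines = ["Translate the text below from English to Turkish."]
    terms = {s: t for s, t in glossary.items() if s.lower() in block.lower()}
    if terms:
        lines.append("Use these glossary terms:")
        lines += [f"- {s} = {t}" for s, t in terms.items()]
    matches = top_matches(block, tm)
    if matches:
        lines.append("Follow these approved examples:")
        lines += [f"- {src} => {tgt} ({score:.0%} match)"
                  for score, src, tgt in matches]
    lines.append("Text:")
    lines.append(block)
    return "\n".join(lines)


print(build_prompt("Press the power button twice.", TM, GLOSSARY))
```

Every glossary term and example added this way lengthens the prompt, which is exactly where the cost pressure on retrieval comes from.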

LILT made the opposite bet. Instead of injecting knowledge into a fixed model, it changes the model itself. Every time a translator confirms or corrects a segment, that correction is fed back as a training signal and the model's parameters are updated on the spot: a small, fast piece of learning rather than a retrieval. As the translator moves through a document, the system genuinely adapts to it. This is closer to a model that "learns as it works." But it carries a cost that retrieval does not: a model that updates its own weights can also drift. Neural networks trained this way are vulnerable to what researchers call catastrophic forgetting: learning the new while quietly losing the old.

Smartcat went toward agents and accessibility: LLM-driven workflows that update memories and glossaries automatically and are easy to adopt across a whole organization, with the model improving as the team edits. Powerful, but the more you push it toward genuine customization, the more the cost climbs.

These distinctions matter more than the marketing suggests. When a vendor says its system "learns," it is worth asking how. Retrieval and retraining are not the same thing, and they fail in different ways. One risks bloated prompts and rising token costs; the other risks forgetting. Anyone choosing a tool for serious, long-term, high-consistency work should understand which trade-off they are buying.

The cost wall everyone hits
Here is the practical ceiling we keep running into. The way to make any of these systems better is to give the model more: more glossary terms, more style instructions, more example matches, more context. And every one of those additions is more tokens in the prompt. Push hard enough on quality and the cost and the latency climb with it.
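A back-of-envelope calculation shows how quickly this adds up. The sketch below counts input tokens only, and every number in it (segment count, tokens per segment, overhead size, price) is a hypothetical chosen for illustration, not a quote from any vendor.

```python
import math


def input_tokens(segments, tokens_per_segment, overhead_tokens,
                 segments_per_call=1):
    """Total prompt tokens when every call repeats the same overhead
    (glossary terms, style instructions, example matches)."""
    calls = math.ceil(segments / segments_per_call)
    return segments * tokens_per_segment + calls * overhead_tokens


def input_cost(tokens, price_per_million):
    """Input-side cost only; output tokens are ignored in this sketch."""
    return tokens * price_per_million / 1_000_000


# Hypothetical job: 1,000 segments of about 30 tokens each.
lean = input_tokens(1000, 30, overhead_tokens=500)    # 530,000 tokens
rich = input_tokens(1000, 30, overhead_tokens=2000)   # 2,030,000 tokens
# Same rich overhead, but 20 segments share one call.
blocks = input_tokens(1000, 30, overhead_tokens=2000, segments_per_call=20)

PRICE = 3.0  # hypothetical price per million input tokens
for name, tokens in [("lean", lean), ("rich", rich), ("blocks", blocks)]:
    print(f"{name}: {tokens} tokens, cost {input_cost(tokens, PRICE):.2f}")
```

Under these assumptions, quadrupling the per-call overhead nearly quadruples the bill when each segment is its own call, while sharing the same overhead across a block of twenty segments cuts it sharply. The repeated overhead, not the text being translated, dominates the cost.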


So the industry is caught between two pressures: we need higher capacity (longer context, richer retrieval, deeper customization) and we need it at lower cost. Right now those two pull against each other. The system that resolves them is the one that wins.

A note on very large documents: if a document exceeds today's context windows, the system can fall back to semantic chunking, identifying natural topic boundaries (safety sections, installation chapters, and so on) and translating each chunk with full internal context. This is a bridge, not the final architecture. But it is far superior to segment-by-segment translation, and as context windows grow and cheapen, even this compromise will disappear.
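As a rough illustration of that fallback, the sketch below packs whole sections into chunks that fit a token budget and never cuts a section in half. It assumes the topic boundaries have already been identified (for example from headings), and it uses a whitespace word count as a crude stand-in for a real tokenizer. The function name `chunk_sections` is hypothetical.

```python
def chunk_sections(sections, budget, count_tokens=lambda text: len(text.split())):
    """Greedily group consecutive sections so each chunk stays within budget.

    A section is never split. A single section larger than the budget
    ends up alone in its own oversized chunk and needs separate handling.
    """
    chunks, current, used = [], [], 0
    for section in sections:
        size = count_tokens(section)
        # Close the current chunk if this section would overflow it.
        if current and used + size > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(section)
        used += size
    if current:
        chunks.append(current)
    return chunks


sections = [
    "Safety: unplug first.",                  # 3 words
    "Installation: mount bracket.",           # 3 words
    "Maintenance: clean the filter monthly.", # 5 words
]
for chunk in chunk_sections(sections, budget=6):
    print(chunk)
```

Because chunks break only at section boundaries, each one still carries its own internal context into the model.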

What the revolutionary architecture actually has to do
Let me describe the system I think we are heading toward, and why I think the industry is overcomplicating it.

You load your files, your translation memory, and your glossary. From your instructions, the system drafts a translation prompt for the job: the tone, the register, the terminology rules. You review it and approve it. Nothing is translated until you have agreed on how it will be translated.

Then the model reads the entire document, start to finish, and translates it in one pass with the full document in view. This is the part that matters, and it is the part everything else has been working around. When a model can see the whole document at once, it makes the right call on a pronoun three pages later, holds a term consistent from the first page to the last, and carries the tone of a safety warning through every related instruction. This is simply the highest-quality output a machine can produce, because nothing has been hidden from it.

Only after that does the translation get broken back into segments and placed in front of the post-editor, because the segment is genuinely useful for review, navigation, and quality control. At that stage, the assistive technologies belong: terminology checks, consistency QA, the tools that help a human verify rather than translate.
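The ordering is the whole point, so it is worth sketching. In the example below, `translate` is a hypothetical callable standing in for the model call, the sentence splitter is a deliberately naive regular expression, and the one-to-one pairing of source and target sentences is a simplifying assumption that a real system would replace with proper alignment.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def run_job(document: str, prompt: str, approved: bool,
            translate: Callable[[str, str], str]) -> List[Tuple[str, str]]:
    # Step 1: nothing is translated until the prompt has been approved.
    if not approved:
        raise ValueError("translation prompt has not been approved")
    # Step 2: one pass over the entire document, full context in view.
    translated = translate(prompt, document)
    # Step 3: only now break the result into segments for the post-editor.
    source_rows = split_sentences(document)
    target_rows = split_sentences(translated)
    if len(source_rows) != len(target_rows):
        # Assumption of this sketch: sentences pair up one to one.
        raise ValueError("sentence counts differ; alignment needed")
    return list(zip(source_rows, target_rows))


# Stand-in "model" for demonstration: it just uppercases the text.
fake_translate = lambda prompt, doc: doc.upper()
rows = run_job("Unplug the device. Clean it with a dry cloth.",
               "Formal register.", True, fake_translate)
for source, target in rows:
    print(source, "|", target)
```

The segment still exists here, but it appears after translation as a review unit rather than before it as a translation unit.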

Here is the part I want to be blunt about. Once a system reads the whole document up front, you do not need "real-time adaptive learning," "dynamic on-the-fly adaptation," or any of the other terms the industry is currently selling. Those features are not breakthroughs — they are workarounds. A tool learns segment by segment as the translator corrects it precisely because it could not see the whole document at the start. Solve the context problem at the source, and the need for that adaptation simply disappears. You are not teaching the model to compensate for what it missed; you made sure it missed nothing.

There is one real limit: a document long enough to exceed the model's context window. But context windows are growing fast and getting cheaper, and that limit is closing, not fixed. It is a temporary constraint, not an architectural flaw.

The pieces already exist separately. What is missing is the discipline to put them in the right order (read everything first, translate with full context, segment afterward for review) and the confidence to stop dressing up a limitation as a feature.

When the editor changes one word

And here the full-document context pays off a second time. When a post-editor changes a term in one segment, the system already knows everywhere else that term appears, because it read the whole document at the start. So it can ask the obvious question a segment-blind tool cannot: "You have changed 'device' to 'unit' here; it appears in fourteen other segments. Apply everywhere, or just this one?" It can distinguish the places where the same word carries the same meaning from the places it does not. It can flag the downstream agreement and inflection changes a single edit triggers, which matters especially in suffix-heavy languages like Turkish.

This is not "adaptive learning." The model is not learning anything during post-editing; it already understood the document before the first segment was touched. It is simply being asked to check the ripple of a human decision against context it has held the whole time. That is the difference between a system that compensates for missing context and one that never lost it.
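The mechanical part of that ripple check can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, an exact whole-word match is treated as a safe candidate, and any longer word that starts with the term is only flagged as a possible inflected form for human review. Deciding whether the word carries the same meaning in each place is the model's job and is not attempted here.

```python
import re


def find_term_occurrences(segments, term, edited_index):
    """Return (exact, inflected) index lists, skipping the edited segment."""
    exact_re = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    # Crude heuristic: the term followed by more word characters (a suffix).
    stem_re = re.compile(rf"\b{re.escape(term)}\w+", re.IGNORECASE)
    exact, inflected = [], []
    for index, segment in enumerate(segments):
        if index == edited_index:
            continue
        if exact_re.search(segment):
            exact.append(index)
        elif stem_re.search(segment):
            inflected.append(index)
    return exact, inflected


def propagation_question(old, new, exact, inflected):
    """Build the question put to the post-editor."""
    question = (f"You have changed '{old}' to '{new}' here; it appears in "
                f"{len(exact)} other segments. Apply everywhere, or just this one?")
    if inflected:
        question += f" {len(inflected)} more contain an inflected form to review."
    return question


segments = [
    "Turn off the device.",
    "The device is hot.",
    "Both devices must be unplugged.",
    "Wear gloves.",
]
exact, inflected = find_term_occurrences(segments, "device", edited_index=0)
print(propagation_question("device", "unit", exact, inflected))
```

In Python 3 the `\w` class matches Unicode letters, so the same suffix heuristic covers forms with Turkish characters, though it says nothing about whether the suffix changes what the replacement should look like.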

Sooner than we think

I have learned to distrust my own sense of how long these things take, because lately the answer is always "less time than I expected." The foundations are being laid right now: longer context windows, cheaper and more efficient models, better retrieval, real competitive pressure on cost. Once those align, timelines tend to compress fast. The move from combustion to electric did not happen overnight either, but once the momentum built, it was sudden.

The segment paradigm will not survive in its current form. The tools that bridge to whatever comes next (Phrase, LILT, Smartcat, and the others working this seam) are doing genuinely useful work, and we use them. But they are the bridge, not the far shore.

The question is no longer whether the shift happens. It is how quickly, and which of us will have built the habits and the judgment to use the new architecture well when it lands. After twenty-three years of watching this industry change, my instinct says: prepare now.