What are Statistical Machine Translation and Neural Machine Translation?

Have you ever asked yourself what a post-editor does? Is a post-editor a translator? What is MTPE?

Well, I will try to explain.

Machine Translation (MT), including its most recent form, Neural Machine Translation (NMT), is part of the broader field of Artificial Intelligence (AI). MT is one branch of natural language processing, that is, getting computers to handle human language. Because language and communication are so complex, this area poses significant challenges.

From the 1990s onwards, MT systems were mainly STATISTICAL (Statistical Machine Translation, or SMT). These systems used large bilingual datasets to build statistical translation models. This data was used to train the machine to produce output similar to a human translation.

Remember the post about the pen with a stroke, and its disastrous translation from English into Portuguese: "Caneta com acidente vascular cerebral"? The word STROKE was mistranslated. The MT system chose the sense of stroke that means a serious, life-threatening medical condition, and the Portuguese result sounded weird and ridiculous. This kind of mistake is typical of statistical MT. In a vast body of data, the word stroke appears more often in its medical sense than in the sense of a line drawn with a pen. So what decision does the machine make? It picks the most common meaning, even though that meaning makes no sense in this context.
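The "most common meaning wins" decision can be sketched in a few lines of Python. This is a deliberately simplified toy, assuming a hypothetical table of invented counts; real SMT systems combine many more signals (phrase probabilities, language models, reordering models), so read it only as an illustration of the idea, not as how any actual system works.

```python
from collections import Counter

# HYPOTHETICAL counts of how "stroke" might be translated into Portuguese
# in a toy parallel corpus. The numbers are invented for illustration.
stroke_translations = Counter({
    "acidente vascular cerebral": 120,  # medical sense
    "traço": 15,                        # a line drawn with a pen
    "braçada": 10,                      # a swimming stroke
})


def most_frequent_translation(counts: Counter) -> str:
    """Return the translation seen most often, ignoring the surrounding sentence."""
    return counts.most_common(1)[0][0]


# "pen with a stroke": the context (a pen) plays no part in the choice.
print("caneta com " + most_frequent_translation(stroke_translations))
# caneta com acidente vascular cerebral
```

Because the function never looks at the rest of the sentence, the word "pen" cannot steer it towards "traço".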

Then companies started working on a new approach: NMT, machine translation based on NEURAL networks. Neural networks are computing structures loosely inspired by the human brain. With advances in technology and the large amount of data now available, training these models has become easier and cheaper.

In 2016, Google announced Google Neural Machine Translation (GNMT) and began using it that same year. Neural systems quickly became the preferred approach in the translation industry.

The key difference is that NMT processes complete sentences, while SMT works with small segments that are translated largely independently. This does not always produce acceptable results, especially when the source and target languages have different syntactic structures. NMT allows the machine to model the meaning of words in their CONTEXT and to use this information to produce better output.
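Why translating small segments on their own struggles with different syntactic structures can be shown with a toy Python example. English usually puts the adjective before the noun, while Portuguese usually puts it after. The lexicon below is hypothetical, and real phrase-based SMT translates multi-word phrases and includes reordering models, so this is a caricature of the failure mode rather than of any real system.

```python
# HYPOTHETICAL word-for-word lexicon, for illustration only.
lexicon = {"the": "a", "blue": "azul", "pen": "caneta"}


def translate_segment_by_segment(sentence: str) -> str:
    """Translate each word on its own and keep the English word order."""
    return " ".join(lexicon[word] for word in sentence.lower().split())


print(translate_segment_by_segment("The blue pen"))
# a azul caneta   (natural Portuguese: "a caneta azul")
```

Every individual word is translated correctly, yet the sentence as a whole is wrong, because nothing looks beyond the single segment.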

However, NMT is not perfect either: if the source text is poorly written, the translation can also be quite flawed.

For these reasons, the output may not accurately convey the content of the source text. That is why review by POST-EDITORS is essential: it produces a satisfactory translation, and their corrections help the machine deliver increasingly better results.

In our industry, this work is done by professional POST-EDITORS and is called MTPE: Machine Translation Post-Editing.

So that’s what we do: we work with machines to produce large amounts of content faster and at a lower cost.

Image credit: RWS Group