How Is Machine Translation Quality Assessed?

March 15, 2024

In the growing machine translation industry, demand for quality assurance of translated content is rising, because machines alone are not infallible.

Machine translation quality assessment is the evaluation of text that has already been translated by a machine. Native speakers examine the translated content and correct any discrepancies they find. These highly skilled human translators fix the errors so that the final translation is as accurate and error-free as possible.

Beyond individual words, the linguists also check glossary terms, the correct use of voice, and the overall tone of the target language. Let’s explore machine translation and look at the various ways machine translation quality impacts the industry.

What Is Machine Translation?

Machine translation is best known through apps such as Google Translate, yet the machine translation industry as a whole is much larger and more complex. Machine translation involves translating text and content from one natural language to another.

Machine translation is a sub-category of computational linguistics that borrows from computer science, artificial intelligence, information theory, and statistics. For a long time, it was looked down upon as a flawed science because of the low quality of its translations. Yet, in the last few decades, there has been incredible progress in machine translation quality, and its evolution has created a billion-dollar industry.

While machine translation alone is not perfect, the use of human translators along with machine learning has many practical applications. Before AI, manual translation agencies and other professional translators were the mainstays in the industry, and translations were slow and expensive due to the incredible amount of labor it took to perform a good translation.

Today, the machine translation market is thriving with the combination of human and machine translations, faster service, and lower costs.

Flaws With Machine Translation

Even in today's burgeoning technology-driven markets, machine translation remains inferior to human linguists. Gender bias, distorted or misinterpreted words and phrases, and bizarre phrasing are common, especially in real-time translation apps.

Perhaps the most profound flaw is a machine’s inability to process human thought and emotion. Pairing machine translation with human linguists mitigates this, as human translators can find errors that machines miss.

Machine translation struggles with many language pairs. Morphologically rich languages are challenging for machine translation systems, especially when translating from a morphologically simple language into a morphologically more complex one.

In this situation, morphological distinctions that are absent in the source language must be generated in the target language. Solving this requires linguistically rich tools, which machine-only translation often lacks.

Often, the source sentence or phrase carries too little information about the target word order, so a human translator is needed to supply it.

Another frequent issue with inflectional languages is the inaccurate translation of pronouns, since in many of these languages the subject is dropped completely (Slavic languages are one group of highly inflected languages). Machine translation also struggles with differences in how negation is expressed, which causes many problems when complex languages are involved.

Human involvement bridges these language gaps, yet more than anything, machine translation quality evaluation becomes vital as the translation industry grows.

Machine Translation Quality Assessment

With the emerging global marketplace, the demand for quality translations is evident, and accuracy is becoming more and more important. The new blockchain markets, for example, deal with international clients, and financial documents must be translated into hundreds of different languages.

Judging machine translation quality is known as machine translation evaluation. We'll discuss the two common types, then take a look at post-editing.

Manual Evaluation

The most common way to assess and measure machine translation quality is through human evaluation. In this method, the quality of machine translation output is examined by linguistic professionals from two different perspectives.

The first perspective is fluency: how natural, grammatical, and clear the target text reads in the target language. The professionals have access only to the translation and not to the source text, so judging fluency only requires an expert who is fluent in the target language.

Next, text accuracy is assessed against the source text: how well the target text represents the meaning and content of the source. When evaluating text accuracy, the context of sentences is considered. The professional must be proficient in both the source and target languages.

Human, or manual, evaluations of machine translation quality are time-consuming and costly, and they are also subjective. To lessen the subjectivity, several professionals evaluate the same set of translations, and the agreement between their judgments is checked statistically.
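One common way to check agreement between two evaluators is Cohen's kappa, which compares how often they agree with how often they would agree by chance. The sketch below is a minimal illustration, not a description of any particular agency's process. The function name, the 1 to 5 fluency scale, and the scores are all made up for the example.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: probability that both pick the same label independently.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up fluency scores (1 = poor, 5 = fluent) for six translated sentences.
evaluator_1 = [5, 4, 4, 2, 3, 5]
evaluator_2 = [5, 4, 3, 2, 3, 4]
kappa = cohen_kappa(evaluator_1, evaluator_2)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 means the evaluators agree far more often than chance would predict, while a value near 0 suggests the scores are too subjective to rely on without more evaluators or clearer guidelines.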

Automatic Evaluation

Automatic evaluation relies on a metric that is defined as an algorithm. The algorithm is implemented in a program that a computer runs to calculate an evaluation score. This score tells the user how good the translation is, taking into account factors such as word count, sentence errors, and the overall context of the target language.

Automatic evaluation is not perfect, and the evaluations must be run many times to get accurate data. This technology is rapidly improving, and bugs are fixed as solutions are found.

Automatic evaluation metrics are low-cost alternatives to human evaluation and are used to improve machine translation systems. They are based on the idea that machine translation output should be as close as possible to a human translation. This approach depends on the availability of human translations, because the metrics evaluate the output of machine translation systems by comparing it to a reference translation.

Because even human translations vary greatly, it is vital to have human reference translations, ideally more than one, for every machine-translated sentence to be assessed. Evaluation metrics then compute their scores based on the most similar reference translation.
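To make the idea of scoring against the most similar reference concrete, here is a small sketch. It uses a simple word-overlap F1 score as a stand-in metric, which is an assumption made for illustration and is much cruder than the metrics used in practice. The function names and example sentences are hypothetical.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """Word-overlap F1 between a candidate and one reference translation."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a word counts only as often as it appears in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def score_against_references(candidate, references):
    """Score the candidate against its most similar reference (non-empty list)."""
    return max(unigram_f1(candidate, ref) for ref in references)


machine_output = "the cat sits on the mat"
human_references = [
    "the cat is sitting on the mat",
    "a cat sits on the mat",
]
score = score_against_references(machine_output, human_references)
print(f"score = {score:.2f}")
```

Taking the maximum over the references means a machine translation is not penalized for matching one valid human phrasing rather than another.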

Post-Editing Machine Translation As Quality Estimation

Machine translation combined with later post-editing is another popular method in the translation industry. It has also created new careers for translators by introducing a viable way of editing translations. Post-editing machine translation output is another way to improve the accuracy of the content, yet it’s not without its detractors.

Instead of only working with the source text, the translator has both the source and an error-ridden translation. The standard protocol is to use machine-translated text as a raw translation that is edited by a linguist. Post-editing tools and procedures are being implemented by the translation industry, and studies report that 30% of translation companies use machine translation, and 70% utilize post-editing at least part of the time.

Shown to be faster than human translation, post-editing is attractive to an industry focused on accuracy. It’s also faster than translation from scratch or translation assisted by translation memory. Apart from speed, other studies demonstrated improvements in the temporal, cognitive, and technical effort required of professional translators. Temporal effort is the time required to post-edit a text, cognitive effort is the mental processing that takes place during post-editing, and technical effort covers the operations performed during post-editing, such as deletions and insertions. All three are influenced by translation quality.

While the jury is still out on whether post-editing improves translations or makes them worse, some interesting metrics have been developed to test the hypothesis.

Human-Mediated Translation Error Rate (HTER)

Human-mediated translation error rate (HTER), also known as human-targeted translation edit rate, is a human variant of translation error rate. HTER considers which edits are needed to convert a translation into its post-edited version. It is calculated as the ratio of the number of edit steps to the number of words in the post-edited version. HTER can also be used as a measure of technical post-editing effort. The fewer changes required to convert the translation into its post-edited version, the less effort is needed from the translator. HTER is focused on the final translation and not on the process used to reach it.
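The ratio described above can be sketched in a few lines. This is a simplified approximation under stated assumptions: it counts only word-level insertions, deletions, and substitutions, and it ignores the block shifts that a full translation error rate implementation also treats as edits. The function names and example sentences are hypothetical.

```python
def word_edit_distance(hyp_tokens, ref_tokens):
    """Minimum insertions, deletions and substitutions turning hyp into ref."""
    previous = list(range(len(ref_tokens) + 1))
    for i, hyp_word in enumerate(hyp_tokens, start=1):
        current = [i]
        for j, ref_word in enumerate(ref_tokens, start=1):
            substitution = previous[j - 1] + (hyp_word != ref_word)
            deletion = previous[j] + 1      # drop a word from the machine output
            insertion = current[j - 1] + 1  # add a word the post-editor supplied
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def approximate_hter(machine_output, post_edited):
    """Edit steps divided by the number of words in the post-edited version."""
    post_tokens = post_edited.split()
    if not post_tokens:
        raise ValueError("post-edited text must contain at least one word")
    edits = word_edit_distance(machine_output.split(), post_tokens)
    return edits / len(post_tokens)


raw = "she have visited the museum in yesterday"
edited = "she visited the museum yesterday"
hter = approximate_hter(raw, edited)
print(f"approximate HTER = {hter:.2f}")
```

In this example the post-editor deleted two words from a sentence that ends up five words long, so the score is 2 divided by 5. A score of 0 would mean the machine output needed no changes at all.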

Actual Edit Rate (AER)

The metric called the Actual Edit Rate measures the translator’s actual edit operations, which can include complex steps such as re-editing previously post-edited content. A recent study on post-editing found a connection between HTER and machine translation quality.

An increase in human-mediated translation error rate was found as the quality of the machine translation system decreased. In contrast, the study found no relation between actual edit rate and machine translation quality. It also found that keyboard activity is not correlated with machine translation quality or post-editing time. There was, however, a linear relationship between machine translation quality and post-editing speed.

The study also showed a correlation between the quality of machine translation output and the quality after post-editing. It confirmed that a poor machine translation leads to a worse result after post-editing.


How does integrating machine translation assessment affect the workflow?

Integrating machine translation assessment efficiently involves a strategic blend of both automatic and manual evaluations. We've found that setting up initial automatic checks followed by targeted manual reviews for complex or critical content can significantly enhance both the speed and accuracy of translations, optimizing our workflow.

How can you improve the accuracy of machine translation?

To improve the accuracy of machine translations, we actively explore the latest in AI and linguistic technologies, constantly training our models with diverse and domain-specific datasets. Additionally, fostering a collaborative environment where linguists and AI specialists work together has led to significant innovations in refining translation quality.

How to maximize the potential of machine-translated content?

Leveraging post-editing involves a nuanced approach where the initial machine output is used as a base that is refined and enhanced by human expertise. We've developed a streamlined process where editors focus on contextual understanding and cultural nuances, ensuring that the final content resonates well with the target audience, thus maximizing the content's potential.

How to balance the cost and quality in machine translation projects, especially for large volumes?

Balancing cost and quality, especially for large volumes, requires a strategic approach to resource allocation. We employ a tiered system where critical content undergoes thorough manual reviews, while less sensitive material is processed through optimized automatic evaluations. This approach allows us to maintain high-quality standards while efficiently managing costs.
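As a rough illustration of a tiered system like this, the sketch below routes each segment either to manual review or to automatic evaluation only. It is not a description of any production workflow. The Segment class, the route function, the critical flag, the 0.0 to 1.0 score scale, and the 0.7 threshold are all hypothetical, and sending low-scoring non-critical segments to manual review is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    critical: bool          # assumed flag, e.g. legal or financial content
    automatic_score: float  # assumed 0.0-1.0 score from an automatic metric


def route(segment, threshold=0.7):
    """Return 'manual_review' or 'automatic_only' for one segment."""
    # Critical content always gets a human review; other content gets one
    # only when its automatic score falls below the (hypothetical) threshold.
    if segment.critical or segment.automatic_score < threshold:
        return "manual_review"
    return "automatic_only"


segments = [
    Segment("Payment is due within 30 days.", critical=True, automatic_score=0.92),
    Segment("Welcome to our newsletter!", critical=False, automatic_score=0.85),
    Segment("Click here for learn more.", critical=False, automatic_score=0.55),
]
decisions = [route(segment) for segment in segments]
print(decisions)
```

The threshold is the main cost lever in a setup like this: raising it sends more content to human reviewers, and lowering it lets more content through on automatic checks alone.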

The Future Of Machine Translation Quality Estimation

Machine translation is used every day by businesses across many industries around the world. Companies such as Tomedes help meet this demand by providing accurate translation services to a global workforce. Neural machine translation technologies are improving every day to achieve better accuracy in translations. Scientists and researchers continue to find ways to solve the inefficiencies in translation services, with the ultimate goal of flawless translations.

As the translation industry continues to evolve, and as the use of machine translation and post-editing workflows increases, the demand for expertise in post-editing will grow. As it does, studies like the ones discussed in this article will continue to be necessary to improve the translation market. Machine translation quality control technologies, too, must continue to be researched and put in place to improve the accuracy of translations.

By Ofer Tirosh

Ofer Tirosh is the founder and CEO of Tomedes, a language technology and translation company that supports business growth through a range of innovative localization strategies. He has been helping companies reach their global goals since 2007.


