It is Friday evening, 7:20 pm. You receive an important message from a supplier in Graubünden. It is in Romansh, a language you do not understand. Your Romansh-speaking colleagues have already left to enjoy their well-deserved weekend. The message contains sensitive information that you are not allowed to share with third parties. So you cannot simply turn to the tools that you normally use to convert other emails into a foreign language, or to translate texts in your spare time. Nevertheless, you have to answer your supplier by 8:00 pm. You scroll through the company’s internal address book and ask yourself: Who understands Romansh?
The simple answer is: There is software that understands Romansh. The more complicated answer: Machines cannot actually understand Romansh, but they are able to convert a text in almost any language into a text in another language with identical content. You know this already because of ChatGPT, Google Translate, DeepL or indeed Textshuttle. The example described above constitutes a use case for in-house translation software. While large internet corporations in particular are offering freely available non-personalised translation software, Textshuttle, a University of Zurich start-up and spin-off, is developing machine-translation software that can be adapted to companies’ requirements. It can also be used to machine-translate sensitive information, because the data in question remains either in the company’s hands or on servers in Switzerland. Like the tools that are freely available on the internet, Textshuttle’s translation software relies on neural networks and machine learning, which have been standard in all natural-language-processing applications for about five or six years now. In some ways though, it differs from the tools freely available online. In May 2023, Textshuttle launched a platform for private individuals that offers freely available translations adapted to Switzerland’s linguistic peculiarities. In High German, for example, it outputs the letter ‘ß’ as ‘ss’, the Swiss equivalent. Moreover, it can translate into all Swiss national languages, including Romansh, as well as into Swiss German.
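The ‘ß’ convention mentioned above can be illustrated with a few lines of Python. This is only a sketch of the spelling rule itself, not a description of how Textshuttle produces its output; the function name `to_swiss_spelling` is a hypothetical one chosen for this example.

```python
def to_swiss_spelling(text: str) -> str:
    """Replace the eszett with the double s used in Swiss Standard German."""
    # Lowercase 'ß' becomes 'ss'; the rarer capital 'ẞ' becomes 'SS'.
    return text.replace("ß", "ss").replace("ẞ", "SS")


print(to_swiss_spelling("Die Straße ist groß."))  # Die Strasse ist gross.
```

A rule like this covers only one orthographic detail; vocabulary and phrasing that differ between Switzerland and Germany cannot be handled by simple character replacement.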
The scenario at the start of this article represents a use case that is indicative of the problems surrounding the protection of sensitive data. Another use case for machine-translation systems concerns professional translation work. Textshuttle’s CTO Samuel Läubli, who co-founded the firm, describes this as follows: “The difference between personalised and non-personalised translation systems is like the difference you experience when you go for a walk with a dog that you have known for a long time and a dog that you just take out for a single walk. On this walk, the question of how long you have known the dog, and whether you know it at all, will matter whenever situations arise in which subtleties, mutual consideration or trust are important.”
Many organisations have their own vernacular. They may use set phrases for which they have defined equivalents in various languages, for instance. These can be slogans, product names or very technical terms, which people (and machines) only translate correctly if they are familiar with the subject matter being translated and the associated linguistic subtleties. Professional translators and companies use digital dictionaries for this purpose, much as they use language style guides. In contrast to generally available translation systems, in-house solutions can be connected to translation memories: in-house databases of previously approved translations that are used to standardise new ones. In addition, a company’s in-house translation software can be trained on industry-specific or in-house texts.
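To make the idea of a translation memory more concrete, here is a minimal sketch in Python. It assumes a memory that maps source sentences to approved translations and retrieves the closest stored sentence using the standard library's `difflib`. The sample entries, the `lookup` function and the similarity threshold of 0.8 are illustrative assumptions, not details of Textshuttle's software.

```python
from difflib import SequenceMatcher

# Hypothetical in-house translation memory: source segment -> approved translation.
TRANSLATION_MEMORY = {
    "Please confirm the delivery date.": "Bitte bestätigen Sie den Liefertermin.",
    "The invoice is attached.": "Die Rechnung finden Sie im Anhang.",
}


def lookup(segment: str, threshold: float = 0.8):
    """Return (approved_translation, similarity) for the closest stored
    segment, or None if no stored segment is similar enough."""
    best_source, best_score = None, 0.0
    for source in TRANSLATION_MEMORY:
        # Character-level similarity between 0.0 and 1.0, ignoring case.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is None or best_score < threshold:
        return None
    return TRANSLATION_MEMORY[best_source], best_score
```

Calling `lookup("Please confirm the delivery date")` (without the full stop) would return the approved German sentence with a similarity just below 1.0, while an unrelated sentence returns `None` and would be left to the translator or the machine-translation system. Production tools typically work with more robust matching and standard exchange formats, but the principle of reusing approved translations is the same.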
In 2018, Microsoft Research published an article in which the authors proclaimed that their machine-translation system achieved the same quality as human translators when translating from Chinese to English. This statement was based on a study in which native speakers of English and Chinese looked at individual translated sentences and evaluated the translation quality. At the level of individual sentences, the study found no statistically significant difference between the evaluations of human and machine translations. Läubli dismisses this with a wave of his hand and explains that although this is a common way of measuring translation quality, it is actually too simplistic. This is because texts must be coherent as a whole, not just at the level of individual sentences. Indeed, if whole texts are submitted for evaluation, automatically translated texts are still consistently rated as significantly worse. Florian Schottmann, Head of Research, takes this a step further by stating that it is actually impossible to measure the quality of a translation, considering how many different correct translations are possible for even a simple sentence. For a whole text, the number of valid translations is (at least potentially) infinite. “This is illustrated by the literary works of Georges Perec and Raymond Queneau,” says Läubli.
The principal benefit of machine-translation systems is not that they relieve humans of the task of translation. Instead, the main argument for machine-translation systems is that they improve the efficiency of language services’ translation work. Depending on the application and industry, this efficiency gain can be between 40 and 60 percent.
The high expectations regarding natural language processing, as conveyed by high-profile applications, large language models like ChatGPT, or the aforementioned paper from Microsoft Research, are driving the industry and generating considerable momentum. Such expectations also entail challenges though, especially if they are too high. On the one hand, they lead to hype-based dynamics, which in turn lead to disappointment. Translators feel threatened and are therefore inclined to only see errors or nonsense in machine translation, rather than the actual opportunities that a digital translation industry brings. On the other hand, management only sees such systems as a potential means of reducing costs, rather than of enabling more efficient processes, translations of a higher quality than those provided by comparable free tools, and new business models. Both positions, defence and exaggeration, are ultimately wrong if taken to extremes, because language service providers will still be needed – and because, one way or another, digitalisation will happen anyway and fundamentally change all industries.
Asked about the challenges, Schottmann answers by saying that many things are still unclear when it comes to dealing with data. What does it mean to delete training data if it has already been used to train a neural model? As yet, there is still no answer to this question in either theoretical or legal terms.
Schottmann adds that there are other difficulties – including the procurement law to which public tendering is subject. Läubli and Schottmann agree that procurement law and the procedure prescribed by it may make sense in many fields, but do not suit the way that digital projects, let alone AI projects, are realised.
On top of that, there is a multitude of challenges that may seem minor, but are no less serious, such as the lack of industry-wide standards for sharing and tagging data. This lack of standards and common file formats makes it difficult to develop interfaces with other applications, such as translation software.
For large companies, automatic translation of spoken language is probably a bigger issue today than the translation of written texts. Currently, research in the field of machine-translation systems for written texts is looking into how the focus of the translation can be broadened: from the sentence to the document, and to the document library. Another challenge is that of implementing inclusive language or different tonalities, e.g. whether a text should be preliminarily translated into formal or casual language.