With Apertus, ETH Zurich and EPFL have developed the world’s first fully open large language model. This shows that transparent, superior AI is within reach, which is exactly what Switzerland – and Europe – needs.
Apertus may sound like the name of a Roman in an Asterix comic book, but, sticking with the world of the wily Gaul for a moment, it is more like a term for Gallic defiance against the supremacy of Rome. This article introduces the Apertus language model and explains how the Swiss AI Initiative has scored a coup with its release.
Learning proverbs is an integral part of language acquisition. Native English speakers will at some point learn that ‘The early bird…’ is usually followed by ‘catches the worm’. In principle, large language models (LLMs) do nothing more than that. They learn to calculate which words are most likely to follow in a given context. In other words, ‘catches the worm’ is usually the most likely continuation of the phrase ‘The early bird…’.
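The idea of picking the most likely continuation can be illustrated with a deliberately tiny sketch. It only counts which word follows which in three made-up sentences; a real LLM such as Apertus uses a neural network and far more context, so the corpus and the function names here are purely illustrative assumptions.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, for illustration only (not Apertus training data).
corpus = [
    "the early bird catches the worm",
    "the early bird catches the worm",
    "the early bird sings at dawn",
]

def train_bigram_counts(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probabilities(counts, prev):
    """Turn the raw follow-up counts for `prev` into probabilities."""
    total = sum(counts[prev].values())
    return {word: n / total for word, n in counts[prev].items()}

counts = train_bigram_counts(corpus)
probs = next_word_probabilities(counts, "bird")
most_likely = max(probs, key=probs.get)
print(most_likely, probs)
```

In this toy corpus, ‘catches’ follows ‘bird’ in two of three sentences, so it is selected as the most likely next word.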
An LLM is a huge neural network – a mathematical computational model. The model takes words as input and produces words as output. In between, the input ‘The early bird…’ is transformed in several computational steps into the output ‘catches the worm’.
The model itself encodes words and the relationships between them. You can think of it as a map. Words that are similar to each other are close together on this map. The further apart two words are on this map, the more their meanings differ. The trick is that LLMs map language not in two dimensions, but in many.
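The map metaphor can be made concrete with a small sketch. Each word is represented as a list of numbers (a vector), and closeness is measured with cosine similarity, a common choice for this purpose. The three-dimensional vectors below are hand-picked, hypothetical values; real models learn vectors with hundreds or thousands of dimensions.

```python
import math

# Hand-picked toy vectors (hypothetical values, not taken from any real model).
embeddings = {
    "bird": [0.9, 0.8, 0.1],
    "sparrow": [0.8, 0.9, 0.2],
    "invoice": [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

close = cosine_similarity(embeddings["bird"], embeddings["sparrow"])
far = cosine_similarity(embeddings["bird"], embeddings["invoice"])
print(f"bird vs sparrow: {close:.2f}")
print(f"bird vs invoice: {far:.2f}")
```

With these toy values, ‘bird’ and ‘sparrow’ score much higher than ‘bird’ and ‘invoice’, which corresponds to the two related words lying close together on the map.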
Models comparable to today’s LLMs have been around since 2017. The general public became aware of them in 2022, when OpenAI released the ChatGPT chatbot. Meanwhile, there are many different applications based on LLMs. These include not only specialised systems, such as machine translation, but also widely applicable chatbots that can summarise content, suggest text, generate images and so on. Agents are increasingly emerging that operate independently in defined fields and are able to handle correspondence, develop programs or do business.
The catch with the proprietary models from Alphabet, DeepSeek, Meta, Mistral and OpenAI is that they are not open: users cannot check what data was used to train them or whether its use was legal.
The creators of Apertus have chosen the opposite path. While the model can, like some of the other models, be downloaded and installed as a separate instance, unlike the others, the entire architecture of Apertus is openly accessible. This means that anyone who is interested can follow step by step how Apertus came to be what it is – from the source code and the training data to information about exactly what settings were used during the training process. Apertus is not only a language model, therefore, but also a complete and reproducible documentation of its own development. This sets a new standard for openness and transparency. Apertus is the only large language model that is fully compatible with the European Union’s AI Act and local data protection laws, and which respects European copyright laws.
Another distinctive feature of Apertus is that text data in more than 1,000 languages has been incorporated into the training set. Given the multilingual nature of the Internet, this may not sound unusual, but it is in fact a major unique selling point. While most models are trained primarily on English text, Apertus is committed to multilingualism and cultural diversity, with only 60 per cent of its training data in English. Texts in rare languages, such as Romansh, were also included in the training data.
The large language models on the market today come mainly from the USA or China. Given the increasing importance of technological superiority, this should give pause for thought, because LLMs are increasingly becoming fundamental infrastructure on which entire fields of activity and industries depend. In the future, entire economies will be as dependent on them as they are on the Internet today.
From this point of view, the problem is not individual use, individual prompts or individual instances installed on local servers; it is that Europe is playing only a secondary role in this development, and that there are hardly any alternatives to the models from the USA and China.
The fact that more than 100 scientists from ETH Zurich, EPFL and other Swiss universities have created and released an open large language model under the auspices of the Swiss AI Initiative is remarkable, because a wide range of challenges had to be overcome to make it happen. One of them is the need for a suitable computer to train the model: in this case, the Alps supercomputer in Lugano.
Building and operating these kinds of computers requires not only a high level of investment but also the corresponding expertise. Apertus was trained for several months on more than 4,000 Nvidia chips. The electricity costs for this phase alone exceeded one million Swiss francs. Switzerland was fortunate to have bought the chips before the US government restricted their export.
Apertus is still far from matching competing applications that were developed with vastly greater investment. As obvious as a comparison between ChatGPT and Apertus may be, it misses the point of why Apertus is important.
On the one hand, Apertus is a scientific project that aims to research computer technology and train specialists: few students have the opportunity to work on a modern AI model during their studies, as is possible at ETH Zurich and EPFL.
On the other hand, Apertus is a base model – a kind of engine – on which applications can be built in the future, for example by adapting it (‘fine-tuning’) to industry-specific data. Because it is released under the Apache License 2.0, nothing prevents commercial use. Interest from the open source community is enormous, as evidenced by the two million-plus downloads in the first few months.
In this regard, the journey with Apertus and the work on it have only just begun, and it remains to be seen how an open European LLM will evolve. One thing is already apparent: if we do not state loud and clear how important Apertus is to Switzerland as a research location, it will be a real missed opportunity.
AI and large language models are here to stay. It is therefore important that science deals directly with this technology. The fact that Switzerland has the opportunity to develop this type of model and help shape this technology is what counts.