Meta, the parent company of Facebook and Instagram, recently unveiled artificial intelligence (AI) models that can work with more than 4,000 languages. These Massively Multilingual Speech (MMS) models include versions that can identify over 4,000 spoken languages, as well as speech-to-text and text-to-speech versions that can transcribe and generate speech in over 1,100 languages.
To learn these languages, the models needed large amounts of training data. From this data, the models could learn the languages by identifying patterns and relationships between words in a sentence and, eventually, produce text or speech in them.
ChatGPT and other popular AI applications can identify around 100 languages. To find sufficient training data in many more languages for its MMS models, Meta turned to Bible texts and audio recordings, as the Bible has been translated into many languages. (As at September 2022, Wycliffe’s statistics showed that the New Testament had been translated into over 1,600 languages.)
Meta compiled New Testament readings in over 1,100 languages, providing an average of 32 hours of data per language. Other data included recordings of Bible stories, evangelistic messages and songs. Altogether, over 500,000 hours of voice data in over 1,400 unique languages were used to train the self-supervised models. This was approximately five times as many languages as had ever been attempted before.
The many Bible translations used as training data are the result of decades of work by missionaries and scholars of the past and, more recently, by members of Bible translation organisations such as Wycliffe Global Alliance, SIL International and United Bible Societies. Partner organisations such as Faith Comes By Hearing and YouVersion then distribute the audio recordings and the translated texts free of charge on their websites.
Meta’s use of translated Bible texts highlights an oft-overlooked benefit of the work of Bible translation: not only is God’s Word made available to many people groups in their heart language, but languages that are in danger of disappearing are also preserved. Hopefully, coupling AI with the translated texts will enable technology to “speak” in the preferred language of these people, thus helping to keep their languages alive.