They trained it on two new data sets: one containing audio recordings of the New Testament and its corresponding text, taken from the internet, in 1,107 languages, and another containing unlabeled audio recordings of the New Testament in 3,809 languages. The team preprocessed the speech audio and the text to improve their quality, then ran an algorithm designed to align the audio recordings with the accompanying text. They then repeated this process with a second algorithm trained on the newly aligned data. With this method, the researchers were able to teach the algorithm to learn a new language more easily, even without accompanying text.
“We can use what that model has learned and then quickly build speech systems with very, very little data,” said Michael Auli, a research scientist at Meta who worked on the project.
“We have very, very good data sets for English, and we have it for several other languages, but we just don’t have it for languages spoken by, say, 1,000 people.”
The researchers say their models can speak more than 1,000 languages and can recognize more than 4,000.
They compared the models with those from rival companies, including OpenAI's Whisper, and claimed theirs had half the error rate despite covering 11 times as many languages.
However, the team cautions that the model is still at risk of mistranscribing certain words or phrases, which could lead to inaccurate or potentially offensive labels. They also acknowledge that their speech recognition models produced more biased words than the other models, although only 0.7% more.
While the scope of the research is impressive, using religious texts to train AI models can be controversial, said Chris Emezue, a researcher at Masakhane, a natural language processing organization focused on African languages, who was not involved in the project.
“The Bible has many biases and distortions,” he said.