
Technology: Google creating a universal translator with USM

The global technology giant announces it has completed a ‘critical first step’ in its planned 1,000 Languages Initiative — a machine learning model that can understand 1,000 of the world’s most spoken languages

Gbenga Kayode | ConsumerConnect

Google has moved a step closer to a translation breakthrough: the creation of a universal translator.

The global technology giant reported this week that it had completed a “critical first step” in its planned 1,000 Languages Initiative — a machine learning model that can understand 1,000 of the world’s most spoken languages.

Photo: LinkedIn News

The company disclosed that the “foundation” for this initiative, its Universal Speech Model, can now support more than 100 languages and is already being used in YouTube’s auto-translated captions.

It was learnt the technology could eventually be deployed in augmented-reality glasses, The Verge writes, offering “real-time translations that appear right before your eyes.”

Likewise, the Director of Machine Learning at Cyclica tweeted about Google’s newly released Universal Speech Model (USM).

The expert also noted that the technology firm announced it as a critical first step towards supporting 1,000 languages, summarising the USM as follows:

2 billion parameters, trained on:

– 12 million hours of speech

– 28 billion sentences of text

spanning 300+ languages.

USM, which is for use in YouTube (e.g., for closed captions), can perform Automatic Speech Recognition (ASR) not only on widely-spoken languages, such as English and Mandarin, but also on under-resourced languages, including Assamese and Azerbaijani.

According to the Director, the training steps involve the following:

  1. Self-supervised learning on speech audio covering hundreds of languages: “For the first step, we use BEST-RQ, which has already demonstrated state-of-the-art results on multilingual tasks and has proven to be efficient when using very large amounts of unsupervised audio data.”
  2. Additional pre-training step with text data: “In the second (optional) step, we used multi-objective supervised pre-training to incorporate knowledge from additional text data.

“The model introduces an additional encoder module to take text as input and additional layers to combine the output of the speech encoder and the text encoder, and trains the model jointly on unlabeled speech, labeled speech, and text data,” and

  3. Fine-tuning on the downstream tasks.
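The dual-encoder idea in step 2 can be illustrated with a minimal sketch. The code below is not Google’s implementation; it is a toy NumPy model in which hypothetical speech and text encoders project their inputs into a shared space and additional layers combine the two outputs, as the quoted description outlines. All dimensions and parameter names are illustrative assumptions.

```python
# Toy sketch (not Google's code) of a USM-style dual encoder:
# a speech encoder and a text encoder whose outputs are combined
# by shared layers, so the model can train jointly on speech and text.
import numpy as np

rng = np.random.default_rng(0)
D = 16  # shared embedding width (arbitrary toy value)

# Hypothetical parameter matrices standing in for the encoders
W_speech = rng.normal(size=(40, D))    # 40 = mock audio-feature dim
W_text = rng.normal(size=(100, D))     # 100 = mock text-vocab size
W_joint = rng.normal(size=(2 * D, D))  # layers combining both encoders

def encode_speech(frames):
    """Mock speech encoder: project audio frames, then mean-pool."""
    return np.tanh(frames @ W_speech).mean(axis=0)

def encode_text(onehots):
    """Mock text encoder: project one-hot tokens, then mean-pool."""
    return np.tanh(onehots @ W_text).mean(axis=0)

def joint_representation(speech_vec, text_vec):
    """Combine the two encoder outputs, as the second step describes."""
    return np.tanh(np.concatenate([speech_vec, text_vec]) @ W_joint)

# Example forward pass on random "audio" and "text" inputs
frames = rng.normal(size=(50, 40))             # 50 mock audio frames
tokens = np.eye(100)[rng.integers(0, 100, 8)]  # 8 mock token one-hots
z = joint_representation(encode_speech(frames), encode_text(tokens))
print(z.shape)  # a single shared D-dimensional representation
```

In the real model, these encoders are large neural networks and the joint representation feeds the training objectives (on unlabeled speech, labeled speech, and text); the sketch only shows how two modality-specific encoders can share one output space.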
