Rev Launches First Open-Source Spanglish ASR Model

Rev introduces the first open-source speech recognition model for Spanglish, bringing accessible transcription to bilingual communities.

Miguel del Rio, Speech Science Manager

December 19, 2024

Rev is proud to announce the release of Reverb ASR Spanglish, the first open-source speech-to-text model specifically designed for Spanglish speakers. The release is a significant step toward making speech technology more accessible to diverse linguistic communities, and it comes just two months after the company announced Reverb, its first open-source ASR and diarization models.

What is Spanglish?

Spanglish isn't just a random mix of Spanish and English. It's a distinct linguistic phenomenon in which speakers switch seamlessly between the two languages, sometimes mid-sentence.

It's the natural language of millions in communities from New York to Los Angeles, and Miami to Puerto Rico. Yet until now, no open-source speech recognition model has been specifically designed to handle this dynamic language pattern.

The Technical Challenge of Two Languages in One

Traditional speech recognition models face a particular challenge with Spanglish: they're typically trained to handle either Spanish or English, but not both at once. This leads to several problems. Models may miss quick language switches, and some attempt to translate rather than transcribe. When that happens, important cultural and linguistic nuances are lost, along with what the speaker actually said.

Other models, such as OpenAI’s Whisper, can transcribe Spanglish incidentally, but as far as we know, Reverb ASR Spanglish is the first open-source model trained explicitly for that purpose.

Setting New Standards in Bilingual Speech Recognition

We compared Reverb ASR Spanglish with Whisper on the TalkBank Miami Corpus, a popular Spanglish dataset that contains dialogue between Spanglish speakers in Miami.

Model	TalkBank Miami Corpus
Reverb ASR Spanglish	29.16
Whisper Large V3	32.94

While there's still room for improvement before code-switched languages like Spanglish can be transcribed automatically as well as English, this result is a significant step forward in bilingual speech recognition and accessibility. The model can also help with content localization, if needed.
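The table above does not name its metric. Assuming the scores are word error rates (WER) in percent, the usual measure for ASR benchmarks where lower is better, the sketch below shows how such a score is typically computed and how the gap between the two models can be expressed as a relative reduction. The function names and the sample sentences are illustrative assumptions, not part of Rev's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length.

    Assumption: simple lowercase + whitespace tokenization. Real
    evaluations usually apply heavier text normalization first.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words (Levenshtein over words).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline: float, candidate: float) -> float:
    """Percent by which `candidate` lowers the `baseline` error rate."""
    return 100.0 * (baseline - candidate) / baseline


# Hypothetical code-switched sentence with one misrecognized word.
reference = "voy a la tienda because we need milk"
hypothesis = "voy a la tienda because we need meal"
print(word_error_rate(reference, hypothesis))       # 1 error / 8 words

# Scores from the table, read as WER (an assumption).
print(round(relative_reduction(32.94, 29.16), 1))
```

Under that reading, Reverb ASR Spanglish's 29.16 against Whisper Large V3's 32.94 works out to roughly an 11.5% relative reduction in errors on this corpus.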

The model is now available on our self-hosted solution and will be coming soon to Hugging Face. We're excited to see how the community will build upon this foundation to create even better solutions for bilingual speech recognition.