Building Inclusive Speech Technology with Diverse Data

Submitted by Defined.ai on Thu, 05/19/2022 - 03:07

Inclusive speech recognition technology that is trained on diverse, accented speech data is the key to staying relevant in the voice recognition market.
Three New Yorkers walk into a bar: one grew up in the Midwest in a Mexican family, another is a native Spanish speaker from Colombia, and the third is a New Yorker who spoke Castilian Spanish at home until high school. There’s no punchline here: they simply sit down and have a conversation in English.

As they speak, we observe major differences in the speech of each person. Geography, socioeconomic status, and ethnicity, among other factors, all cause variations in pronunciation, vocabulary, and other speech patterns.

Given those differences, what happens when each of them goes home to their voice assistant and makes a request in English? How well is each of their accents understood? And what are the consequences for those who aren’t understood?

Those are essential questions for data scientists, developers, and other AI speech professionals as they work to create speech recognition technology that is inclusive, diverse, and free from biases caused by an accent gap.

Bridging the accent gap
An accent gap is a type of algorithmic bias that occurs in voice recognition models trained without diverse, representative data: for example, models trained exclusively on English speech data sourced from a single geographic and cultural background. This gap can frustrate users who fall outside a narrow definition of an English speaker (predominantly white, upper-class male speakers), and the resulting product fails to meet the needs of a diverse market.

An accent gap can affect speech technology of all kinds. For example, one Washington Post study found that Amazon’s Alexa was 30% less likely to understand non-native English accents. In the same study, voice assistants from Google and other major competitors produced similar results.

This means that to compete long-term in the voice recognition market and to create inclusive speech products, your model must understand accented speech. And when we say “models,” we don’t just mean voice assistants. All models and devices that make up the Internet of Things (IoT), many of which use voice activation and recognition as part of their core offering, should be trained on diverse, representative, and bias-aware data.

By releasing a free Spanish-accented English speech dataset, Defined.ai aims to help AI professionals test whether their models exhibit an accent gap for one specific group: non-native English speakers in the US whose native language is Spanish.
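
One common way to run such a test is to compare word error rate (WER) across speaker groups. The sketch below is only an illustration, not Defined.ai's method: the function names, the group labels, and the toy transcripts are all made up for the example. A real evaluation would use your model's transcripts of actual recordings, with text normalization suited to your domain.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words).

    Text is lowercased and split on whitespace only; punctuation is
    NOT normalized here, which a real evaluation would need to handle.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Levenshtein distance over words, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra word inserted
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(samples):
    """samples: iterable of (group, reference, hypothesis) triples.

    Errors and reference words are pooled per group, so longer
    utterances weigh more than shorter ones.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[group] += e
        words[group] += n
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Toy data invented for this example: (group, what was said, what the model heard).
toy_samples = [
    ("baseline", "turn on the kitchen lights", "turn on the kitchen lights"),
    ("baseline", "what is the weather today", "what is weather today"),
    ("spanish_accented", "turn on the kitchen lights", "turn on the chicken light"),
    ("spanish_accented", "what is the weather today", "what is a weather today"),
]

rates = wer_by_group(toy_samples)
gap = rates["spanish_accented"] - rates["baseline"]
for group, rate in sorted(rates.items()):
    print(f"{group}: WER = {rate:.2f}")
print(f"absolute WER gap = {gap:.2f}")
```

A consistently higher WER for the accented group on comparable utterances suggests an accent gap. With only a handful of utterances the difference may be noise, so a meaningful comparison needs many speakers and recordings per group.
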
Follow the link to read more about speech technology and get access to free speech data: https://www.defined.ai/blog/building-inclusive-speech-technology-with-di...