Children’s voices recorded to make speech recognition systems more child friendly
Engineers from the University of New South Wales (UNSW) Sydney are leading a drive to sample the voices of Australian children to ensure they can be better understood by devices that use voice recognition software.
Researchers believe the benefits will extend beyond making voice recognition more inclusive of children: digital devices could also enhance education and speech therapy by providing immediate and ongoing feedback in speech training and other learning tasks.
Until now, speech recognition software, which powers familiar virtual assistants such as Alexa and Siri, has relied on databases composed of adult voices – something that is set to change with the launch of AusKidTalk, a joint project of five Australian universities that aims to build a database of Australian children’s voices.
While speech recognition technology has advanced in leaps and bounds over the last decade, it still lags when it comes to understanding and speaking with children, Dr Beena Ahmed, a senior lecturer with UNSW’s School of Electrical Engineering and Telecommunications, said.
“There’s been a big improvement in speech recognition to work with different accents and languages,” she said. “But so far that has just been for adults. There is a definite shortage of data for children – not just in Australia, but all over the world. This is despite children being such an important demographic. Companies like Amazon, Apple and Google are all starting to notice that this is a big market.”
To progress their work, Dr Ahmed and her fellow engineers, linguists, psychologists and speech pathologists are about to start recruiting 750 children aged between three and 12 to provide speech samples as part of the AusKidTalk program.
In sound-proof studios located at each of the five campuses, the children will be recorded as they are prompted to repeat words, digits and sentences before engaging in unscripted storytelling exercises.
The new database of children’s speech will be used by linguists and psychologists to better understand how children develop their speech and language. Engineers, meanwhile, will be able to use it to develop new speech recognition systems that will interact with younger users much more seamlessly.
The main driver for their work, Dr Ahmed said, is that the accuracy of speech recognition systems when interacting with children has so far been quite poor.
“Children’s speech is quite different from adults’ speech. Children’s language skills aren’t as sophisticated as adults’. They might mispronounce or leave sounds or words out, or change the expected order of words. Then there are physiological differences – their vocal tract isn’t fully developed, and until they hit puberty, they speak in much higher pitches. All this makes their speech very different from adults and therefore harder for speech recognition systems to process,” she said.
In addition to recording samples of typical speech, the researchers will also be recording samples of disordered speech spoken by children.
The idea is that if speech recognition systems could be taught to recognise when children are having problems forming words, they could not only understand voice commands spoken by kids with impaired speech, but could also be used therapeutically to help with speech training on a mobile device.
With some parents spending up to $200 a session for speech therapy, and with no access to feedback from speech therapists between sessions, the technology could act as a bridge that improves children’s access to support for their language needs.
“Another problem is that parents can also find it hard to provide feedback themselves, because they’re not properly trained or because they’re already tuned to understand their kids in cases where others might not,” Dr Ahmed added.
An automated speech therapy tool would allow children and parents to get instant feedback when they practise what they’ve learned with a clinician, she said.
“It would give children immediate and ongoing access. You can’t expect this level of attention from limited appointments with limited numbers of available pathologists.”
Speech recognition systems using a database of children’s voices could also have benefits in education, reducing the need for parent volunteers to listen to children reading, for example.
The researchers said the COVID-19 pandemic has shown just how important remote communication and learning tools are.
“Unfortunately, children have not been able to benefit from these tools as much as adults due to a lack of effective speech-based tools for remote speech therapy and learning – so they likely have not been able to get the same benefit from telehealth and tele-education tools,” Dr Ahmed said.
Once the samples from the 750 children have been recorded and integrated into a speech recognition system, an open-source database will be made available online for other researchers to work with. The project is expected to be completed by June 2021.