Voice Recognition Researchers Are Paving The Way For The Internet of Things With Robots That Can Hear

Robots like Boston Dynamics' Atlas aren't warm and fuzzy, but new research into human-robot interaction is producing therapy robots that are more comforting to humans. Reuters

Voice recognition from Siri, Echo, and Cortana gets all the attention as consumer products inch ever closer to that elusive Star Trek computer, but there’s a lot more to computers that hear than voice recognition. We spoke with Audeme co-founders and audio recognition experts Gerald Friedland and Bertrand Irissou about how voice recognition works today and why it’s just the beginning of a future full of computers with ears.

The Computer Voice Recognition Future

We’re Working On It…

Audeme just Kickstarted its first product, MOVI, a standalone speech recognizer for Arduino and other hobbyist projects. It can be programmed with simple commands, letting hobbyists add voice control to all sorts of projects.

“You basically can use it for anything where voice recognition makes sense, right?” Friedland told us. “So you have a toaster, you tell the toaster ‘I want Level 4 toasted today.’”
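The toaster example boils down to mapping a recognized sentence onto a device command. The Python sketch below is a minimal, hypothetical illustration of that mapping step only: it assumes the recognizer has already turned speech into a plain-text transcript, and the function name `parse_toast_level` and the 1-to-6 level range are invented for illustration, not part of MOVI's actual programming interface.

```python
import re
from typing import Optional

# Hypothetical: number words a recognizer might return instead of digits.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}


def parse_toast_level(transcript: str, max_level: int = 6) -> Optional[int]:
    """Extract a toast level from a recognized sentence, or None if there is none.

    Assumes speech-to-text has already happened; this only maps text to a command.
    """
    text = transcript.lower()
    # Look for "level" followed by either digits or a single word.
    match = re.search(r"level\s+(\d+|[a-z]+)", text)
    if match is None:
        return None
    token = match.group(1)
    level = int(token) if token.isdigit() else NUMBER_WORDS.get(token)
    # Reject unknown words and levels outside the (assumed) supported range.
    if level is None or not 1 <= level <= max_level:
        return None
    return level


print(parse_toast_level("I want Level 4 toasted today"))  # 4
print(parse_toast_level("Level four, please"))            # 4
print(parse_toast_level("Make me some coffee"))           # None
```

Keeping recognition and command parsing separate means the same parsing logic works no matter which recognizer produces the transcript.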

But while Apple, Google, Audeme, and other companies keep improving at the complex interplay of phonemes, language-model context, and massive, varied voice datasets that drives voice recognition, the future of computer hearing will go far beyond human words.

Friedland says the really interesting work begins when “we take the speech recognition as a given and now we can add on top of that.”

Computers Need Elephant Ears

One of the biggest revolutions to come from computer voice and sound recognition has very little to do with humans talking to computers. Rather, it’s about vastly expanding a computer’s ability to organize and understand information.

“Together with Flickr we released a corpus of 100 million images and 1 million videos for research.” The idea is to teach computers to recognize “multimodal” inputs: processing pictures, video, and sound to detect and identify the actual content rather than relying on keyword metadata.

“Right now video search and YouTube or Flickr is based on keywords that are text matched to the text that has been annotated to the video by the user.” But the future Friedland and Irissou envision would have computers recognize and identify these features on their own.

“The whole goal for Audeme was not just to create this MOVI shield, but to create hardware that recognizes sounds in the environment,” Irissou said. Teaching computers this capacity would revolutionize their ability to interact with the real world. Irissou offered the example of a security device that could recognize your fire alarm going off when you’re not at home.
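The article does not describe how Audeme's environmental sound recognition works. As a much simpler, hedged illustration of a device listening for one specific sound, the Python sketch below uses the Goertzel algorithm, a standard way to measure the energy at a single frequency, to flag a steady alarm-like beep. The 3.1 kHz target, 16 kHz sample rate, 20 ms frame, and 0.5 threshold are assumptions chosen for the example; this is not Audeme's method.

```python
import math
from typing import Sequence


def goertzel_power(samples: Sequence[float], sample_rate: float, target_hz: float) -> float:
    """Squared DFT magnitude at the bin nearest target_hz, via the Goertzel recurrence."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def tone_present(samples: Sequence[float], sample_rate: float = 16000.0,
                 target_hz: float = 3100.0, ratio_threshold: float = 0.5) -> bool:
    """Return True if most of the frame's energy sits at target_hz.

    The defaults (16 kHz audio, a 3.1 kHz beep, a 0.5 threshold) are illustrative assumptions.
    """
    n = len(samples)
    energy = sum(x * x for x in samples)
    if n == 0 or energy == 0.0:
        return False
    # By Parseval's theorem, a pure real tone centered on bin k puts all of its energy
    # into bins +k and -k, so this ratio is about 1.0 for a clean beep and near 0 otherwise.
    ratio = 2.0 * goertzel_power(samples, sample_rate, target_hz) / (n * energy)
    return ratio >= ratio_threshold


# Usage: a 20 ms frame (320 samples at 16 kHz) has 50 Hz bins, so 3.1 kHz lands exactly on one.
rate, n = 16000.0, 320
beep = [math.sin(2 * math.pi * 3100 * i / rate) for i in range(n)]
hum = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(n)]
print(tone_present(beep))  # True
print(tone_present(hum))   # False
```

A fixed-tone detector like this recognizes only one kind of beep in quiet conditions; handling hundreds of alarm sounds in real rooms is the kind of problem that calls for the large sound corpora the co-founders describe.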

Sound Recognition and The Future of Robotics

A more generalized capacity for speech and sound recognition could also change the future of robotics. “There are a bunch of companies in Silicon Valley who have mobile, autonomous robots for which it would be very useful to have a little ROS (Robot Operating System) module that would say ‘Hey, I just heard the car start,’ ‘Hey, I just heard a person walk by.’”

There are many obstacles still in the way. “It takes time and we need to put infrastructure in place to acquire a lot of data on sounds,” Irissou says, before describing just one hurdle that must be overcome to make even minor advances: “If you want to recognize a car alarm going off, there's going to be at least several hundred types of alarms in various conditions, so you need a very large corpus to be able to recognize a car alarm in general and not just recognize a Honda Civic.”

Still, the Audeme co-founders are hopeful that as voice recognition gets cheaper, a virtuous cycle of new ideas and techniques will bring us closer to a future where our electronics can hear and understand the real world. That is when the much-hyped “Internet of Things” can finally take off. Voice recognition is just the beginning of computers developing “a non-intrusive way of adapting to the world,” Irissou says. “Speech recognition is not really an end, but a means to an end.”
