Dreaming about what else Voice UI could bring to your product with custom engineering
These days, it seems like everything is voice enabled. It’s an exciting step toward truly natural user interfaces (UI), allowing us to interact with technology (and each other) more seamlessly than by tapping at a screen. Most voice-enabled products use off-the-shelf assistants like Google Home or Alexa, and it’s amazing how easy it is for product developers to bring a basic voice UI capability to devices. While it’s easy to get started, making one of these off-the-shelf voice assistants perform really well requires some difficult engineering, such as noise and echo cancellation and far-field speaker isolation. These are challenges we love to tackle with our clients.
This is an exciting subject for those of us who are always looking at ways to push the envelope with technology and design, and below we explore some ideas about how voice UIs could be used to enable completely new user experiences.
Optimize for specific words or environments
If your product is commonly used in settings with unique ambient noise signatures, say in a factory, your audio processing could be optimized to filter out that noise, a task that may otherwise prove challenging or power hungry for a general-purpose voice UI.
Similarly, your use case may require reliable recognition of specific words. Notable, for example, provides a system that uses voice recognition and AI for medical purposes, an environment with a lot of specialized vocabulary. A general-purpose voice UI will likely get confused by medical terms like “rhinorrhea” (runny nose) or “sphenopalatine ganglioneuralgia” (“brain freeze”), but a custom system could be trained for this.
Or you may decide that off-the-shelf wake words like “Okay Google” and “Hey Siri” don’t suit your use case or complicate your brand experience. Training a system on a new wake word is a challenge but definitely possible.
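To make the factory-noise idea a little more concrete, here is a minimal sketch of one very simple form of environment-specific filtering: a notch filter that removes a single known hum frequency while leaving the rest of the signal alone. Everything specific here is an assumption for illustration, including the 120 Hz hum, the 8 kHz sample rate, the Q value, and the function names. A real product would need a far richer noise model than one tone.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz

def notch_coefficients(freq_hz, sample_rate_hz, q):
    """Normalized biquad notch coefficients (standard audio EQ cookbook form)."""
    w0 = 2 * math.pi * freq_hz / sample_rate_hz
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (1 / a0, -2 * math.cos(w0) / a0, 1 / a0)   # feed-forward terms
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)  # feedback terms
    return b, a

def apply_biquad(samples, b, a):
    """Run the samples through the filter (Direct Form I)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def rms(samples):
    """Root-mean-square amplitude."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone(freq_hz, n=8000):
    """One second of a unit-amplitude sine wave at the assumed sample rate."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

# Hypothetical machine hum at 120 Hz, and a 1 kHz tone standing in for speech.
b, a = notch_coefficients(120, SAMPLE_RATE, q=10)
hum_out = rms(apply_biquad(tone(120), b, a)[-2000:])     # measured after the filter settles
voice_out = rms(apply_biquad(tone(1000), b, a)[-2000:])
print(f"hum after filter: {hum_out:.4f}, voice after filter: {voice_out:.4f}")
```

The hum is strongly attenuated once the filter settles, while the 1 kHz tone passes through almost untouched. Because the filter only needs a handful of multiplications per sample, this kind of targeted processing is also cheap to run.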
Optimize for low power
At Synapse we’ve been developing small form-factor wearable devices for years and have become experts in low-power design. Currently available voice assistants aren’t optimized to run on battery power. However, with a well-understood use case, we could develop a low-power voice UI appropriate for battery operation, bringing us all a little closer to being a bunch of Dick Tracys.
Our Cambridge team recently announced “Ecoutez”, which is a technology to enable Voice Activity Detection with lower power consumption than other available options. This could allow for the “always listening” functionality of an Echo to run on a small battery rather than wall power.
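For readers unfamiliar with Voice Activity Detection, the sketch below shows the basic idea using a plain energy threshold: the device only wakes its expensive speech pipeline when a frame of audio is loud enough. This is a textbook illustration and not a description of how Ecoutez works. The frame length, threshold, and function names are assumptions chosen for the example.

```python
import math

def frame_rms(frame):
    """Root-mean-square amplitude of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_voice_activity(samples, frame_len=160, threshold=0.1):
    """Return one boolean per full frame: True when the frame's RMS exceeds the threshold."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_rms(frame) > threshold)
    return flags

# Synthetic input: one frame of near-silence, then one frame of a 200 Hz tone
# (8 kHz sampling assumed), standing in for someone starting to speak.
silence = [0.001] * 160
speech_like = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
flags = detect_voice_activity(silence + speech_like)
print(flags)  # [False, True]
```

A fixed energy threshold is easily fooled by loud non-speech noise, which is part of why low-power detection that is also accurate is a hard problem.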
Hear more than just words from speech
What if you need to recognize something about a person’s speech other than the words themselves? One area where we’ve been making progress is identifying the speaker by voice so the UI can respond differently to different people. Our Aksent system can detect your accent, and our research shows that we could determine a user’s gender or even age from voice alone, which could be valuable for market research purposes.
Or what if your device could detect a person’s emotion and respond differently depending on whether the speaker is scared, angry, or happy? Maybe I want the music to turn down a lot faster if I angrily yell “volume down” than if I say it happily.
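Here is a small sketch of the “volume down” idea, assuming an upstream classifier has already labeled the speaker’s emotion (that classifier is the hard part and is not shown). The emotion labels, the ramp rates, and the function name are all hypothetical values picked for illustration.

```python
# Hypothetical mapping from a detected emotion to how fast volume changes,
# in volume steps per second. These numbers are illustrative assumptions.
RAMP_RATE = {"angry": 40, "scared": 30, "happy": 10}
DEFAULT_RATE = 15  # used when the emotion is unknown

def volume_after(current, target, emotion, seconds):
    """Volume level after ramping from current toward target for the given time."""
    step = RAMP_RATE.get(emotion, DEFAULT_RATE) * seconds
    if target < current:
        return max(target, current - step)  # ramp down, never past the target
    return min(target, current + step)      # ramp up, never past the target

# One second after "volume down" (from 80 toward 20):
print(volume_after(80, 20, "angry", 1))  # 40
print(volume_after(80, 20, "happy", 1))  # 70
```

The same command produces a much faster response when it is yelled angrily, which is the behavior described above.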
Hearing beyond speech
Imagine a system that could recognize what’s going on around the user through ambient noises, and then pair that with speech to provide better-informed responses. Apple is scratching the surface of this when Siri responds to the question “what song is this?” But it’s possible to interpret so much more about our surroundings through audio.
In a smart home application, your voice assistant could recognize the sound of a knock at the door, and then if you ask “who is it?” your video doorbell could turn on. It’s also possible to detect sounds like breaking windows for security, or maybe the sound of a person falling, for elderly care. Our team has done some groundbreaking work in non-verbal audio recognition, including complex tasks like recognizing music and categorizing it by genre using deep learning, as shown in the video below.
As a new dad, I would love a system that could listen to my baby’s crying and tell me if she’s hungry or has just had another blowout!
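The knock-at-the-door scenario can be sketched as a short memory of recently heard sounds that the assistant consults when interpreting a spoken question. This is a minimal sketch under stated assumptions: the sound labels are presumed to come from a separate audio classifier, and the class name, action strings, and 30-second window are hypothetical.

```python
from collections import deque

class ContextualAssistant:
    """Pairs recently detected ambient sounds with spoken queries."""

    def __init__(self, window_seconds=30.0):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, label), oldest first

    def hear_sound(self, label, timestamp):
        """Record an ambient sound reported by an (assumed) audio classifier."""
        self.events.append((timestamp, label))

    def _heard_recently(self, label, now):
        # Forget events that fall outside the context window.
        while self.events and now - self.events[0][0] > self.window_seconds:
            self.events.popleft()
        return any(seen == label for _, seen in self.events)

    def handle_query(self, text, now):
        """Choose an action using both the words and the recent sound context."""
        question = text.strip().lower().rstrip("?")
        if question == "who is it" and self._heard_recently("door_knock", now):
            return "show_doorbell_video"
        return "ask_for_clarification"

assistant = ContextualAssistant()
assistant.hear_sound("door_knock", timestamp=100.0)
print(assistant.handle_query("Who is it?", now=105.0))  # show_doorbell_video
print(assistant.handle_query("Who is it?", now=200.0))  # ask_for_clarification
```

The same question gets a different answer depending on what the device heard a few seconds earlier, which is the kind of context a words-only assistant would miss.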
Combining speech with other sensing modalities for more natural UI
Along with verbal communication, we humans also use physical actions to communicate. A more natural UI could incorporate input from a camera or other sensors to recognize and understand facial expressions and hand gestures, for example.
Dallas-based startup KinTrans is using computer vision to translate sign language to text. An assistant could use that kind of gesture recognition to allow more natural interactions. A machine could better diagnose ailments with the ability to pinpoint exactly where on the body pain is coming from by recognizing voice and hand movement together. A flying drone could respond to gestures as well as voice control, like Amazon has envisioned in a recent patent.
What would you want your assistant to do?
Currently available voice assistants are amazing solutions and have already changed the way we interact with technology, but even greater personalization and utility are possible. If you’re developing a device that needs a really well-executed voice assistant, or if you want to push the envelope and create a new kind of user experience, please get in touch. We’d love to geek out!
Synapse is a product development firm. We work with the best companies in the world to drive innovation and introduce cutting-edge devices that positively impact our lives. Fueled by a desire to solve complex engineering challenges, we develop products that transform brands and accelerate advances in technology.