Jeff Hebert
Dylan Garrett
VP Consumer & Industrial Business

Using Autonomous Robotics to Unlock a Next-Generation Natural UI

As the line between the digital and physical worlds blurs, people are demanding new ways to interact with digital technology without being bounded by small touch screens and laptops. Gartner predicts that by 2020, about 30% of all web-browsing sessions will take place without the use of a screen.

A major driving force behind this trend has been the rapid adoption of voice and audio as input and feedback for personal assistants, cars, phones, and more. But audio interfaces alone miss a lot of context: they’re like talking to a friend on the other side of a wall. To enable much more intuitive user experiences and move digital interactions forward, we are combining voice with other technologies like spatial awareness, 3D mapping, person identification, and gesture recognition.

Imagine being able to say “Ok Google, turn off that light” while pointing at the light you want turned off, or ask “Alexa, where did I leave my keys?” or even say “Siri, bring this box to Steve.” We are developing a robot that enables these kinds of experiences. The big idea is to interact with the digital world like you would with a friend, referencing objects and locations around you in an entirely new way that is much less bounded than the pure audio or one-way display experiences possible with today’s virtual assistants.

Enabling Technologies

With an interaction this natural, it’s easy to forget how much custom, cutting-edge engineering works in the background to make it possible. Here are some of the technologies we’re integrating to enable this more natural experience.

The robot autonomously builds a map of its surroundings, enabling intelligent navigation and contextual awareness in its interactions with people.

Autonomous Robotic 3D Spatial Mapping

Our robot uses Simultaneous Localization and Mapping (SLAM) to understand the world around it. It autonomously explores physical spaces such as offices, homes, factory floors, venues, and other places of interest, creating a persistent 3D map of the walls and floors using depth sensors, video cameras, and onboard processing. It’s aware of its own location and where it’s been, thanks to visual odometry, which tracks and records its position and orientation in 3D space. This also lets the robot decide where to explore next and update its map when transient objects move out of the way.
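To make the pose-tracking idea concrete, here is a minimal sketch of how incremental motion estimates can be chained into a running position and orientation. It is simplified to 2D (x, y, heading) for readability, and the function names `compose` and `integrate` are illustrative assumptions rather than anything from our robot’s software.

```python
import math

def compose(pose, delta):
    """Apply one incremental motion (expressed in the robot's own frame) to a pose.

    pose  = (x, y, theta) in the map frame, theta in radians
    delta = (dx, dy, dtheta) as estimated between two camera frames
    Both tuples are hypothetical simplifications of a full 3D pose.
    """
    x, y, theta = pose
    dx, dy, dtheta = delta
    new_x = x + math.cos(theta) * dx - math.sin(theta) * dy
    new_y = y + math.sin(theta) * dx + math.cos(theta) * dy
    # Wrap the heading into (-pi, pi] so it stays comparable over time.
    new_theta = math.atan2(math.sin(theta + dtheta), math.cos(theta + dtheta))
    return (new_x, new_y, new_theta)

def integrate(start, deltas):
    """Return the full trajectory, i.e. a record of where the robot has been."""
    trajectory = [start]
    for delta in deltas:
        trajectory.append(compose(trajectory[-1], delta))
    return trajectory

# Drive 1 m forward and turn left 90 degrees, four times: a closed square.
square = integrate((0.0, 0.0, 0.0), [(1.0, 0.0, math.pi / 2)] * 4)
```

In practice each small motion estimate carries some error, so a pure chain like this drifts over time. That is one reason a SLAM system also corrects the pose against the map it is building.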

When you gesture to the robot, it recognizes and utilizes things like the direction you’re pointing, to better understand the task you might be trying to accomplish.

Real-Time, Vision-Based Face, Object, and Gesture Recognition

Using the same cameras and onboard processing that enable our mapping and navigation, the robot also recognizes people, objects, and gestures using neural network-based machine vision. When possible, we train our system using existing public datasets for things like skeletal gesture mapping and facial recognition, but we can also collect and synthetically generate our own training data when needed.

These onboard ultrasonic transducers are part of the sensor suite that helps the robot navigate autonomously and build a map of its surroundings.

Voice Recognition

Voice recognition technology is all around us these days, but it still isn’t necessarily easy to implement well for novel use cases. We want our robot to perform effectively while driving around and in noisy environments, so mechanical isolation of the microphone array is critical, as is noise and echo cancellation. To maximize system accuracy, we’re using state-of-the-art phased-array microphone technology along with modern speech recognition software. We’ve also combined the audio sensing with our vision-based human recognition engine, so we know not only what is being said, but also who is saying it and where they are in relation to the robot.
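One way to picture the audio-plus-vision pairing is as a bearing match: the microphone array estimates the direction a voice came from, and the vision system supplies the positions of the people it has recognized. The sketch below is an illustration under assumptions, not our implementation. The function names, the 20-degree tolerance, and the people dictionary are all hypothetical.

```python
import math

def angular_difference(a, b):
    """Smallest absolute difference between two bearings, in radians."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))

def identify_speaker(voice_bearing, people, tolerance_deg=20.0):
    """Pick the recognized person whose bearing best matches the voice.

    voice_bearing: direction of the sound in the robot frame, radians
    people: {name: (x, y)} positions in the robot frame, from vision
    Returns None when nobody is within the (assumed) tolerance.
    """
    best_name = None
    best_diff = math.radians(tolerance_deg)
    for name, (x, y) in people.items():
        diff = angular_difference(voice_bearing, math.atan2(y, x))
        if diff <= best_diff:
            best_name, best_diff = name, diff
    return best_name

people = {"Steve": (2.0, 2.0), "Ana": (-1.0, 0.1)}
speaker = identify_speaker(math.radians(50), people)
```

Here the voice arrives from about 50 degrees and Steve stands at 45 degrees, so the match resolves to Steve. A voice from a direction where no one is visible returns `None`, which a real system might treat as a cue to turn and look.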

The artificially intelligent “brain” of our robot turns disparate interactions into a natural user interface.

Artificial Intelligence and Sensor Fusion Tie It All Together

Our AI engine enables the robot to infer your intent by combining voice commands with gestures and its 3D map. It’s this combination of disparate datasets that enables such a natural user experience. When the robot sees you point at a light, for example, it knows roughly the orientation and position of your arm relative to its own position and orientation, and can use that to triangulate where on its spatial map you are pointing. Because it has mapped the room, it knows which controllable objects are located in that direction, and because it has heard you say “turn on that light,” it knows that the light is the object you’re asking to control.
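The pointing example can be sketched in a few lines. The sketch assumes the vision system reports shoulder and hand positions in the robot’s frame, that the robot’s pose on the map is known, and that the map stores controllable objects with a type label. Every name here (`MapObject`, `resolve_pointing_target`, the 15-degree cone) is a hypothetical stand-in, not our actual engine.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class MapObject:
    name: str
    kind: str        # assumed category label, e.g. "light"
    position: tuple  # (x, y, z) in the map frame, metres

def to_map_frame(point, robot_xy, robot_yaw):
    """Convert a robot-frame point to the map frame using the robot's pose."""
    x, y, z = point
    c, s = math.cos(robot_yaw), math.sin(robot_yaw)
    return (robot_xy[0] + c * x - s * y, robot_xy[1] + s * x + c * y, z)

def angle_between(u, v):
    """Angle in radians between two 3D vectors (pi if either is zero-length)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    if norm_u == 0 or norm_v == 0:
        return math.pi
    cos_angle = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def resolve_pointing_target(shoulder, hand, robot_xy, robot_yaw,
                            objects, spoken_kind, max_angle_deg=15.0):
    """Return the object of the spoken kind closest to the pointing ray."""
    origin = to_map_frame(shoulder, robot_xy, robot_yaw)
    tip = to_map_frame(hand, robot_xy, robot_yaw)
    ray = tuple(t - o for t, o in zip(tip, origin))
    best, best_angle = None, math.radians(max_angle_deg)
    for obj in objects:
        if obj.kind != spoken_kind:  # voice narrows the candidates
            continue
        to_obj = tuple(p - o for p, o in zip(obj.position, origin))
        angle = angle_between(ray, to_obj)
        if angle <= best_angle:      # gesture picks among them
            best, best_angle = obj, angle
    return best

room = [
    MapObject("desk lamp", "light", (4.0, 2.0, 2.6)),
    MapObject("ceiling light", "light", (1.0, 5.0, 2.5)),
    MapObject("speaker", "speaker", (1.0, 5.0, 2.5)),
]
target = resolve_pointing_target(
    shoulder=(2.0, 0.0, 1.4), hand=(2.0, -0.5, 1.6),
    robot_xy=(1.0, 0.0), robot_yaw=math.pi / 2,
    objects=room, spoken_kind="light",
)
```

With these example values the arm points at the desk lamp, so “that light” resolves to it rather than to the ceiling light. Neither signal is enough alone: the word “light” matches two objects, and the gesture alone does not say what kind of thing is meant.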

Looking Forward

We’re excited to push digital-physical interfaces even further forward, allowing us to look up from our screens and interact with the world in a more natural way while still benefiting from the huge ongoing advances in digital technology. Check out some of the articles and case studies below or get in touch to talk about how we could work together to make magical experiences a reality.

See what else is new...

November 2, 2020

Synapse’s Diversity, Equity, and Inclusion Evolution

Over the last six years, we’ve made an effort to build diversity, equity, and inclusion into the fabric of our organization. From the beginning, we’ve taken an iterative approach, revisiting our initiatives, processes, and policies to make improvements over time, multiple times. Now that we’ve made significant progress, we want to share insights that we hope will help you make positive change at your own organization.

October 19, 2020

The ME Team Goes Virtual: 4 Ways We’ve Tackled the Challenge of Making Things in a Virtual World

The mechanical engineering team at Synapse has gotten creative in finding solutions for working together remotely. Following Ann Torres’ (our VP of Engineering in San Francisco) great discussion with Fictiv and Cooper Perkins on How to build a Physical Product in the Virtual New World, our team tackled some of the same challenges and developed solutions of our own.

October 6, 2020

Natural UI: 5 Design Tenets for Uniting Physical and Digital Interfaces

Consumers are seeking out Natural User Interfaces (NUIs)—technology that can be controlled by whatever method is most convenient in that moment, therefore blending seamlessly into our surroundings. Today’s smart devices attempt to achieve this by combining physical control interfaces with layers of digital innovation, from voice commands and gesture recognition to gaze tracking and machine vision. But is this a guaranteed improvement? Not without deliberate design.

September 8, 2020

Bringing Physical User Interfaces Back in a Connected World: An Intro to RPCIs

We believe that connecting products to the Internet and otherwise adding digital “smarts” to them can enable powerful new functionality and make products much more useful to their users. That being said, we care deeply about the user experience of physical products. We feel strongly that the industrial design and user experience of a product should be constricted as little as possible by the addition of digital technology. That’s why we started exploring the concept of reactive physical control interfaces (RPCIs)—physical controls that self-actuate in response to secondary digital control.

December 4, 2019

[Watch] Stop Yelling at Alexa, She Doesn’t Get You…Yet

The recent success of smart speakers has been a great leap in human-digital interaction, but there’s still a lot of space for developers and companies to cover to create smart devices and environments with truly engaging and intuitive interfaces. While voice command technology can handle simple tasks, the interaction can fall short because it has modest understanding of human intent.