“Hey Alexa, does my product or service need a voice app?”
While eye tracking and gaze interactions are still in their early development phase, and gesture is mostly applied within the gaming and museum industries, voice interfaces are being widely adopted and integrated into our everyday lives. Voice is rapidly becoming a valid avenue for solving real-life problems in ways that weren’t available before. No longer simply a technology trend, voice UI has emerged as one of the most efficient ways for people to interact with services, products and businesses wherever there is a need for hands-free, eyes-free (or both) interaction.
Coupled with that is the explosion of viable device options. Alexa, Siri, Cortana, and Google Home are finally coming into their own. Siri recently received a new voice that makes her sound more human. Google’s speech recognition technology has a 4.9% word error rate, down from 8.5% in July 2016. Alexa can now whisper and adjust her pitch. Both Google Home and Alexa have the ability to recognize multiple voices – an ideal feature for households with multiple family members.
Beyond the smart speaker
There’s no denying that smart speakers are quickly becoming in-home powerhouses. Amazon Echo sales have risen rapidly to over 20 million since the third quarter of 2016, and Forrester expects 50% of U.S. households to have at least one smart speaker by 2022.
In addition, Alexa is growing into a powerful platform that can be integrated into, well, almost everything. As a platform, it’s giving the Internet of Things a new dimension, enhancing products ranging from cars (BMW, for example) to security cameras. Here at Smart we’ve used Alexa’s base functionality and successfully layered custom embedded hardware on top while developing a proof-of-concept voice app for Gatorade Gx.
When crafting a personalized voice assistant for a woman living with multiple sclerosis for the upcoming BBC Two series, The Big Life Fix, we also found a way to alter some of the awkward trigger phrases required by current device software. A ‘wake word’ is what triggers a voice-first device to start listening to your commands. For example, “Okay, Google” or “Hey, Siri”. Our technology director was able to alter these wake words to facilitate a more personalized interaction between our voice solution and the patient, Susan.
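To make the idea concrete, here is a minimal, purely illustrative sketch of wake-word gating: a command is only acted on when the input begins with a configurable wake phrase. The wake words, the `extract_command` helper, and the text-based approach are all assumptions for illustration; real devices detect wake words in the audio stream, not in transcribed text.

```python
from typing import Optional

# Hypothetical personalized wake words, replacing vendor defaults.
WAKE_WORDS = {"hey susan", "okay susan"}

def extract_command(transcript: str) -> Optional[str]:
    """Return the command following a wake word, or None if no wake word was heard."""
    lowered = transcript.lower().strip()
    for wake in WAKE_WORDS:
        if lowered.startswith(wake):
            # Strip the wake word plus any trailing comma/space before the command.
            return lowered[len(wake):].lstrip(" ,")
    return None  # no wake word: the device stays idle

print(extract_command("Hey Susan, turn on the lights"))  # -> turn on the lights
print(extract_command("turn on the lights"))             # -> None
```

Changing the interaction is then just a matter of changing the entries in `WAKE_WORDS`, which is essentially what a more personalized wake word buys the user.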
With all the recent traction, it makes perfect sense for many digital services to consider jumping on the bandwagon of developing their own voice-enabled apps and features. But does taking that leap make sense for your brand or product? Here are a few things you should consider before diving into the screenless experience arena:
Everyone can have a voice, but very few are currently listening
While sales have soared, it’s worth noting that retention rates on voice UI apps are still quite low. In mid-2017, Alexa surpassed 15,000 offered Skills, covering everything from helpful productivity hacks to an impressive number of intelligent entertainment assists. Yet only 31% of those Skills have more than one consumer review, and user retention after two weeks is just 3% (compared to 10–11% for a typical mobile app on iOS or Android).
Why is this the case? Consider the early days of iPhone apps – remember the insane popularity of iFart back in 2008? Well, Alexa has its own “fart” Skill, and it has 183 reviews. Fart skills aside, it is not surprising that simple, single-command utilities are the most popular type, and that the most used Alexa Skill is still “set a timer”.
Even with its low retention rate, we can’t disregard the snowballing momentum of voice-driven technology. Just as the mass adoption of smartphones and mobile apps skyrocketed over the past decade, so too will the need for true, consistently hands-free interactions with these smart assistants. The potential is too great to ignore, and starting now can only increase the likelihood of success in a few years’ time.
Consider the context and intent of giving your product a voice
As with any new product or service, asking the right questions is a good way to evaluate a need. To understand context of use, simply ask “where” and “when”. To uncover user intent, ask “why”.
Currently, those who use voice interfaces are primarily doing so at home or in the car. Consider these statistics: 52% of smart speaker owners keep their device in the living room, 24% in the kitchen, and 12% in the master bedroom. The top reasons for using voice are speed and hands- and vision-free interaction. Screenless experiences also mean the same technologies can be applied by visually impaired people, for whom graphical user interfaces are useless without the assistance of screen readers.
Amazon recently released a new set of devices to be used in a wider context, including one designed to live next to your bed. In addition, dozens of devices that integrate Alexa were introduced at this year’s CES, including those from smart-home camera company Canary, which is linking its cameras to Alexa to enable real-time viewing of home security footage. Alexa is now also available in the Amazon Prime iOS app and is part of the operating system on HTC’s new phone, making her more mobile than ever.
Voice apps are well-suited to particular times and places, but remain challenging for many other contexts, especially social settings. Public spaces are especially taxing because of background noise and the feeling of awkwardness associated with speaking to machines in public. Although smart assistants can create to-do lists and schedule meetings, only 3% of people use them in a work setting. But this is rapidly changing. At companies like NASA and AstraZeneca, Alexa is already being integrated for standard procedure reminders and to rearrange conference rooms.
In certain contexts, you may want to consider adding additional controls to increase flexibility (the iPhone is a great example of an intuitive combination of touch, voice and gesture controls). For example, when loud music is playing, smart speakers have a hard time picking up your voice. The first-generation Echo had a physical volume ring, while the second generation comes with volume-up and volume-down buttons.
By considering the context and intent of your current product offering, you will be able to evaluate when and where consumers are using it the most often. Let these findings guide how and when you enhance your experience with a voice-based component.
Maintaining the human element (but not too human)
Despite the fact that most of today’s digital assistants have a female name and a slightly sassy personality, we are still very far from having actual conversations with our “conversational” interfaces. Sure, humans have a natural tendency for anthropomorphism (the attribution of human traits to nonhuman entities), but for most of us, a voice interface is just another mechanism to navigate and consume digital content (news, music), request a service (Uber, ordering a pizza) or control a utility (timer, lights). In other words, it is businesses and digital services that speak through a voice interface, while customers speak to it, not with it.
Still, a voice interface is the most human-like interface, designed to adjust to natural behavior unlike any other. Smart speakers such as the Echo can pick up our voices from anywhere in the room, and the hands-free interaction lets us master multitasking. The Skills currently built into Alexa even use filler words, making her responses to even the simplest requests seem more human than their UI-based counterparts. For example, when I ask Alexa what’s in my shopping cart, she says, “Looks like your cart is empty,” instead of “Your cart is empty” or “Cart empty.”
This humanization, however, can be a double-edged sword. The use of pauses and filler words can get repetitive and frustrating over time. It feels natural when talking to a human, but when a bot does it, “It closes up the flow of command and action,” notes Clive Thompson at WIRED, “and it can make the skill sound like a 19th century butler.” A voice app must recognize filler words as part of a user input, but should be selective in using them in its response.
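One way to act on that advice is to add fillers only some of the time, and to rotate through several of them so no single phrase wears thin. The sketch below is a hypothetical illustration of that policy; the class, the filler list, and the every-Nth-response rule are our own assumptions, not any vendor’s API.

```python
import itertools

# Hypothetical pool of conversational fillers to rotate through.
FILLERS = ["Looks like", "Well,", "Okay,"]

class ResponseBuilder:
    """Prepends a filler to every Nth spoken response; the rest stay terse."""

    def __init__(self, use_filler_every: int = 3):
        self._counter = itertools.count(1)   # counts responses issued
        self._every = use_filler_every
        self._fillers = itertools.cycle(FILLERS)

    def speak(self, message: str) -> str:
        # Only the 1st, (N+1)th, (2N+1)th... responses get a filler,
        # so repeated interactions don't hear the same preamble every time.
        if next(self._counter) % self._every == 1:
            return f"{next(self._fillers)} {message}"
        return message
```

With the defaults, asking about an empty cart three times in a row would yield “Looks like your cart is empty.” once, then the terse “your cart is empty.” twice, which is the kind of selective humanization the paragraph above argues for.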
The opportunity is vast for those willing to seize it
The predictions all pointed to 2017 being “huge” for voice; while it may not yet rival the extent and impact mobile has had on our lives, our belief is that this is when voice truly starts to add value to people’s lives. As we kick off 2018, it’s clear that the rise of smart home assistants and voice applications is a huge opportunity for brands seeking new ways to interact with their customers, allowing them to advance their product and service offerings in a way that screen-based UI never could. Indeed, CES deemed voice assistant integration the top smart-home trend to watch this year.
These innovations present the ability to connect with new audiences, such as the elderly, whom the advanced UIs of smartphone apps might elude. Screenless activity, such as speaking on a phone or verbally requesting actions in a conversational manner, comes much more naturally to them.
Companies can connect with their customers on a deeper level than previously thought possible by humanizing their brand and voice, and by gathering data that, through machine learning, feeds back into an ever cleaner and more fluid experience.
And while primarily utilitarian in their current state, future generations of voice assistants may one day offer consumers a more intimate relationship, applying empathy and social skills to offer companionship, and perhaps even talk therapy. The possibilities are truly endless. Now is the moment for future-thinking brands to help define what roles smart assistants will play in people’s lives, and how that can evolve to add more meaning and value over time.
Decided that voice might be the right solution for your product or service? Head to Fast Company for our 5 lessons learned when designing for voice.