Why Voice Assistants will Become the Ultimate Monetization Machines

It’s all about the user’s intents

Pascal Heynol
Chatbots Life

--

Voice assistants are the next big thing. Some say they’re the next mobile, though I don’t even know if that’s accurate or an understatement. All the major platform companies have one, and startups building them appear ever faster, making it hard to even keep track of everything. The point is, they are going to be everywhere and are going to dominate the way we interact with our computers.

Yet I hear many questioning if these assistants are even viable from a business perspective. The argument goes that by moving people away from screens, assistants may be diminishing traditional screen-based revenue streams. How is Google going to sell ads along with their search results if the user gets taken directly to the information they desire without ever looking at a list of results?

Content providers may indeed have a harder time turning their work into paychecks. If you’re running a blog or publication, your main business is placing ads next to your reporting. When more people move away from screens and have their news read to them by an AI instead, fewer people will see your ads. Whether people will actually do this in significant numbers, though, remains to be seen.

For the companies operating the voice assistants, however, they will become a gold mine. Even better, their value proposition for the customer is precisely what makes them valuable for the operating businesses. All it takes to cash in is a simple two-step plan:

1. The Path of Greatest Convenience

The first part of the story is about getting to market dominance, or at least gaining a significant share of the market, and growing the overall market volume at the same time.

Essentially, companies are trying to get as many people as possible to use their system. And get these people to use their system as much as possible. That’s why at the moment, all weight is behind making these voice assistants useful and their interaction feel natural. We’re supposed to get used to talking to robots.

The useful part of this puzzle is about handling relevant key use cases on the one hand while supporting action in an extremely broad area of tasks on the other. As with most innovations, a few key functions are what people really want and what keeps them coming back. Still, it has to be universally useful and support the user with whatever she wants to accomplish. This is especially important for voice. Without any visual cues for the available functionality, all the user is left with is trial and error. Every missed command the system can’t understand or act upon is disappointing for the user. Get a command wrong one too many times and you will be frustrated enough to stop using it altogether.

So as long as no one entity creates an assistant that handles everything, there is a demand, and a possible place for coexistence, for many T-shaped ones. The crucial part here, however, is acknowledging coexistence, knowing strengths and weaknesses and facilitating actions and responses between these systems. Imagine an assistant that is great at smart-home coordination and control, but has no exceptional skills in most other departments. By itself, it seems not too convenient and you might instead turn to one that does pretty well generally but is just a bit weaker in smart-home stuff. However, if this first one is now also able to facilitate between different assistants, that could become a whole different story. It could call Google Assistant for knowledge questions, Amazon’s Alexa for shopping tasks, and so on and so forth. Now that would be pretty convenient!

As soon as an assistant handles core functions well enough, and delights instead of disappoints in most other, more general requests, people might actually use it to an extent and volume that makes it interesting for companies.

The part about conversations feeling natural is just as important. For conversations with machines to feel natural, it takes two things: speech synthesis and conversation flow.

Speech synthesis, nowadays, describes the action of a computer producing actual “spoken” sounds from written words and data. This begins with arranging prerecorded syllables one by one and becomes ever more complicated when incorporating important traits of our languages, such as intonation and flow. Technology has gotten really good at this as you can see in currently available voice assistants. While in most cases you can still easily tell that you’re talking to a robot, speech synthesis has reached a state of being good enough to have a conversation. You can clearly understand what the machine is trying to say without the sound of it coming off as a distraction of any sort.

The next big challenges in the field are about making sounds even more human. Robots are good at communicating facts, but conversation is about so much more than plain facts. We use speech to direct attention, convey emotions and carry more meaning than the individual words. Getting our robots to follow conversational conventions by producing and using all these stylistic measures correctly and effectively is the current area of focus in the field. And one where our robot friends still have a lot to learn.

Conversation flow describes, in simple terms, how well the conversation is going overall. For proper conversation flow, it takes both parties to be benevolently engaged, actively listening and understanding. Let’s break that down:

  • benevolently engaged: wanting the best outcome for the other party and taking action towards this goal
  • listening: being focused and hearing what the other party is saying
  • understanding: recognizing and comprehending both literal and contextual/tonal information

Listening here translates to microphone technology and speech-to-text transcription. While there is still a lot of room for improvement, at the basic level it’s solved. The other two are where it gets complicated. When it comes to voice assistants, that means that even if they can’t do everything you want them to do, they at the very least have to understand what it is that you mean and try their best to help you reach your goal some other way. This is where assistants go wrong at the moment. Stray just a tad too far from the predefined functional path and you might as well be talking gibberish. But even when they understand what you’re saying, keeping a healthy exchange alive, offering assistance and information where you didn’t expect it, and asking relevant questions to grasp context is still a huge problem that needs to be solved.
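To see why straying off the predefined path fails so hard, here is a minimal, hypothetical sketch in Python. Many simple assistants boil down to something like this: match the utterance against a fixed set of intents and fall back to a generic apology when nothing matches. The intent names and keyword lists are invented for illustration, not taken from any real assistant.

```python
# Hypothetical intent matcher: a fixed keyword table and a fallback.
# Anything outside the predefined paths lands in "fallback" —
# the dreaded "Sorry, I didn't get that."

INTENTS = {
    "weather": ["weather", "rain", "temperature", "forecast"],
    "lights": ["light", "lights", "lamp", "dim"],
    "shopping": ["buy", "order", "shopping", "cart"],
}

def match_intent(utterance: str) -> str:
    """Return the first intent whose keywords appear in the utterance."""
    words = utterance.lower().split()
    for intent, keywords in INTENTS.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return "fallback"

print(match_intent("what's the weather like today"))          # weather
print(match_intent("please dim the lights"))                  # lights
print(match_intent("cancel my dentist appointment"))          # fallback
```

Real systems use statistical language understanding rather than keyword tables, but the failure mode is the same: a request the designers didn’t anticipate drops straight through to the fallback.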

Once a voice interface reaches a certain threshold of both usefulness and natural-feeling conversation, it has the potential to reach an incredible number of users. And reach them on a more personal level than is possible now.

In conversations with robots, even simple ones, humans tend to assign human traits to the machine. We read meaning, feelings, and intentions into words where there are none. This is called the ELIZA effect, and it’s scientifically well established; this article by Chatbots Magazine explains it in simple terms. If our assistant acts nicely and reacts benevolently to our requests, we can’t help but trust that it has our best interests at heart. Knowing that it’s a machine, and being aware of the contradiction, surprisingly doesn’t even diminish that trust.

And here’s the fun part: Combining usefulness with trust leads to heavy use, and more importantly, people opening up and giving more information about themselves and their wants.

2. Ultimate Personalization

“This is all nice and interesting, Pascal, but how will companies actually make money with their robot-in-a-can already?” — you, now.

Well, the exact same way as they are right now. By selling targeted advertisement and website personalization. Just so much better!

Website personalization and targeted advertising are essentially the same thing, just applied in different places. They’re not new ideas, but they’re still all the rage and where the bulk of money is made online. It’s where huge investments are made in better tracking and data-analytics technology, where lots of startups pop up to sell you ever better solutions, and where Google and Facebook thrive because of their tech and giant user bases.

That’s because it’s really effective. Obviously, showing ads and options that are relevant and of interest to you makes it a lot more likely that you actually click on them.

Trying to sell a subscription for your gossip magazine? I couldn’t care less!
A great deal on that pair of shoes I’m looking for? Now we’re talking!

The thing is, doing personalization is really hard and essentially a guessing game. Whatever you are doing online, every click, scroll, and pause is likely to be tracked and analyzed. And while that already gives quite a good picture of what your general interests are, what you like and who you communicate with, everything is just a rough estimation. Imagine you’re visiting an online shop, for example, clicking around for a while and then leaving. It’s close to impossible to know if you were just browsing and curious, not aiming to buy anything now anyway, or if you were looking for something specific and were disappointed by the offer, the price, the service or something else about the shop. You can guess what someone is interested in, but you never know their true intentions, or the metadata around those intentions that matters to them.

And that’s where voice assistants come into play. It is wrong to believe that voice interaction will replace our screen-based toys as we know them: our smartphones, our tablets, our laptops, our TVs. We’re visual creatures after all. We LOVE seeing things!

Just imagine buying clothes online, or pretty much anything that is not household goods, without ever seeing it. Humans love looking at photographs, watching video — hell, even reading written words! (You’re doing it right now!) Neither is audio suitable for replacing all content consumption nor is voice suited as an input mechanism for every context and situation. Our screens will not go away. But voice interaction will make them better, and even more importantly, bring computing to new contexts and situations throughout our lives.

And the data these voice assistants collect will enhance our on-screen experiences. Even more, it will take targeted advertising and website personalization to a whole new level. While interacting with a screen, you’re encoding your intent, the goal you’re pursuing, into clicks, taps, swipes, and scrolling. The software then records your code and tries to reconstruct your original intent from the data. A lot of information gets lost here.

With a voice assistant, however, you are directly voicing your intent, telling the machine precisely what you came here to do. It cuts out the encoding in the middle. In the course of the conversation, you will even give a lot more metadata about what is important to you. (“I want these shoes, ideally black or something similar. They can’t cost more than $60. Or maybe $70 if the colorway is great. If the sole is white I’m out, going for a more classy, less sporty look.”)
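The difference between reconstructing intent from clicks and hearing it stated outright becomes obvious when you write the spoken request above down as data. Here is a hypothetical sketch of the structured intent an assistant could extract from that one sentence; the field names and schema are invented for illustration, not any real assistant’s format.

```python
# Hypothetical structured intent for the shoe-shopping request above.
# Every field is stated explicitly in the utterance — no guessing
# from a click trail required.

from dataclasses import dataclass, field

@dataclass
class ShoppingIntent:
    item: str
    preferred_colors: list = field(default_factory=list)
    max_price: float = 0.0
    stretch_price: float = 0.0        # acceptable if the colorway is great
    hard_constraints: list = field(default_factory=list)
    style: str = ""

intent = ShoppingIntent(
    item="shoes",
    preferred_colors=["black", "similar to black"],
    max_price=60.0,
    stretch_price=70.0,
    hard_constraints=["sole is not white"],
    style="classy, not sporty",
)

# A click trail might eventually hint at most of this;
# the sentence states it outright, metadata and all.
print(intent)
```

Clickstream analysis has to infer the budget ceiling, the deal-breakers, and the style preference from behavior. The spoken sentence hands all of it over in one go.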

No one I know would tolerate a personal assistant, and thus a speaker in their home, screaming ads at them randomly or in the middle of a conversation. Advertising on the assistants themselves is not the way to go! But taking this data and applying it to our screen-based interactions is where it gets interesting.

You tell them about yourself and your interests without noticing. The next time you’re online, you’re seeing laser-precise ads with great offers. Every website looks and feels like it’s designed just for you.

Now that is pure gold!

If you like what I write and want to make me smile, simply clap, comment and/or follow. If you don’t, do it anyway, life is short, do something nice! Either way, thanks a ton for reading! — Peace


Designer, writer, researcher, engineer — computational product person. Loves art, paints all too rarely. Tries to talk to computers, but they just never listen…