The Google Duplex Demo -- What Could Possibly Go Wrong?
The recent demonstration of Google Duplex by CEO Sundar Pichai has grabbed the world's attention. The demo shows a pre-recorded, and supposedly real-time, phone call in which an experimental Google voice assistant, called Google Duplex, has a back-and-forth conversation with a receptionist at a hair salon to book an appointment. Other demos linked on the Google Duplex blog page show interactions with restaurants and other B2C service providers. What drew gasps from the crowd was how "realistic" the bot seemed, even making human-like pauses and speech disfluencies such as "umms" and an occasional "mm-hmm" to let the other party know that the not-quite-human bot was processing what was being said.
The world is simultaneously amazed at how well the transaction between a voice bot and a human seemed to go, and dismayed that bots are being unleashed on the world to pretend to be humans acting on behalf of other humans. The implications of both reactions are significant. Or are they really that significant?
It's Just a Voice Assistant Used in a Limited Conversational Context. So What?!
What Google demoed was only an incremental step beyond what you may have experienced when using any of the latest generation of voice assistants. If you've played a round of Jeopardy with Alexa, or chatted with Google Home, Siri, or Microsoft Cortana, then you know that these devices are fully capable of limited voice interactions with humans. Of course, the moment you stray too far outside what these assistants can handle, you quickly realize you're not talking to a person. These limited demos from Google probably show only the most successful of the interactions. It's hard to say how this performs at a much larger scale because we can't see how the other "experiments" performed. It's telling that this is not yet released as a product, but is being shown as an experiment.
That being said, this experiment illustrates technology and issues that are not yet part of our daily voice assistant experience. The most significant difference is the intentional use of disfluencies to "fool" the user. Siri, Alexa, and Cortana are not trying to fool you. In fact, if you dig even a bit deeper with your questions, they are happy to disclose that they are but simple machines answering your not-always-simple questions. Yet that's not how it went with the Google Duplex demos. The bot was intentionally using human mannerisms so that the human on the other end would assume they were not talking to a bot. Why? We can only assume that Google wanted to see what would happen if a human didn't know they were talking to a bot. Perhaps if the human had known, the scripted conversation would have gone differently. But would it have been any worse?
The second, and perhaps even larger, difference between the typical voice assistant interaction and this latest demo is that the voice assistant wasn't responding to a trigger word to accomplish the task. Instead, a user had a chat with the assistant and instructed it to make a call. The chatbot then basically said "ok, I'll make the call, and let you know when it's done." The voice assistant then somehow knew instantly that it needed to call a phone number, talk to a human, and schedule a hair appointment, along with all the intricacies that make scheduling a hair appointment different from ordering a pizza.
This brings up a serious point: a human might have triggered the conversation, but the human who triggered it was nowhere involved in the interaction. Think about it. This was a bot that was acting on behalf of a human who was not present in the conversation. The assistant might have been initially triggered to perform the action, but then it's going out into the real world, acting on your behalf, pretending to be a human, talking to real people, and making real-world things happen. Let me emphasize that. It's a bot. Acting on your behalf. Pretending to be a human. Making real-world things happen. What could possibly go wrong? A LOT.
The Ethics of Bots Pretending to be Humans
Before we get into the issues of bots pretending to be humans from a disclosure and ethics perspective (something that all the chatterheads are talking about right now), let's talk about bots pretending to be humans acting on behalf of real humans. What if you get your bot to call the police and report a fake crime? What if you use the bot to call someone who has a restraining order against you? What if you use the bot to hoax or prank or otherwise cause mischief? What if there's no intentional mischief involved, but your bot uses bad language or makes an inappropriate comment? What if your bot pretends to be you and schedules you for something you don't want? What if the human on the other end of the phone pranks your bot and makes it do something you don't want?
Let's also think about bots like Google Duplex in the hands of malicious users. Could someone who wants to put a rival out of business simply use a bot to schedule a bunch of appointments for fictitious people who then never show up? Could you use a bot to make reservations at all the top dining spots on Valentine's Day and then later sell them off to desperate people looking for a seat? Will criminals use bots, in combination with all the personal data that is already out there, to call unsuspecting people and pose as representatives of people they trust in order to scam them out of money or compromise their personal safety? Criminals could automate social engineering attacks on a mass scale with bots posing as humans and acting on behalf of other humans.
Already we can't believe what we read, see, or hear. This is going to introduce a new threat vector into our cybersecurity lexicon, and make us distrust anything or anyone that calls us by phone. Phone spam has reached an all-time high, and we already have issues with humans talking to bots in certain business situations. And in this future, doesn't something like Google Duplex become useless? To prevent that, we need, not just for the sake of ethics but for the sake of efficacy, a way to verify whether we're talking to a bot or a human on the phone. We also need a way to gauge the true intent of the bot and what it is truly seeking to accomplish if it is acting on behalf of another human.
The common conversation right now around voice bot ethics is that people want truth and acknowledgement of who they are speaking to. They don't want to be lied to. They don't want to be fooled. They want to know who they are talking to. If they're talking to a bot that's pretending to be a human and find out, they will be more upset than if they had known from the start that they were talking to a bot. People are bemoaning the ethics of Google Duplex because it's intentionally trying to mislead. It's not the technology they are bemoaning. Voice assistants that can have real conversations can be particularly helpful. But if they are pretending to be something they are not, they could be harmful. More importantly for the AI industry, maybe this means that the Turing Test is a red herring. It might be time to retire the Turing Test as a test of intelligence. What? Read on.
Did Google Duplex Pass the Turing Test?
The Turing Test is meant to be a way to determine the intelligence of an artificial system by using conversation as a means of gauging intelligence. Alan Turing's thinking went that if a human could not tell whether they were having a conversation with a machine or with another human, then the machine must be intelligent. In this light, did the Google Duplex demo, perhaps as a side effect, pass the Turing Test and thus show itself to be intelligent? For sure, the human on the other end of the phone couldn't tell whether they were talking to a machine or a human, so by the simple judgement of the Turing Test, it passed. Yet AI researchers aren't jumping up and down celebrating this victory for the industry. Why? For a few reasons.
First, the conversation was very narrow in scope and limited to a specific transaction. If the human on the other end of the phone had decided to veer off script and have a conversation about the weather or personal thoughts on Taylor Swift's recent album, how would the bot have responded? It might have basically barfed on the phone and revealed that it was a bot. The ruse would have been exposed and the Turing Test failed. Perhaps this is also a good way to test whether your phone companion is a human or a bot. Veer off script. Any bot that can successfully veer off script and have random conversations on random topics like a human would should indeed pass the Turing Test. But that's not what happened here.
More to the point on the Turing Test, perhaps using conversation as a way to gauge overall system intelligence is not such a great idea. As discussed above, ethicists and people across the board are complaining loudly about bots fooling humans. So if people don't want to be fooled into thinking they're talking to a human, then what is the point of the Turing Test? The Turing Test is all about fooling people as a gauge of intelligence. If intelligence is all about fooling and games, then maybe we're defining intelligence incorrectly. Intelligence shouldn't be measured by how well a machine performs a trick to fool a human. If that's how intelligence is measured, then we're all just performing tricks on each other that our machine minds have perfected over the millennia. Philosophers might actually agree with that point. But we're getting off topic. The real issue is the Turing Test. It's an intelligence test based on the concept that a bot should fool a human into thinking it's not a bot. It seems from the response of society that this is not the sort of interaction we would prefer. In this light, whether Google Duplex can eventually pass the Turing Test is a moot point. We would rather it not.