The Only Thing Weirder Than a Telemarketing Robot

The Turk Chess Automaton (Bibliodyssey)

Sometimes, you have to think like a scammer.

So, when I saw that an apparent robot telemarketer named Samantha West had randomly called a Time writer and denied she/it/they was a robot, I wondered: where could I buy such an interactive voicebot? 

This query led me down a strange rabbit hole. And along the way, I discovered that Samantha West may be something even stranger than a telemarketing robot. Samantha West may be a human sitting in a foreign call center playing recorded North American English through a soundboard.

I know. It's weird. But let me explain.

First, let's hear "Samantha": 

Clearly, this is not human conversation: there are repeated laughs and weird phrases. "She failed several other [humanity] tests," Time wrote. "When asked 'What vegetable is found in tomato soup?' she said she did not understand the question. When asked multiple times what day of the week it was yesterday, she complained repeatedly of a bad connection."

It seems so open and shut.

So, Time's story ran with the plausible headline, "Meet the Robot Telemarketer Who Denies She's a Robot." And many other blogs went with that explanation, too.

But if this kind of robotic telemarketing is possible, why don't we see it more often? Every other kind of spam, if it is technically possible, becomes pervasive.

* * *

The first step to acquiring a voicebot like this was to figure out what the people selling it might call it. Certainly they would not refer to their services as "robot telemarketing."

I started looking for the right jargon to Google. As it turns out, there are two key phrases: "interactive voice response" and "outbound." Interactive voice response refers to telephone systems that can process what you're saying and respond appropriately (even intelligently at times). Outbound call centers make calls; the inverse, inbound, refers to systems that receive calls from customers.

So, put them together and you have "outbound IVR," which Datamonitor projected should be a half-billion-dollar market by now.

Outbound IVR, though, is not generally supposed to be used for telemarketing. It's supposed to be used to deliver automated messages and provide just a smidgen of interactivity. So, a common use case might be to call a debtor up and ask them to pay a bill. Then the machine can take that payment without transferring them to a human. Or automated scheduling: a doctor's office could have the voicebot call a patient to confirm an appointment.

Why isn't outbound IVR used for telemarketing?

Well, primarily because IVR is really, really hard. It is widely recognized that voice-to-text on your phone (e.g., Siri) is far from reliable. And Siri actually has a lot better data to work with. An IVR bot has to work with the low-quality audio that's transmitted through the public switched telephone network (PSTN). Quality, in this case, is a quantitative measure of how much data is in the audio.
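To put "how much data is in the audio" in rough numbers, here is a minimal sketch comparing the raw, uncompressed data rate of telephone audio with CD audio. The bit depths and the CD comparison are my assumptions, not figures from anyone I spoke with: standard telephone audio is 8,000 samples per second at 8 bits in one channel, while a CD carries 44,100 samples per second at 16 bits in two channels.

```python
def raw_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed audio data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Assumed figures: 8 kHz, 8-bit, mono for a standard phone call.
PHONE_BPS = raw_bit_rate(8_000, 8, 1)

# Assumed figures: 44.1 kHz, 16-bit, stereo for CD audio.
CD_BPS = raw_bit_rate(44_100, 16, 2)

# How many times more raw data per second a CD carries than a phone line.
RATIO = CD_BPS / PHONE_BPS

if __name__ == "__main__":
    print(f"phone: {PHONE_BPS} bits/s")
    print(f"cd:    {CD_BPS} bits/s")
    print(f"ratio: {RATIO:.2f}x")
```

Under those assumptions, a speech recognizer listening to a phone line gets only a few percent of the raw data it would get from CD-quality sound.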

That's why the voice recognition on company telephone systems is a target for mockery. ("I said three. No, no, I said THREE. THREE!") And when someone is calling into a company, the company severely restricts the scenarios that the IVR bot has to work within. The bot knows what it's listening for. And it's still just OK.

Now, Samantha West actually uses a bunch of different responses as it tries to pose as a general-purpose salesperson. The queries that the editors launch at Samantha are pretty complex, and yet she comes back with an appropriate (if limited) response. 

When I contacted outbound marketing companies and showed them the story with the clips, they all said they don't or can't do this sort of interactive voice response.

One source, who agreed to explain the problem on the condition that they would not be associated with this marketing bot, gave a fascinating explanation of why the telemarketing robot probably was not possible:

Getting this to work so quickly would be very difficult to achieve automatically as the audio on PSTN calls is 8000 Hz mono.  For reference that is one less channel and 120,000 less hertz than the low quality mp3s in your music collection.  This is why voice recognition is so aggravating over the phone - there is very little signal upon which to perform feature recognition.  Even the fastest in the business (Nuance) doesn't respond this quickly.

Even provided you could do the recognition under 50 milliseconds, the answers the gentleman is giving on the call are very fuzzy.  These aren't boolean "yes" and "no" - they are simple and complex sentences wildly divergent from the prompt of the robot.  So some [natural language processing] would need to be performed on the human's response to translate what was being said then fuzzy match the result against what an appropriate response would be.
 
Doing all that in a delay for natural conversation doesn't sound possible to me. The only product that might have a shot at it are Nuance, but even then I don't think they are fast enough.

Other sources also suggested that Nuance might be the only company whose technology could do it. But when I contacted Nuance, a representative told me that they were not involved directly in the design of the software, nor did they know of anyone who was doing such a thing. (They did admit that there are people who could have gotten their hands on the software through resellers, but to their knowledge, this had not happened.)
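To make the source's "fuzzy match" step concrete, here is a toy sketch of what a fully automated system would have to do after speech recognition: take a transcript and pick the closest canned response. Everything in it is hypothetical. The prompt phrases, the clip names, the word-overlap scoring, and the threshold are illustrative assumptions, not anything a vendor described to me, and a real system would need far more sophisticated language processing.

```python
import re

# Hypothetical phrases the bot listens for, mapped to hypothetical clip names.
PROMPTS = {
    "are you a robot": "clip_deny_robot",
    "how much does the insurance cost": "clip_pricing",
    "i am not interested": "clip_goodbye",
}

# Hypothetical clip played when nothing matches well enough.
FALLBACK = "clip_bad_connection"


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pick_clip(transcript: str, threshold: float = 0.3) -> str:
    """Return the clip whose prompt best overlaps the transcript."""
    heard = tokens(transcript)
    best_clip, best_score = FALLBACK, 0.0
    for phrase, clip in PROMPTS.items():
        score = overlap(heard, tokens(phrase))
        if score > best_score:
            best_clip, best_score = clip, score
    # Below the threshold, fall back rather than guess.
    return best_clip if best_score >= threshold else FALLBACK


if __name__ == "__main__":
    print(pick_clip("Hey, are you a robot?"))
    print(pick_clip("What vegetable is found in tomato soup?"))
```

Even this crude matcher assumes a clean transcript arrives instantly. The source's point is that over a phone line, getting that transcript fast enough for natural conversation is the hard part.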
 
So, if it's not a robot, what gives then? Because clearly someone is giving canned responses. 
 
The theory I heard — and keep in mind it is just a hypothesis to explain a perplexing situation — goes like this:
 
Samantha West is a human being who understands English but who is responding with a soundboard of different pre-recorded messages. So a human parses the English being spoken and plays a message from Samantha West. It is IVR, but the semantic intelligence is being provided by a human. You could call it a cyborg system. Or perhaps an automaton in that 18th-century sense.

If you're reading this, you must be wondering: WHY?!?!

Well, while Americans accept customer service and technical help from people with non-American accents, they do not take well to telemarketing calls from non-Americans. The response rates for outbound marketing from foreign call centers are apparently abysmal.

So, Samantha West could be the rather strange solution to this set of circumstances and technical capabilities. Perhaps a salesperson like this doesn't have to say too many things to figure out if someone might be interested in buying insurance.

I tried to contact the company that Samantha West was working for. They hung up on me after I said I was an editor with The Atlantic.

Alexis C. Madrigal

Alexis Madrigal is the deputy editor of TheAtlantic.com, where he also oversees the Technology Channel. He's the author of Powering the Dream: The History and Promise of Green Technology.

The New York Observer has called Madrigal "for all intents and purposes, the perfect modern reporter." He co-founded Longshot magazine, a high-speed media experiment that garnered attention from The New York Times, The Wall Street Journal, and the BBC. While at Wired.com, he built Wired Science into one of the most popular blogs in the world. The site was nominated for best magazine blog by the MPA and best science Web site in the 2009 Webby Awards. He also co-founded Haiti ReWired, a groundbreaking community dedicated to the discussion of technology, infrastructure, and the future of Haiti.

He's spoken at Stanford, CalTech, Berkeley, SXSW, E3, and the National Renewable Energy Laboratory, and his writing was anthologized in Best Technology Writing 2010 (Yale University Press).

Madrigal is a visiting scholar at the University of California at Berkeley's Office for the History of Science and Technology. Born in Mexico City, he grew up in the exurbs north of Portland, Oregon, and now lives in Oakland.

