How Does Artificial Life Avoid the Uncanny Valley?

July 6, 2015

The following creepy humanoids provide ample reason to fear artificial intelligence:

This is just one example of virtual humans that would be appropriate in a horror movie. There are many others. Here’s my question: why are there so many creepy humans in computer animation?

The uncanny problem is not necessarily due to the AI itself: it’s usually the result of failed attempts at generating appropriate body language for the AI. As I point out in the Gestural Turing Test: “intelligence has a body”. And nothing ruins a good AI more than terrible body language. And yes, when I say “body language”, I include the sound, rhythm, timbre, and prosody of the voice (which is produced in the body).

Simulated body language can steer clear of the uncanny valley with some simple rules of thumb:

1. Don’t simulate humans unless you absolutely have to.

2. Use eye contact between characters. This is not rocket science, folks.

3. Cartoonify. Less visual detail leaves more to the imagination and less that can go wrong.

4. Do the work to make your AI express itself using emotional cues. Don’t be lazy about it.

Shameless plug: Wiglets are super-cartoony, non-humanoid critters that avoid the uncanny valley and use emotional cues such as eye contact and proxemic movements.

These videos show how wiglets move and act.

Artificial Life was invented partly as a way to get around a core problem of AI: humans are the most sophisticated and complex animals on Earth. Simulating them in a realistic way is nearly impossible, because we can always detect a fake. Getting it wrong (which is almost always the case) results in something creepy, scary, clumsy, or just plain useless.

In contrast, simulating non-human animals (starting with simple organisms and working up the chain of emergent complexity) is a pragmatic program for scientific research – not to mention developing consumer products, toys, games, and virtual companions.

We’ll get to believable artificial humans some day.

Meanwhile…

I am having a grand old time making virtual animals using simulated physics, genetics, and a touch of AI. No lofty goals here. With a good dose of imagination (people have plenty of it), it only takes a teaspoon of AI (crafted just right) to make a compelling experience – to make something feel and act sentient. And with the right blend of body language, responsiveness, and interactivity, imagination can fill in all the missing details.

Alan Turing understood the role of the observer, and this is why he chose a behaviorist approach to asking the question: “what is intelligence?”

Artificial Intelligence is founded on the anthropocentric notion that human minds are the pinnacle of intelligence on Earth. But hubris can sometimes get in the way of progress. Artificial Life, on the other hand, recognizes that intelligence originates from deep within ancient Earth. We are well-advised to understand it (and simulate it) as a way to better understand ourselves, and how we came to be who we are.

It’s also not a bad way to avoid the uncanny valley.


Your Voice is Puppeteering an Avatar in my Brain

November 23, 2014

I have been having a lot of video chat conversations recently with a colleague who is on the opposite side of the continent.

Now that we have been “seeing” each other on a weekly basis, we have become very familiar with each other’s voices, facial expressions, gesticulations, and so on.

But, as is common with any video conferencing system, the audio and video signals are unpredictable. Often the video signal totally freezes up, or it lags behind the voice. It can be really distracting when the facial expressions and mouth movements do not match the sound I’m hearing.

Sometimes we prefer to just turn off the video and stick with voice.

One day after turning off the video, I came to the realization that I have become so familiar with his body language that I can pretty much guess what I would be seeing as he spoke. Basically, I realized that…

HIS VOICE WAS PUPPETEERING HIS VIRTUAL SELF IN MY BRAIN.

Since the voice of my colleague is normally synchronized with his physical gesticulations, facial expressions, and body motions, I can easily imagine the visual counterpart to his voice.

This is not new to video chat. It has been happening for a long time with the telephone, whenever we speak with someone we know intimately.

In fact, it may have even happened at the dawn of our species.

According to gestural theory, physical, visible gesture was once the primary communication modality in our ape ancestors. Then, our ancestors began using their hands increasingly for tool manipulation—and this created evolutionary pressure for vocal sounds to take over as the primary language delivery method. The result is that we humans can walk, use tools, and talk, all at the same time.

As gestures gave way to audible language, our ancestors could keep looking for nuts and berries while their companions were yacking on.

Here’s the point: The entire progression from gesture to voice remains as a vestigial pathway in our brains. And this is why I so easily imagine my friend gesturing at me as I listen to his voice.

Homunculi and Mirror Neurons

There are many complex structures in my brain, including several body maps that represent the positions, movements and sensations within my physical body. There are also mirror neurons – which help me to relate to and sympathize with other people. There are neural structures that cause me to recognize faces, walking gaits, and voices of people I know.

Evolutionary biology and neuroscience research points to the possibility that language may have evolved out of, and in tandem with, gestural communication in Homo sapiens. Even as audible language was freed from the physicality of gesture, the sound of one’s voice remains naturally associated with the visual, physical energy of the source of that voice (for more on this line of reasoning, check out Terrence Deacon).

Puppeteering is the art of making something come to life, whether with strings (as in a marionette) or with your hand (as in a Muppet). The greatest puppeteers know how to make the most expressive movement with the fewest strings.

The same principle applies when I am having a Skype call with my wife. I am so intimately familiar with her voice and the associated visual counterpart, that all it takes is a few puppet strings for her to appear and begin animating in my mind – often triggered by a tiny, scratchy voice in a cell phone.

Enough pattern-recognition material has accumulated in my brain to do most of the work.

I am fascinated with the processes that go on in our brains that allow us to build such useful and reliable inner representations of each other. And I have wondered if we could use more biomimicry – applying more of these natural processes towards the goal of transmitting body language and voice across the internet.

These ideas are explored in depth in Voice as Puppeteer.
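The post describes an idea rather than an algorithm, so what follows is only a hedged sketch of one very crude form of voice-driven puppeteering, and it assumes nothing about how Voice as Puppeteer or any real avatar system works. Here the loudness of each short frame of audio drives a single mouth-openness parameter, smoothed so the mouth opens quickly and relaxes more slowly. The function names and the `gain`, `attack`, and `release` values are hypothetical.

```python
import math


def rms(frame):
    """Root-mean-square loudness of one frame of audio samples in [-1, 1]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def mouth_openness(frames, gain=4.0, attack=0.6, release=0.2):
    """Turn per-frame voice loudness into a smoothed mouth-open value in [0, 1].

    gain, attack and release are hypothetical tuning values: attack sets how
    fast the mouth opens as the voice gets louder, release how fast it closes.
    """
    value = 0.0
    curve = []
    for frame in frames:
        target = min(1.0, gain * rms(frame))        # louder voice, wider mouth
        rate = attack if target > value else release
        value += rate * (target - value)            # one-pole smoothing toward target
        curve.append(value)
    return curve


# Usage: three silent frames, five loud frames, then five silent frames.
silence = [0.0] * 160
speech = [0.5, -0.5] * 80   # crude stand-in for a loud vowel
curve = mouth_openness([silence] * 3 + [speech] * 5 + [silence] * 5)
print([round(v, 2) for v in curve])
```

Loudness is only one of the "puppet strings" this post talks about; a richer sketch would also draw on pitch, rhythm, and pauses in the voice.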


High Fidelity: Body Language through “Telekinesics”

June 2, 2013

Human communication demonstrates the usual punctuated equilibria of any natural evolutionary system. From hand gestures to grunts to telephones to email and beyond, human communication has not only evolved, but splintered off into many modalities and degrees of asynchrony.

I recently had the great fortune to join a company that is working on the next great surge in human communication: High Fidelity, Inc. This company is bringing together several new technologies to make this happen.

So, what is the newest evolutionary surge in human communication? I would describe it using a term from Virtual Body Language (page 22):

Telekinesics is a word invented to denote… “the study of all emerging nonverbal practices across the internet”, by adding the prefix tele to Birdwhistell’s term kinesics. It could easily be confused with “telekinesis”: the ability to cause movement at a distance through the mind alone (the words differ by only one letter). But hey, these two phenomena are not so different anyway, so a slip of the tongue wouldn’t be such a bad thing. Telekinesics may be defined as “the science of body language as conducted over remote distances via some medium, including the internet”.

And now it’s not just science, but practice: body language is ready to go online…in realtime.

And when I say “realtime” – I mean, pretty damn fast, compared to most things that zip (or try to zip) across the internet. And when we’re talking about subtle head nods, changes in eye contact, fluctuations in your voice, and shoulder shrugs, fast is not just a nicety, it is a necessity – for clear communication using a body.

Here’s Ryan Downe showing an early stage of avatar head movement using Google Glass.

Philip Rosedale, the founder of High Fidelity, often talks about how cool it would be for my avatar to walk up to your avatar and give it a little shoulder-shove, or a fist-bump, or an elbow-nudge, or a hug…and for your avatar to respond with a slight – but noticeable – movement.

It would appear that human touch (or at least the visual/audible representation of human touch) is on the verge of becoming a reality – through telekinesics. Of all the modalities and senses that we use to communicate, touch is the most primal: we share it with the oldest microorganisms.

When touch is manifest on the internet, along with highly-crafted virtual environments, maybe, just maybe, we will have reached that stage in human evolution when we can have a meaningful, intimate exchange – even if one person is in Shanghai and the other is in Chicago.

And that means people can stop flying around the world and burning fossil fuels in order to have 2-hour-long business meetings. And that means reducing our carbon footprint. And that means we might have a better chance of not pissing off Mother Earth to the degree that she has a spontaneous fever and shrugs us off like pesky fleas.

Which would really suck.

So…keep an eye on what we’re doing at High Fidelity, and get ready for the next evolutionary step in human communication. It just might be necessary for our survival.


On Phone Menus and the Blowing of Gaskets

January 2, 2013

(This blog post is re-published from an earlier blog of mine called “avatar puppetry” – the nonverbal internet. I’ll be phasing out that earlier blog, so I’m migrating a few of those earlier posts here before I trash it).

This blog post is only tangentially related to avatars and body language. But it does relate to the larger subject of communication technology that fails to accommodate normal human behavior and the rules of natural language.

But first, an appetizer. Check out this video for a phone menu for callers to the Tennessee State Mental Hospital:

http://www.youtube.com/watch?v=zjABiLYrKKE


A Typical Scenario

You’ve probably had this experience. You call a company or service to ask about your bill, or to make a general inquiry. You are dumped into a sea of countless menu options given by a recorded message (I say countless, because you usually don’t know how many options you have to listen to – will it stop at 5? Or will I have to listen to 10?). None of the options apply to you. Or maybe some do. You’re not really sure. You hope – you pray – that you will be given the option to speak to a representative, a living, breathing, thinking, soft and cuddly human. After several agonizing minutes (by now you’ve forgotten most of the long-winded options) you realize that there is no option to speak to a human. Or at least you think there is no option. You’re not really sure.

Your blood pressure has now reached levels that warrant medical attention. If you still have rational neurons firing, you get the notion to press “0”. And the voice says, “Please wait to speak to a phone representative.” You collapse in relief. The voice continues: “This call may be recorded for quality assurance.” Yeah, right. (I think I remember once actually hearing the message say, “this call may be recorded……because…we care”. Okay, now that is gasket-blowing material.)

Why Conversation Matters

I don’t think I need to go into this any further. Just do a search on “phone menu” (or “phone tree”) and “frustration”, or something like that, and follow the scent and you’ll find plenty of blog posts on the subject.

How would I best characterize this problem? I could talk about it from an economic point of view. For instance, it costs a company a lot more to hire real people than to hook up an automated answering service or an interactive voice response (IVR) system. But companies also have to weigh the negative impact of a large percentage of irate customers. And too few companies look at this as a Design problem. Ah, there it is again: that ever-present normalizer and humanizer of technology: Design. It’s invisible when it works well, and that’s why it is such an unsung hero.

The Hyper-Linearity of Non-Interactive Verbal Messages

The nature of this design problem, I believe, is that these phone menus give a large amount of verbal information (words, sentences, options, numbers, etc.) which take time to explain. They are laid out in a sequential order.

There is no way to jump ahead, to interrupt the monolog, or to ask it for clarification, as you would in a normal conversation. You are stuck in time – rigid, linear time, with no escape. (At least that’s what it feels like: there are usually options to hit special keys to go to the previous menu or pop out entirely, etc. But who knows what those keys are? And the dreaded fear of getting disconnected is enough to keep people like me within the lines, gritting our teeth and being obedient, although that means I have the potential to become the McDonald’s gunman who makes the headlines the next morning.)
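To make the cost of this linearity concrete, here is a minimal sketch in Python. The menu, its option labels, and the prompt durations are invented for illustration and do not come from any real phone system; the only point is that a caller must sit through every prompt that precedes the one they want, at every level of the tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MenuNode:
    """One option in a hypothetical phone tree: its prompt length and any sub-menu."""
    label: str
    prompt_seconds: float
    children: list[MenuNode] = field(default_factory=list)


def listening_time(options: list[MenuNode], path: list[str]) -> float:
    """Seconds spent listening before each choice on `path` becomes available.

    At every level the recording reads the options in order, so reaching the
    k-th option means sitting through prompts 1..k first; there is no way to
    skip ahead.
    """
    total = 0.0
    for label in path:
        for option in options:
            total += option.prompt_seconds
            if option.label == label:
                options = option.children  # descend into the sub-menu
                break
        else:
            raise ValueError(f"no option {label!r} at this level")
    return total


# Hypothetical menu; labels and durations are made up for illustration.
example_menu = [
    MenuNode("billing", 4.0, [
        MenuNode("balance", 3.0),
        MenuNode("payment", 3.0),
        MenuNode("dispute", 4.0),
    ]),
    MenuNode("technical support", 5.0),
    MenuNode("store hours", 3.0),
    MenuNode("new service", 4.0),
    MenuNode("representative", 4.0),
]

print(listening_time(example_menu, ["representative"]))      # 20.0
print(listening_time(example_menu, ["billing", "dispute"]))  # 14.0
```

Putting the representative last means the caller pays for every prompt above it; a visual layout removes exactly this cost, because the eye can go straight to the last row without "hearing" the rows above it.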

Compare this with a conversation with a phone representative: normal human dialog involves interruptions, clarifications, repetitions, mirroring (the “mm’s”, “hmm’s”, “ah’s”, “ok’s”, “uh-huh’s”, and such – the audible equivalent of eye-contact and head-nods), and all the affordances that you get from the prosody of speech. Natural conversations continually adapt to the situation. These adaptive, conversational dynamics are absent from the braindead phone robots. And their soft, soothing voices don’t help – in fact they only make me want to kill them that much harder.

There are two solutions:

1. Full-blown Artificial Intelligence, allowing the robot voice to “hear” your concerns, questions, and drill down, with your help, to the crux of the problem. But I’m afraid that AI has a way to go before this is possible. And even if it is almost possible, the good psychologists, interaction designers, and user-interface experts don’t seem to be running the show. They are outnumbered by the techno-geeks with low EQ, and little understanding of human psychology. Left-brainers gather the power and influence, and run the machines – computer-wise and business-wise – because they are good with the numbers, and rarely blow a gasket. The right-brained skill set ends up stuck on the periphery, almost by its very nature. I’m waiting for this revolution I keep hearing about – the Revenge of the Right Brain. So far, I still keep hitting brick walls built with left-brained mortar. But I digress.

2. Visual interfaces. When all the options are laid out in a visual space, the user’s eyes can jump around (much more quickly than a robot can utter the options). Thus, if the layout is designed well (a rarity in the internet junkyard), the user can quickly see: “Ah, I have five options. Maybe I want to choose option 4. I will select ‘more information about option 4’ to make sure.” All of this can happen within a matter of seconds. You could almost say that the interface affords a kind of body language that the user reads and acts upon immediately.

Consider the illustration below for a company’s phone tree which I found on the internet (I blacked-out the company name and phone number). Wouldn’t it be nice if you could just take a glance at this visual diagram and jump to the choice you want? If you’re like me, your eyes will jump straight to the bottom where the choice to speak to a representative is. (Of course it’s at the bottom).

This picture says it all. But of course. We each have two eyes, each with millions of photoreceptors: simultaneity, parallelism, instant grok. But since I’m talking about telephones, the solution has to be found within the modality of audio alone, trapped in time. And in that case, there is no other solution than an advanced AI program that can understand your question, read your prosodic body language, and respond to the flow of the conversation, thus collapsing time.

…and since that’s not coming for a while, there’s another choice: a meat puppet – one of those very expensive communication units that burn calories, and require a salary. What a nuisance.


Uncanny Charlie

August 18, 2012

The subway system in Boston has a mascot named “Charlie”, a cartoon character who rides the train and reminds people to use the “Charlie Card”. With the exception of his face, he looks like a normal airbrushed graphic of a guy with a hat. But his face? Uh, it’s f’d up.

In case you don’t know about the Uncanny Valley yet, it refers to a graph devised by the Japanese roboticist Masahiro Mori. The graph shows typical reactions to human likeness in robots and other simulations. The more realistic the robot (or computer-generated character), the more CREEPY it becomes….

…until it is so utterly realistic that you are fooled, and you respond to it as if it were a living human. But watch out. If the eyes do something wacky or scary, or if something else reveals the fact that it is just an animated corpse… DOWN you fall… into the valley.

Anyway, I have a theory about the uncanny valley: it is just a specific example of a more general phenomenon that occurs when incompatible levels of realism are juxtaposed in a single viewing experience. So for instance, an animated film in which the character motions are realistic – but their faces are abstract – can be creepy. How about a computer animation in which the rendering is super-realistic, but the motions are stiff and artificial? Creepola. A cartoon character where one aspect is stylized and other aspects are realistic looks…not right. That’s Charlie’s issue.

Stylized faces are everywhere:

But when an artist takes a stylized line-drawn graphic of a face and renders it with shading, I consider this to be a visual language blunder. The exception to this rule of thumb is demonstrated by artists who purposefully juxtapose styles and levels of realism, for artistic impact, such as the post-modern painter David Salle.

The subject of levels of realism and accessibility in graphic design is covered in Scott McCloud’s Understanding Comics. The image-reading eyebrain can adjust its zone of suspension of disbelief to accommodate a particular level of stylism/realism. But in general, it cannot easily handle having that zone bifurcated.

Charlie either needs a face transplant to match his jacket and hat, or else he needs to start wearing f’d-up clothes to match his f’d-up face.