How Does Artificial Life Avoid the Uncanny Valley?

July 6, 2015

Creepy virtual humanoids provide ample reason to fear artificial intelligence.


These are the kinds of virtual humans that would be appropriate in a horror movie, and there are many of them. Here’s my question: why are there so many creepy humans in computer animation?

The uncanny problem is not necessarily due to the AI itself: it’s usually the result of failed attempts at generating appropriate body language for the AI. As I point out in the Gestural Turing Test: “intelligence has a body”. And nothing ruins a good AI more than terrible body language. And yes, when I say “body language”, I include the sound, rhythm, timbre, and prosody of the voice (which is produced in the body).

Simulated body language can steer clear of the uncanny valley with some simple rules of thumb:

1. Don’t simulate humans unless you absolutely have to.

2. Use eye contact between characters. This is not rocket science, folks. (See the sketch after this list.)

3. Cartoonify. Less visual detail leaves more to the imagination and less that can go wrong.

4. Do the work to make your AI express itself using emotional cues. Don’t be lazy about it.
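To make rule 2 concrete, here is a minimal sketch in Python. It is not from any particular engine – the Character class and its fields are invented for illustration – but it shows the core trick: ease each character’s gaze toward its partner every frame, so eye contact reads as attentive rather than robotic.

```python
import math

# Hypothetical sketch (not any particular engine's API): point a
# character's eyes at a conversational partner, easing toward the
# target each frame so the gaze reads as attentive, not robotic.

class Character:
    def __init__(self, x, y):
        self.x, self.y = x, y      # position in the scene
        self.gaze = 0.0            # current eye direction (radians)

    def look_at(self, other, ease=0.2):
        target = math.atan2(other.y - self.y, other.x - self.x)
        # Wrap the angular difference into [-pi, pi] before easing,
        # so the eyes take the short way around.
        diff = (target - self.gaze + math.pi) % (2 * math.pi) - math.pi
        self.gaze += ease * diff

# Usage: call once per frame so each character tracks its partner.
a, b = Character(0, 0), Character(3, 1)
for _ in range(30):
    a.look_at(b)
    b.look_at(a)
```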

Shameless plug: Wiglets are super-cartoony non-humanoid critters that avoid the uncanny valley, and use emotional cues like eye contact, proxemic movements, etc.


These videos show how wiglets move and act.

Artificial Life was invented partly as a way to get around a core problem of AI: humans are the most sophisticated and complex animals on Earth. Simulating them in a realistic way is nearly impossible, because we can always detect a fake. Getting it wrong (which is almost always the case) results in something creepy, scary, clumsy, or just plain useless.

In contrast, simulating non-human animals (starting with simple organisms and working up the chain of emergent complexity) is a pragmatic program for scientific research – not to mention developing consumer products, toys, games, and virtual companions.

We’ll get to believable artificial humans some day.

Meanwhile…

I am having a grand old time making virtual animals using simulated physics, genetics, and a touch of AI. No lofty goals here. With a good dose of imagination (people have plenty of it), it only takes a teaspoon of AI (crafted just right) to make a compelling experience – to make something feel and act sentient. And with the right blend of body language, responsiveness, and interactivity, imagination can fill in all the missing details.
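For the curious, here is a toy sketch in Python of what a “teaspoon of AI” can look like. It is emphatically not the actual wiglet code – the Critter class and its numbers are invented – but mostly-attentive behavior with occasional wandering is often enough for an observer’s imagination to project intent.

```python
import random

# Toy illustration (not the actual wiglet code): a critter that eases
# toward a point of interest and occasionally glances away. Small
# unpredictabilities are what make the behavior feel alive.

class Critter:
    def __init__(self):
        self.x = 0.0
        self.attention = 0.0   # current point of interest on one axis

    def update(self, user_x):
        # Mostly attend to the user, but sometimes wander.
        if random.random() < 0.05:
            self.attention = random.uniform(-1.0, 1.0)
        else:
            self.attention = user_x
        self.x += 0.1 * (self.attention - self.x)   # spring-like easing

critter = Critter()
for frame in range(100):
    critter.update(user_x=0.5)
```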

Alan Turing understood the role of the observer, and this is why he chose a behaviorist approach to asking the question: “what is intelligence?”

Artificial Intelligence is founded on the anthropomorphic notion that human minds are the pinnacle of intelligence on Earth. But hubris can sometimes get in the way of progress. Artificial Life, on the other hand, recognizes that intelligence originates from deep within ancient Earth. We are well-advised to understand it (and simulate it) as a way to better understand ourselves, and how we came to be who we are.

It’s also not a bad way to avoid the uncanny valley.


Your Voice is Puppeteering an Avatar in my Brain

November 23, 2014

I have been having a lot of video chat conversations recently with a colleague who is on the opposite side of the continent.

Now that we have been “seeing” each other on a weekly basis, we have become very familiar with each other’s voices, facial expressions, gesticulations, and so on.

But, as is common with any video conferencing system, the audio and video signals are unpredictable. Often the video totally freezes up, or it lags behind the voice. It can be really distracting when the facial expressions and mouth movements do not match the sound I’m hearing.

Sometimes we prefer to just turn off the video and stick with voice.

One day after turning off the video, I came to the realization that I have become so familiar with his body language that I can pretty much guess what I would be seeing as he spoke. Basically, I realized that…

HIS VOICE WAS PUPPETEERING HIS VIRTUAL SELF IN MY BRAIN.

Since the voice of my colleague is normally synchronized with his physical gesticulations, facial expressions, and body motions, I can easily imagine the visual counterpart to his voice.

This is not new to video chat. It has been happening for a long time with the telephone, when we speak with someone we know intimately.


In fact, it may have even happened at the dawn of our species.

According to gestural theory, physical, visible gesture was once the primary communication modality in our ape ancestors. Then, our ancestors began using their hands increasingly for tool manipulation – and this created evolutionary pressure for vocal sounds to take over as the primary language delivery method. The result is that we humans can walk, use tools, and talk, all at the same time.

As gestures gave way to audible language, our ancestors could keep looking for nuts and berries while their companions were yacking on.

Here’s the point: The entire progression from gesture to voice remains as a vestigial pathway in our brains. And this is why I so easily imagine my friend gesturing at me as I listen to his voice.

Homunculi and Mirror Neurons

There are many complex structures in my brain, including several body maps that represent the positions, movements, and sensations within my physical body. There are also mirror neurons, which help me to relate to and sympathize with other people. There are neural structures that cause me to recognize faces, walking gaits, and voices of people I know.

Evolutionary biology and neuroscience research points to the possibility that language may have evolved out of, and in tandem with, gestural communication in Homo sapiens. Even as audible language was freed from the physicality of gesture, the sound of one’s voice remains naturally associated with the visual, physical energy of the source of that voice (for more on this line of reasoning, check out Terrence Deacon).

Puppeteering is the art of making something come to life, whether with strings (as in a marionette), or with your hand (as in a Muppet). The greatest puppeteers know how to make the most expressive movement with the fewest strings.

The same principle applies when I am having a Skype call with my wife. I am so intimately familiar with her voice and its visual counterpart that all it takes is a few puppet strings for her to appear and begin animating in my mind – often triggered by a tiny, scratchy voice in a cell phone.

Enough pattern-recognition material has accumulated in my brain to do most of the work.

I am fascinated by the processes that go on in our brains that allow us to build such useful and reliable inner representations of each other. And I have wondered if we could use more biomimicry – applying more of these natural processes toward the goal of transmitting body language and voice across the internet.

These ideas are explored in depth in Voice as Puppeteer.


High Fidelity: Body Language through “Telekinesics”

June 2, 2013

Human communication demonstrates the usual punctuated equilibria of any natural evolutionary system. From hand gestures to grunts to telephones to email and beyond, human communication has not only evolved, but splintered off into many modalities and degrees of asynchrony.

I recently had the great fortune to join a company that is working on the next great surge in human communication: High Fidelity, Inc. This company is bringing together several new technologies to make this happen.

So, what is the newest evolutionary surge in human communication? I would describe it using a term from Virtual Body Language (page 22):

Telekinesics is a word invented to denote “the study of all emerging nonverbal practices across the internet”, formed by adding the prefix tele to Birdwhistell’s term kinesics. It could easily be confused with “telekinesis”: the ability to cause movement at a distance through the mind alone (the words differ by only one letter). But hey, these two phenomena are not so different anyway, so a slip of the tongue wouldn’t be such a bad thing. Telekinesics may be defined as “the science of body language as conducted over remote distances via some medium, including the internet”.

And now it’s not just science, but practice: body language is ready to go online…in realtime.

And when I say “realtime” – I mean, pretty damn fast, compared to most things that zip (or try to zip) across the internet. And when we’re talking about subtle head nods, changes in eye contact, fluctuations in your voice, and shoulder shrugs, fast is not just a nicety, it is a necessity – for clear communication using a body.
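To make the speed requirement concrete, here is a hypothetical sketch in Python (not High Fidelity’s actual protocol – the address and packet layout are invented): compact head-pose updates streamed over UDP, trading guaranteed delivery for immediacy, because a late head nod reads worse than a dropped one.

```python
import socket
import struct
import time

# Hypothetical sketch: stream compact head-pose updates over UDP.
# For body language, a late packet is worse than a lost one, so we
# trade TCP's reliability for UDP's immediacy. The address and the
# packet layout are invented for illustration.

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
dest = ("127.0.0.1", 9999)

def send_head_pose(yaw, pitch, roll):
    # 3 floats = 12 bytes per update: small enough to send every frame.
    sock.sendto(struct.pack("!3f", yaw, pitch, roll), dest)

# ~30 updates per second keeps nods and glances feeling live.
for i in range(30):
    send_head_pose(0.01 * i, 0.0, 0.0)
    time.sleep(1 / 30)
```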

Here’s Ryan Downe showing an early stage of avatar head movement using Google Glass.

Philip Rosedale, the founder of High Fidelity, often talks about how cool it would be for my avatar to walk up to your avatar and give it a little shoulder-shove, or a fist-bump, or an elbow-nudge, or a hug…and for your avatar to respond with a slight – but noticeable – movement.

It would appear that human touch (or at least the visual/audible representation of human touch) is on the verge of becoming a reality – through telekinesics. Of all the modalities and senses that we use to communicate, touch is the most primal: we share it with the oldest microorganisms.

When touch is manifest on the internet, along with highly-crafted virtual environments, maybe, just maybe, we will have reached that stage in human evolution when we can have a meaningful, intimate exchange – even if one person is in Shanghai and the other is in Chicago.

And that means people can stop flying around the world and burning fossil fuels in order to have 2-hour-long business meetings. And that means reducing our carbon footprint. And that means we might have a better chance of not pissing off Mother Earth to the degree that she has a spontaneous fever and shrugs us off like pesky fleas.

Which would really suck.

So…keep an eye on what we’re doing at High Fidelity, and get ready for the next evolutionary step in human communication. It just might be necessary for our survival.


On Phone Menus and the Blowing of Gaskets

January 2, 2013

(This blog post is re-published from an earlier blog of mine called “avatar puppetry” – the nonverbal internet. I’ll be phasing out that earlier blog, so I’m migrating a few of those earlier posts here before I trash it).

This blog post is only tangentially related to avatars and body language. But it does relate to the larger subject of communication technology that fails to accommodate normal human behavior and the rules of natural language.

But first, an appetizer. Check out this video for a phone menu for callers to the Tennessee State Mental Hospital:

http://www.youtube.com/watch?v=zjABiLYrKKE


A Typical Scenario

You’ve probably had this experience. You call a company or service to ask about your bill, or to make a general inquiry. You are dumped into a sea of countless menu options given by a recorded message (I say countless because you usually don’t know how many options you have to listen to – will it stop at 5? Or will I have to listen to 10?). None of the options apply to you. Or maybe some do. You’re not really sure. You hope – you pray – that you will be given the option to speak to a representative: a living, breathing, thinking, soft and cuddly human. After several agonizing minutes (by now you’ve forgotten most of the long-winded options) you realize that there is no option to speak to a human. Or at least you think there is no option. You’re not really sure.

Your blood pressure has now reached levels that warrant medical attention. If you still have rational neurons firing, you get the notion to press “0”. And the voice says, “Please wait to speak to a phone representative”. You collapse in relief. The voice continues: “this call may be recorded for quality assurance”. Yeah, right. (I think I remember once actually hearing the message say, “this call may be recorded……because…we care”. Okay, now that is gasket-blowing material.)

Why Conversation Matters

I don’t think I need to go into this any further. Just do a search on “phone menu” (or “phone tree”) and “frustration”, or something like that, and follow the scent and you’ll find plenty of blog posts on the subject.

How would I best characterize this problem? I could talk about it from an economic point of view. For instance, it costs a company a lot more to hire real people than to hook up an automated answering service or an interactive voice response (IVR) system. But companies also have to weigh the negative impact of a large percentage of irate customers. Too few companies look at this as a Design problem. Ah, there it is again: that ever-present normalizer and humanizer of technology: Design. It’s invisible when it works well, and that’s why it is such an unsung hero.

The Hyper-Linearity of Non-Interactive Verbal Messages

The nature of this design problem, I believe, is that these phone menus give a large amount of verbal information (words, sentences, options, numbers, etc.), which takes time to explain. And it is all laid out in a rigidly sequential order.

There is no way to jump ahead, to interrupt the monolog, or to ask it for clarification, as you would in a normal conversation. You are stuck in time – rigid, linear time, with no escape. (At least that’s what it feels like: there are usually options to hit special keys to go to the previous menu or pop out entirely, etc. But who knows what those keys are? And the dreaded fear of getting disconnected is enough to keep people like me staying within the lines, gritting my teeth, and being obedient – although that means I have the potential to become the McDonald’s gunman who makes the headlines the next morning.)

Compare this with a conversation with a phone representative: normal human dialog involves interruptions, clarifications, repetitions, mirroring (the “mm’s”, “hmm’s”, “ah’s”, “ok’s”, “uh-huh’s”, and such – the audible equivalent of eye-contact and head-nods), and all the affordances that you get from the prosody of speech. Natural conversations continually adapt to the situation. These adaptive, conversational dynamics are absent from the braindead phone robots. And their soft, soothing voices don’t help – in fact they only make me want to kill them that much harder.

There are two solutions:

1. Full-blown Artificial Intelligence, allowing the robot voice to “hear” your concerns and questions, and drill down, with your help, to the crux of the problem. But I’m afraid that AI has a way to go before this is possible. And even if it is almost possible, the good psychologists, interaction designers, and human-interface experts don’t seem to be running the show. They are outnumbered by the techno-geeks with low EQ and little understanding of human psychology. Left-brainers gather the power and influence, and run the machines – computer-wise and business-wise – because they are good with the numbers, and rarely blow a gasket. The right-brained skill set ends up stuck on the periphery, almost by its very nature. I’m waiting for this revolution I keep hearing about – the Revenge of the Right Brain. So far, I still keep hitting brick walls built with left-brained mortar. But I digress.

2. Visual interfaces. By having all the options laid out in a visual space, the user’s eyes can jump around (much more quickly than a robot can utter the options). Thus, if the layout is designed well (a rarity in the internet junkyard), the user can quickly see: “Ah, I have five options. Maybe I want option 4 – I’ll select ‘more information about option 4’ to make sure.” All of this can happen within a matter of seconds. You could almost say that the interface affords a kind of body language that the user reads and acts upon immediately.

Consider the illustration below of a company’s phone tree, which I found on the internet (I blacked out the company name and phone number). Wouldn’t it be nice if you could just take a glance at this visual diagram and jump to the choice you want? If you’re like me, your eyes will jump straight to the bottom, where the choice to speak to a representative is. (Of course it’s at the bottom.)

This picture says it all. But of course. We each have two eyes, each with millions of photoreceptors: simultaneity, parallelism, instant grok. But since I’m talking about telephones, the solution has to be found within the modality of audio alone, trapped in time. And in that case, there is no other solution than an advanced AI program that can understand your question, read your prosodic body language, and respond to the flow of the conversation, thus collapsing time.

…and since that’s not coming for a while, there’s another choice: a meat puppet – one of those very expensive communication units that burn calories, and require a salary. What a nuisance.


Just Because It’s Visual Doesn’t Mean It’s Better

May 24, 2012

I’ve been renting a lot of cars lately because my own car died. And so I get to see a lot of the interiors of American cars. Car design is generally more user-friendly than computer interface design – for the simple reason that when you make a mistake on a computer interface and the computer crashes, you will not die.

As cars become increasingly computerized, the “body language” starts to get wonky, even in aspects that are purely mechanical.

In a car I recently rented, I was looking for the emergency brake. The body language of most of the cars I’ve used offers an emergency brake just to the right of my seat, in the form of a lever that I pull up. Body language between human bodies is mostly unconscious. If a human-manufactured tool is designed well, its body language is also mostly unconscious: it is natural. Anyway… I could not find an emergency brake in the usual place in this particular car. So I looked in the next logical place: near the floor to the left of the foot pedals. There I saw the following THING:

I wanted to check to make sure it was the brake, so that I wouldn’t inadvertently pop open the hood or the cap of the gas tank. So I peered more closely at the symbol on this particular THING, and I asked myself the following question:

What the F?

Once I realized that this was indeed the emergency brake, I decided that a simple word would have sufficed.

In some cars, the “required action” is written on the brake:


Illiterate Icon Artists

I was reminded of an episode at one of the companies I worked for, where an “icon artist” was hired to build the visual symbols for several buttons on a computer interface. He had devised a series of icons that were meant to provide visual-language counterparts to basic actions that we typically perform on computer interfaces. He came up with novel and aesthetic symbols. But… UNREADABLE.

I suggested he just put the words on the icons, because the majority of computer users know English, and if they don’t know English, they could always open up a dictionary. Basically, this guy’s clever icons had no counterpart in the rest of the world. They were his own invention – they were UNDISCOVERABLE.

Moral of the story:

Designed body language should correspond to “natural affordances”: the expectations and readability of the natural world. If that is not possible, use historical conventions (by now there is plenty of reference material on visual symbols, and I suspect there are ways to check for the relative “universality” of certain symbols).

In both cases, whether using words or visuals, literacy is needed.

Put another way:

It is impossible to invent a visual language from scratch, because the only one who can visually “read” it is its creator. If it does not commute, it is not language. This applies to visual icons as much as it does to words.

As technology becomes more and more computerized (like cars), we have less and less opportunity to take advantage of natural affordances. Eventually, it will be possible to set the emergency brake by touching a tiny red button, or by uttering a message into a microphone. Thankfully, emergency brakes are still very physical, and I get to FEEL the pressure of that brake as I push it in, or pop it off…

that is…if I can ever find the damn thing.


Voice as Puppeteer

May 5, 2012

(This blog post is re-published from an earlier blog of mine called “avatar puppetry” – the nonverbal internet. I’ll be phasing out that earlier blog, so I’m migrating a few of those earlier posts here before I trash it).

———————–

According to Gestural Theory, verbal language emerged from the primal energy of the body, from physical and vocal gestures.


The human mind is at home in a world of abstract symbols – a virtual world separated from the gestural origins of those symbols. An evolution from the analog to the digital continues today with the flood of the internet over Earth’s geocortex. Our thoughts are awash in the alphabet: a digital artifact that arose from a gestural past. It’s hard to imagine that the mind could have created the concepts of Self, God, Logic, and Math: belief structures so deep in our wiring – generated over millions of years of genetic, cultural, and neural evolution. I’m not even sure if I fully believe that these structures are non-eternal and human-fabricated. Since the Copernican Revolution yanked humans out from the center of the universe, it continues to progressively kick down the pedestals of hubris. But, being humans, we cannot stop this trajectory of virtuality, even as we become more aware of it as such.

I’ve observed something about the birth of online virtual worlds, and the foundational technologies involved. One of the earliest online virtual worlds was Onlive Traveler, which used realtime voice.


My colleague, Steve DiPaola invented some techniques for Traveler which cause the voice to animate the floating faces that served as avatars.

But as online virtual worlds started to proliferate, they incorporated the technology of chat rooms: textual conversations. One quirky side-effect of this was the collision of computer-graphical humanoid 3D models with text chat. These are strange bedfellows indeed – occupying vastly different cognitive dimensions.


Many of us worked our craft to make these bedfellows not so strange, such as the techniques that I invented with Chuck Clanton at There.com, called Avatar Centric Communication.

Later, voice was introduced to There.com. I invented a technique for voice-triggered gesticulation in There.com voice chat, and later re-implemented a variation for Second Life.

Imagine the uncanny valley of hearing real voices coming from avatars with no associated animation. When I first witnessed this in a demo, the avatars came across as propped-up corpses with telephone speakers attached to their heads. Being as tuned-in to body language as I am, I got up on the gesticulation soapbox and started a campaign to add voice-triggered animation. As an added visual aid, I created the sound wave animation that appears above avatar heads for both There and SL…


Gesticulation is the physical-visual counterpart to vocal energy – we gesticulate when we speak – moving our eyebrows, head, hands, etc. – and it’s almost entirely unconscious. Since humans are so verbally-oriented, and since we expect our bodies to produce natural body language to correspond to our spoken communications, we should expect the same of our avatars. This is the rationale for avatar gesticulation.
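For readers who want the flavor of it, here is a minimal sketch in Python in the spirit of voice-triggered gesticulation. It is not the actual There.com or Second Life implementation – the mapping constants are invented – but it shows the basic idea: track a smoothed loudness envelope of the voice and let it drive small head and eyebrow motions.

```python
import math

# Sketch in the spirit of voice-triggered gesticulation (not the actual
# There.com or Second Life code): follow the loudness envelope of the
# voice and let it drive simple head-bob and eyebrow motion.

def envelope(samples, attack=0.3, release=0.05):
    """Track a smoothed amplitude envelope over raw audio samples."""
    level, out = 0.0, []
    for s in samples:
        target = abs(s)
        rate = attack if target > level else release  # rise fast, decay slow
        level += rate * (target - level)
        out.append(level)
    return out

def gesticulate(level):
    """Map vocal energy to small, believable body motions."""
    head_nod = 0.2 * level                 # louder speech, bigger nods
    eyebrow_raise = 0.5 * min(level, 1.0)  # clamp so brows stay on face
    return head_nod, eyebrow_raise

# Fake a burst of speech: a decaying sine stands in for the audio signal.
audio = [math.sin(0.3 * t) * math.exp(-t / 200) for t in range(400)]
for level in envelope(audio)[::50]:
    print(gesticulate(level))
```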

I think that a new form of puppeteering is on the horizon. It will use the voice. And it won’t just take sound signal amplitudes as input, as I did with voice-triggered gesticulation. It will parse the actual words and generate gestural emblems as well as gesticulations. And just as we will be able to layer filters onto our voices to mask our identities or role-play as certain characters, we will also be able to filter our body language to mimic the physical idiolects of Egyptians, Native Americans, Sicilians, four-year-old Chinese girls, and 90-year-old Ethiopian men.
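As a guess at what word-driven puppeteering might look like (every gesture name below is invented), a speech-to-text transcript could be scanned for trigger words that fire emblem gestures, layered on top of the amplitude-driven gesticulation sketched above:

```python
# Hypothetical sketch of word-driven puppeteering: map recognized words
# (from any speech-to-text source) to emblem gestures. The gesture
# names and trigger words are invented for illustration.

EMBLEMS = {
    "yes": "head_nod",
    "no": "head_shake",
    "huge": "arms_wide",
    "me": "hand_to_chest",
    "you": "point_forward",
}

def emblems_for(transcript):
    """Yield an emblem gesture for each trigger word, in spoken order."""
    for word in transcript.lower().split():
        gesture = EMBLEMS.get(word.strip(".,!?"))
        if gesture:
            yield gesture

print(list(emblems_for("No, you take it. It was huge!")))
# -> ['head_shake', 'point_forward', 'arms_wide']
```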

Digital-alphabetic-technological humanity reaches down to the gestural underbelly and invokes the primal energy of communication. It’s a reversal of the gesture-to-words vector of Gestural Theory.

And it’s the only choice we have for transmitting natural language over the geocortex, because we are sitting on top of thousands of years of alphabetic evolution.


Can You Trust Email Body Language?

December 2, 2011

Steve Tobak wrote an article on CBSNews.com called “How to Read Virtual Body Language in Email”.

Steve makes some interesting observations. But, like so many attempts at teaching us “how to read” body language, he makes several assumptions that miss the highly contextual and highly tenuous nature of interpreting emotion via email.

In fact, email is often used by people as a way to avoid emotion or intimacy. It’s an example of asynchronous communication: an email message could take an arbitrary amount of time to compose, and it could be sent at an arbitrary time after writing it. Thus, email is not a reliable medium for reading one’s emotions. It’s hard to lie with your body. It’s much easier to lie with a virtual body. With email, you don’t even have a body.

Damn That Send Button

Actually, I wish I could say that I have always used email in a premeditated, calculated way. I have been guilty of sending email messages in the heat of an emotional moment. A few too many of those emails have led me to believe that the SEND button should be kept in a locked box in a governmental facility. And the box should have a big sign that says: Are You Sure?

People often make the mistake of assuming that a given communication medium provides a transparent channel for human expression. Oddly enough: email can bring out certain negative qualities in people who may not be negative in normal face-to-face encounters.

People don’t take the McLuhan effect into account; they assume the message is determined only by the communicators. Steve says this about flame mail:

“You probably don’t need me to tell you this, but when you receive what we affectionately call flame mail – where someone lets loose on you in a big, ugly way – that’s aggressive behavior. In other words, they’re acting out like a child throwing a temper tantrum and it’s not about you, it’s about them. I know it’s tempting to think it’s just a misunderstanding, but ask yourself, why did they assume the worst?”

But it’s not just “about them”. It’s also about the medium – an awkward, body-language-challenged medium.

Also, people can feel “safe” behind the email wall (meaning they know they won’t get punched in the face – at least not immediately). There’s something about the medium that can cause people to flame – EVEN if they are not normally flame-throwers. Jaron Lanier, in You Are Not a Gadget, gives a good explanation of how and why this phenomenon occurs. Read the book, even if you don’t always take Jaron seriously. He is brave and bold, and he challenges many assumptions about internet culture.

Everyone has stories about email messages they wish they had never written, or email messages they wish they had never read.

It’s wise to understand how media mediates our interactions with each other. That is an important kind of literacy: a literacy of understanding media effects.