How Does Artificial Life Avoid the Uncanny Valley?

July 6, 2015

The following creepy humanoids provide ample reason to fear artificial intelligence:

Screen Shot 2015-07-05 at 12.59.52 PM

This is just one example of virtual humans that would be appropriate in a horror movie. There are many others. Here’s my question: why are there so many creepy humans in computer animation?

Screen Shot 2015-07-05 at 7.27.08 PMThe uncanny problem is not necessarily due to the AI itself: it’s usually the result of failed attempts at generating appropriate body language for the AI. As I point out in the Gestural Turing Test: “intelligence has a body”. And nothing ruins a good AI more than terrible body language. And yes, when I say “body language”, I include the sound, rhythm, timbre, and prosody of the voice (which is produced in the body).

Simulated body language can steer clear of the uncanny valley with some simple rules of thumb:

1. Don’t simulate humans unless you absolutely have to.

2. Use eye contact between characters. This is not rocket science, folks.

3. Cartoonify. Less visual detail leaves more to the imagination and less that can go wrong.

4. Do the work to make your AI express itself using emotional cues. Don’t be lazy about it.

Shameless plug: Wiglets are super-cartoony non-humanoid critters that avoid the uncanny valley, and use emotional cues, like eye contact, proxemic movements, etc.

ww

These videos show how wiglets move and act.

0313lifeArtificial Life was invented partly as a way to get around a core problem of AI: humans are the most sophisticated and complex animals on Earth. Simulating them in a realistic way is nearly impossible, because we can always detect a fake. Getting it wrong (which is almost always the case) results in something creepy, scary, clumsy, or just plain useless.

In contrast, simulating non-human animals (starting with simple organisms and working up the chain of emergent complexity) is a pragmatic program for scientific research – not to mention developing consumer products, toys, games, and virtual companions.

We’ll get to believable artificial humans some day.

Meanwhile…

I am having a grand old time making virtual animals using simulated physics, genetics, and a touch of AI. No lofty goals here. With a good dose of imagination (people have plenty of it), it only takes a teaspoon of AI (crafted just right) to make a compelling experience – to make something feel and act sentient. And with the right blend of body language, responsiveness, and interactivity, imagination can fill-in all the missing details.

Alan Turing understood the role of the observer, and this is why he chose a behaviorist approach to asking the question: “what is intelligence?”

intelligent-animals-01Artificial Intelligence is founded on the anthropomorphic notion that human minds are the pinnacle of intelligence on Earth. But hubris can sometimes get in the way of progress. Artificial Life – on the other hand, recognizes that intelligence originates from deep within ancient Earth. We are well-advised to understand it (and simulate it) as a way to better understand ourselves, and how we came to be who we are.

It’s also not a bad way to avoid the uncanny valley.


On Phone Menus and the Blowing of Gaskets

January 2, 2013

(This blog post is re-published from an earlier blog of mine called “avatar puppetry” – the nonverbal internet. I’ll be phasing out that earlier blog, so I’m migrating a few of those earlier posts here before I trash it).

This blog post is only tangentially related to avatars and body language. But it does relate to the larger subject of communication technology that fails to accommodate normal human behavior and the rules of natural language.

But first, an appetizer. Check out this video for a phone menu for callers to the Tennessee State Mental Hospital:

http://www.youtube.com/watch?v=zjABiLYrKKE


A Typical Scenario

You’ve probably had this experience. You call a company or service to ask about your bill, or to make a general inquiry. You are dumped into a sea of countless menu options given by a recorded message (I say countless, because you usually don’t know how many options you have to listen to – will it stop at 5? Or will I have to listen to 10?). None of the options apply to you. Or maybe some do. You’re not really sure. You hope – you pray, that you will be given the option to speak to a representative, a living, breathing, thinking, soft and cuddly human. After several agonizing minutes (by now you’ve forgotten most of the long-winded options) you realize that there is no option to speak to a human. Or at least youthink there is no option. You’re not really sure.

Your blood pressure has now reached levels that warrant medical attention. If you still have rational neurons firing, you get the notion to press “0″. And the voice says, “Please wait to speak to a phone representative”. You collapse in relief. The voice continues: “this call may be recorded for quality assurance” Yea, right. (I think I remember once actually hearing the message say, “this call may be recorded……because…we care”. Okay now that is gasket-blowing material).

Why Conversation Matters

I don’t think I need to go into this any further. Just do a search on “phone menu” (or “phone tree”) and “frustration”, or something like that, and follow the scent and you’ll find plenty of blog posts on the subject.

How would I best characterize this problem? I could talk about it from an economic point of view. For instance it costs a company a lot more to hire real people than to hook up an automated answering service or an interactive voice response (IVR) system. But companies have to also weigh the negative impact of a large percentage of irate customers. But too few companies look at this as a Design problem. Ah, there it is again: that ever-present normalizer and humanizer of technology: DesignIt’s invisible when it works well, and that’s why it is such an unsung hero.

The Hyper-Linearity of Non-Interactive Verbal Messages

The nature of this design problem, I believe, is that these phone menus give a large amount of verbal information (words, sentences, options, numbers, etc.) which take time to explain. They are laid out in a sequential order.

There is no way to jump ahead, to interrupt the monolog, or to ask it for clarification, as you would in a normal conversation. You are stuck in time – rigid, linear time, with no escape. (At least that’s what it feels like: there are usually options to hit special keys to go to the previous menu or pop out entirely, etc. But who knows what those keys are? And the dreaded fear of getting disconnected is enough to keep people like me staying within the lines, gritting  teeth, and being obedient (although that means I have the potential to become the McDonald’s gunman who makes the headlines the next morning.)

Compare this with a conversation with a phone representative: normal human dialog involves interruptions, clarifications, repetitions, mirroring (the “mm’s”, “hmm’s”, “ah’s”, “ok’s”, “uh-huh’s”, and such – the audible equivalent of eye-contact and head-nods), and all the affordances that you get from the prosody of speech. Natural conversations continually adapt to the situation. These adaptive, conversational dynamics are absent from the braindead phone robots. And their soft, soothing voices don’t help – in fact they only make me want to kill them that much harder.

There are two solutions:

1. Full-blown Artificial Intelligence, allowing the robot voice to “hear” your concerns, questions, and drill down, with your help, to the crux of the problem. But I’m afraid that AI  has a way to go before this is possible. And even if it is almost possible, the good psychologists, interaction designers, and human-user interface experts don’t seem to be running the show. They are outnumbered by the techno-geeks with low EQ, and little understanding of human psychology. Left-brainers gather the power and influence, and run the machines – computer-wise and business-wise – because they are good with the numbers, and rarely blow a gasket. The right-brained skill set ends up stuck on the periphery, almost by its very nature. I’m waiting for this revolution I keep hearing about – the Revenge of the Right Brain. So far, I still keep hitting brick walls built with left-brained mortar. But I digress.

2. Visual interfaces. By having all the options laid out in a visual space, the user’s eyes can jump around (much more quickly than a robot can utter the options). Thus, if the layout is designed well (a rarity in the internet junkyard) the user can quickly see, “ah, I have five options. Maybe I want to choose option 4 – I will select, “more information about option 4 to make sure”. All of this can happen within a matter of seconds. You could almost say that the interface affords a kind of body language that the user reads and acts upon immediately.

Consider the illustration below for a company’s phone tree which I found on the internet (I blacked-out the company name and phone number). Wouldn’t it be nice if you could just take a glance at this visual diagram and jump to the choice you want? If you’re like me, your eyes will jump straight to the bottom where the choice to speak to a representative is. (Of course it’s at the bottom).

This picture says it all. But of course. We each have two eyes, each with millions of photoreceptors: simultaneity, parallelism, instant grok. But since I’m talking about telephones, the solution has to be found within the modality of audio alone, trapped in time. And in that case, there is no other solution than an advanced AI program that can understand your question, read your prosodic body language, and respond to the flow of the conversation, thus collapsing time.

…and since that’s not coming for a while, there’s another choice: a meat puppet – one of those very expensive communication units that burn calories, and require a salary. What a nuisance.


Uncanny Charlie

August 18, 2012

The subway system in Boston has a mascot named “Charlie”, a cartoon character who rides the train and reminds people to use the “Charlie Card”. With the exception of his face, he looks like a normal airbrushed graphic of a guy with a hat. But his face? Uh, it’s f’d up.

In case you don’t know yet about the Uncanny Valley, it refers to a graph devised by a Japanese robot maker. The graph shows typical reactions to human likeness in robots and other simulations. The more realistic the robot (or computer generated character) the more CREEPY it becomes….

..until it is so utterly realistic that you are fooled, and you respond to it as if it were a living human. But watch out. If the eyes do something wacky or scary, or if something else reveals the fact that it is just an animated corpse…DOWN you fall…. into the valley.

Anyway, I have a theory about the uncanny valley: it is just a specific example of a more general phenomenon that occurs when incompatible levels of realism are juxtaposed in a single viewing experience. So for instance, an animated film in which the character motions are realistic – but their faces are abstract – can be creepy. How about a computer animation in which the rendering is super-realistic, but the motions are stiff and artificial? Creepola. A cartoon character where one aspect is stylized and other aspects are realistic looks…not right. That’s Charlie’s issue.

Stylized faces are everywhere:

But when an artist takes a stylized line-drawn graphic of a face and renders it with shading, I consider this to be a visual language blunder. The exception to this rule of thumb is demonstrated by artists who purposefully juxtapose styles and levels of realism, for artistic impact, such as the post-modern painter David Salle.

The subject of levels of realism and accessibility in graphic design is covered in McCloud’s Understanding Comics. The image-reading eyebrain can adjust its zone of suspension of disbelief to accommodate a particular level of stylism/realism. But in general, it cannot easily handle having that zone bifurcated.

Charlie either needs a face transplant to match his jacket and hat, or else he needs to start wearing f’d-up clothes to match his f’d-up face.


Nano Avatars

June 8, 2012

(This blog post is re-published from an earlier blog of mine called “avatar puppetry” – the nonverbal internet. I’ll be phasing out that earlier blog, so I’m migrating a few of those earlier posts here before I trash it).

———————–

The other day, Jeremy Owen Turner told me about NanoArt. Here’s a cool nano art piece by Yong Qing Fu, described in Chemistry World.

nano

We started imagining a nano virtual world. Jeremy pontificates on avatars as works of art, avatars that can take on alternate forms, including nano art. I started thinking about what an avatar that consisted of a molecule might be like.

Some illustrations of the hemoglobin molecule look a bit like the flying spaghetti monster. Which reminds me, Cory Linden’s avatar in Second Life is based on the flying spaghetti monster.

Spaghetti

We’ve seen avatars hanging out among virtual molecules

avatar_in_molecule

but what about avatars that ARE molecules? Stephanie H. Chanteau and James M. Tour of Rice University created anthropomorphic molecules.

NanoKid2

But I’m not so interested in how people make anthropomorphic molecules. I’m interested in avatars that live a molecule’s life. Check this out…

scanning tunneling microscope (STM) is set up in a magnificent auditorium.

STM

The microscope’s subject matter is projected onto a giant video screen. An audience of thousands watch as a team of five molecule-avatar controllers sit with computer mice and keyboards and mingle in a virtual world that is actually not virtual. In the middle of all the flamboyant machinery is a tiny nano-stage, a performance dance floor where five molecules show something rather strange and new

Since the STM can be used for atom manipulation as well as visioning (a consequence of the Observer Effect), the very technology for seeing the avatars is used to control them.

The audience collectively winces as the avatars try to, um, walk. Okay, maybe walking isn’t the right word. What exactly do these avatars do? They combine to form supermolecules. They jump and twitch. They split and reform. They blink and chirp. They fall off the edge of the stage and accidentally get stuck on carbon atoms. It may not be elegant. But hey it would be so cool to watch.

When the performance is done, the avatars take a bow…or something. The audience applauds with a standing ovation. A new genre is born. Constraints define creative boundaries and therefore creativity. And the limited repertoire of molecular interactions define the social vocabulary of these agents. Kind of reminds me of Flatland.

Avatars are embodiments of humans (or human intention) in virtual worlds.

“Seeing” a molecule is a problematic term, in the same sense that “seeing” a planet in a distant star system is a problematic term. It’s not “seeing” on a human scale. It’sprosthetic seeing. And so, just like a software-based virtual world, there must be arenderer.

molecule

Our most distant ancestor is a molecule that accidentally replicated and thus started the upward avalanche that is called Evolution. Dennett’s intentional stance can be applied on all levels of the biosphere. Molecular avatars represent the most basic and primitive expression of agentry. And unlike the constraints of C++, Havok, and OpenGL, in virtual world software programs, the constraints in this molecular world are real.

It may yield some insights about the fundamentals of interaction.


Just Because It’s Visual Doesn’t Mean It’s Better

May 24, 2012

I’ve been renting a lot of cars lately because my own car died. And so I get to see a lot of the interiors of American cars. Car design is generally more user-friendly than computer interfaces – for the simple reason that when you make a mistake on a computer interface and the computer crashes, you will not die.

As cars become increasingly computerized, the “body language” starts to get wonky, even in aspects that are purely mechanical.

In a car I recently rented, I was looking for the emergency brake. The body language of most of the cars I’ve used offers an emergency brake just to the right of my seat in the form of a lever that I pull up. Body language between human bodies is mostly unconscious. If a human-manufactured tool is designed well, its body langage is also mostly-unconscious: it is natural. Anyway…I could not find an emergency brake in the usual place in this particular car. So I looked in the next logical place: near the floor to the left of the foot pedals. There I saw the following THING:

I wanted to check to make sure it was the brake, so that I wouldn’t inadvertently pop open the hood or the cap of the gas tank. So I peered more closely at the symbol on this particular THING, and I asked myself the following question:

What the F?

Once I realized that this was indeed the emergency brake, I decided that a simple word would have sufficed.

In some cars, the “required action” is written on the brake:


Illiterate Icon Artists

I was reminded of an episode in one of the companies I was working for, where an “icon artist” was hired to build the visual symbols for several buttons on a computer interface. He had devised a series of icons that were meant to provide visual language counterparts to basic actions that we typically do on computer interfaces. He came up with novel and aesthetic symbols. But….UN-READABLE.

I suggested he just put the words on the icons, because the majority of computer users know English, and if they don’t know English, they could always open up a dictionary. Basically, this guy’s clever icons had no counterpart to the rest of the world. They were his own invention – they were UNDISCOVERABLE.

Moral of the story:

Designed body language should corresponds to “natural affordances”;  the expectations and readability of the natural world. If that is not possible, use historical conventions (by now there is plenty of reference material on visual symbols, and I would suspect that by now there are ways to check for the relative “universality” of certain symbols).

In both cases, whether using words or visuals, literacy is needed.

Put in another way:

It is impossible to invent a visual langage from scratch. Because the only one who can visually “read” it is the creator. If it does not commute, it is not language. This applies to visual icons as much as it does to words.

As technology becomes more and more computerized (like cars) we have less and less opportunity to take advantage of natural affordances. Eventually, it will be possible to set the emergency brake by touching a tiny red button, or by uttering a message into a microphone. Thankfully, emergency brakes are still very physical, and I get to FEEL the pressure of that brake as I push it in, or pop it off….

that is…if I can ever find the damn thing.


Screensharing: Don’t Look at Me

January 11, 2012

Imagine discussing a project you are doing with a small group: a web site, a drawing, a contraption you are building; whatever. You would not expect the people to be looking at your face the whole time. Much of the time you will all be gazing around at different parts of the project. You may be pointing your fingers around, using terms like “this”, “that”, “here” and “there”.

When people have their focus on something separate from their own bodies, that thing becomes an extension of their bodies. Bodymind is not bound by skin. And collaborating, communicating bodyminds meld on an object of common interest.

TeleKinesics

The internet is dispersing our workspaces globally, and the same is happening to our bodies.

The anthropologist, Ray Birdwhistell coined the term “kinesics“, referring to the interpretation, science, or study of body language.

I invented a word: “telekinesics”. I define it as, “the science of body language as conducted over remote distances via some medium, including the internet” (ref)

My primary interest is the creation of body langage using remote manifestations of ourselves, such as with avatars and other visual-interactive forms. I don’t consider video conferencing as a form of virtual body language, because it is essentially a re-creation of one’s literal appearances and sounds. It is an extension of telephony.

But it is virtual in one sense: it is remote from your real body.

Video conferencing, and applications like Skype are extremely useful. I use Skype all the time to chat with friends or colleagues. Seeing my collaborator’s face helps tremendously to fill-in the missing nonverbal signals in telephony. But if the subject of conversation is a project we are working on, then “face-time”, is not helpful. We need to enter into, and embody, the space of our collaboration.

Screen Sharing

This is why screen sharing is so useful. Screen sharing happens when you flip a switch on your Skype (or whatever) application that changes the output signal from your camera to your computer screen. Your mouse cursor becomes a tiny Vanna White – annotating, referencing, directing people’s gazes.

Michael Braun, in the blog post: Screen Sharing for Face Time, says that seeing your chat partner is not always helpful, while screen sharing “has been shown to increase productivity. When remote participants had access to a shared workspace (for example, seeing the same spreadsheet or computer program), then their productivity improved. This is not especially surprising to anyone who has tried to give someone computer help over the phone. Not being able to see that person’s screen can be maddening, because the person needing help has to describe everything and the person giving help has to reconstruct the problem in her mind.”

Many software applications include cute features like collaborative drawing spaces, intended for co-collaborators to co-create, co-communicate, and to to co-mess up each other’s co-work. The interaction design (from what I’ve seen) is generally awkward. But more to the point: we don’t yet have a good sense of how people can and should interact in such collaborative virtual spaces. The technology is still frothing like tadpole eggs.

Some proponents of gestural theory believe that one reason speech emerged out of gestural communication was because it freed up the “talking hands” so that they could do physical work – so our mouths started to do the talking. Result: we can put our hands to work, look at our work, and talk about it, all at the same time.

Screen sharing may be a natural evolutionary trend – a continuing thread to this ancient  activity – as manifested in the virtual world of internet communications.

 

 


Virtual Sentience Requires a Gaze

November 28, 2011

(This blog post is re-published from an earlier blog of mine called “avatar puppetry” – the nonverbal internet.  I originally wrote it in September of 2009. I’ll be phasing out that earlier blog, so I’m migrating a few of those earlier posts here before I trash it).

———————

I was speaking with my colleague Michael Nixon at the School of Interactive Art and Technology. We were talking about body language in non-human animated characters. He commented that before you can imbue a virtual character with apparent sentience, it has to have the ability to GAZE – in other words, look at something. In other words, it has a head with eyes. Or maybe just a head. Or… a “head”.

Here’s the thing about gaze: it pokes out of the local (“lonely”) coordinate system of the character and into the global (“social”) coordinate system of the world and other sentient beings. Gaze is the psychic vector that connects a character with the world. The character “places it’s gaze upon the world”. Luxo Jr is a great example of imbuing an otherwise inanimate object with sentience (and lots of personality besides) by using body language such as gaze.

I have observed something missing in video conferencing. Gaze. Notice in this set of four images how the video chat participants cannot make eye-contact with each other. This is because they are not sharing the same physical 3D space. Nor are they sharing the same virtual 3D space!

Gaze is one of the most powerful communicative elements of natural language, along with the musicality of speech, and of course facial and bodily gesture. This is especially true among groups of young single people in which hormones are flying, and flirtation, coyness, and jealousy create a symphony of psychic vectors…


At There.com, I designed the initial avatar gaze system. With the help of Chuck Clanton, I created an “intimacam”, which aimed perpendicular to the consensual gaze of the avatars, and zoomed-in closer when the avatar heads came closer to each other.

The greatest animators have known about the power of gaze for as long as the craft has existed. This highly-social component of body language has a mathematical manifestation in the virtual spaces of cartoons, computer games, and virtual worlds. And it is one of the many elements that will become refined and codified and included into the virtual body language of the internet.

Human communication is migrating over to the internet – the geo-cortex of posthumanity. Text is leading the way. Body language has some catching up to do. Brian Rotman has some interesting things to say along these lines in his book, Becoming Beside Ourselves.

We can learn a lot from Pixar animators, as well as psychologists and actors, as we develop virtual worlds and collaborative workspaces.

————–

In response to my earlier post, Laban-for-animators expert Leslie Bishko made this comment:

“My .2c – breath promotes the illusion of sentience, gaze promotes the illusion of interaction and relationship!”