Your Voice is Puppeteering an Avatar in my Brain

November 23, 2014

I have been having a lot of video chat conversations recently with a colleague who is on the opposite side of the continent.

Now that we have been “seeing” each other on a weekly basis, we have become very familiar with each other’s voices, facial expressions, gesticulations, and so on.

But, as is common with any video conferencing system, the audio and video signals are unpredictable. Often the video totally freezes up, or it lags behind the voice. It can be really distracting when the facial expressions and mouth movements do not match the sound I’m hearing.

Sometimes we prefer to just turn off the video and stick with voice.

One day after turning off the video, I came to the realization that I had become so familiar with his body language that I could pretty much guess what I would be seeing as he spoke. Basically, I realized that…

HIS VOICE WAS PUPPETEERING HIS VIRTUAL SELF IN MY BRAIN.

Since the voice of my colleague is normally synchronized with his physical gesticulations, facial expressions, and body motions, I can easily imagine the visual counterpart to his voice.

This is not new to video chat. It has been happening for a long time with the telephone, whenever we speak with someone we know intimately.


In fact, it may have even happened at the dawn of our species.

According to gestural theory, physical, visible gesture was once the primary communication modality in our ape ancestors. Then, our ancestors began using their hands increasingly for tool manipulation—and this created evolutionary pressure for vocal sounds to take over as the primary language delivery method. The result is that we humans can walk, use tools, and talk, all at the same time.

As gestures gave way to audible language, our ancestors could keep looking for nuts and berries while their companions were yacking on.

Here’s the point: The entire progression from gesture to voice remains as a vestigial pathway in our brains. And this is why I so easily imagine my friend gesturing at me as I listen to his voice.

Homunculi and Mirror Neurons

There are many complex structures in my brain, including several body maps that represent the positions, movements and sensations within my physical body. There are also mirror neurons – which help me to relate to and sympathize with other people. There are neural structures that cause me to recognize faces, walking gaits, and voices of people I know.

Evolutionary biology and neuroscience research points to the possibility that language may have evolved out of, and in tandem with, gestural communication in Homo sapiens. Even as audible language was freed from the physicality of gesture, the sound of one’s voice remains naturally associated with the visual, physical energy of its source (for more on this line of reasoning, check out Terrence Deacon).

Puppeteering is the art of making something come to life, whether with strings (as in a marionette), or with your hand (as in a muppet). The greatest puppeteers know how to make the most expressive movement with the fewest strings.

The same principle applies when I am having a Skype call with my wife. I am so intimately familiar with her voice and the associated visual counterpart, that all it takes is a few puppet strings for her to appear and begin animating in my mind – often triggered by a tiny, scratchy voice in a cell phone.

Enough pattern-recognition material has accumulated in my brain to do most of the work.

I am fascinated with the processes that go on in our brains that allow us to build such useful and reliable inner representations of each other. And I have wondered whether we could use more biomimicry – applying more of these natural processes to the goal of transmitting body language and voice across the internet.

These ideas are explored in depth in Voice as Puppeteer.


Voice as Puppeteer

May 5, 2012

(This blog post is re-published from an earlier blog of mine called “avatar puppetry” – the nonverbal internet. I’ll be phasing out that earlier blog, so I’m migrating a few of those earlier posts here before I trash it).

———————–

According to Gestural Theory, verbal language emerged from the primal energy of the body, from physical and vocal gestures.


The human mind is at home in a world of abstract symbols – a virtual world separated from the gestural origins of those symbols. The evolution from the analog to the digital continues today with the flood of the internet over earth’s geocortex. Our thoughts are awash in the alphabet: a digital artifact that arose from a gestural past. It’s hard to imagine that the mind could have created the concepts of Self, God, Logic, and Math – belief structures wired so deeply into us, generated over millions of years of genetic, cultural, and neural evolution. I’m not even sure I fully believe that these structures are non-eternal and human-fabricated. The Copernican Revolution yanked humans out from the center of the universe, and it continues to kick down the pedestals of hubris. But, being human, we cannot stop this trajectory of virtuality, even as we become more aware of it as such.

I’ve observed something about the birth of online virtual worlds, and the foundational technologies involved. One of the earliest online virtual worlds was Onlive Traveler, which used realtime voice.


My colleague Steve DiPaola invented techniques for Traveler that used the voice to animate the floating faces that served as avatars.

But as online virtual worlds started to proliferate, they incorporated the technology of chat rooms – textual conversations. One quirky side-effect of this was the collision of computer-graphical humanoid 3D models with text chat. These are strange bedfellows indeed – occupying vastly different cognitive dimensions.


Many of us worked our craft to make these bedfellows not so strange, such as the techniques that I invented with Chuck Clanton at There.com, called Avatar Centric Communication.

Later, voice was introduced to There.com. I invented a voice-triggered gesticulation technique for There.com voice chat, and later re-implemented a variation of it for Second Life.

Imagine the uncanny valley of hearing real voices coming from avatars with no associated animation. When I first witnessed this in a demo, the avatars came across as propped-up corpses with telephone speakers attached to their heads. Being so tuned-in to body language as I am, I got up on the gesticulation soap box and started a campaign to add voice-triggered animation. As an added visual aid, I created the sound wave animation that appears above avatar heads for both There and SL…


Gesticulation is the physical-visual counterpart to vocal energy – we gesticulate when we speak – moving our eyebrows, head, hands, etc. – and it’s almost entirely unconscious. Since humans are so verbally-oriented, and since we expect our bodies to produce natural body language to correspond to our spoken communications, we should expect the same of our avatars. This is the rationale for avatar gesticulation.
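
To give a flavor of how simple voice-triggered gesticulation can be, here is a minimal sketch of the amplitude-driven approach – hypothetical names and thresholds, not the actual There.com or Second Life code:

```python
import random

class VoiceGesticulator:
    """Hypothetical sketch: map a stream of voice amplitude samples onto a few
    gesticulation channels (head nod, brow raise, hand emphasis)."""

    def __init__(self, threshold=0.15, smoothing=0.8):
        self.threshold = threshold   # amplitudes below this are treated as silence
        self.smoothing = smoothing   # exponential smoothing of the vocal energy envelope
        self.envelope = 0.0

    def update(self, amplitude):
        # Smooth the raw amplitude into an energy envelope.
        self.envelope = self.smoothing * self.envelope + (1.0 - self.smoothing) * amplitude

        if self.envelope < self.threshold:
            # Silence: no gesticulation; let the idle animation take over.
            return {"head_nod": 0.0, "brow_raise": 0.0, "hand_emphasis": 0.0}

        # Louder speech drives larger, more emphatic movement.
        energy = min(1.0, self.envelope)
        return {
            "head_nod": 0.3 * energy,
            "brow_raise": 0.5 * energy,
            "hand_emphasis": energy * random.uniform(0.5, 1.0),  # vary it so it doesn't look robotic
        }
```

The point is only that raw amplitude is enough to get the body moving in rough sympathy with the voice; it knows nothing about what is actually being said.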

I think that a new form of puppeteering is on the horizon. It will use the voice. And it won’t just take sound signal amplitudes as input, as I did with voice-triggered gesticulation. It will parse the actual words and generate gestural emblems as well as gesticulations. And just as we will be able to layer filters onto our voices to mask our identities or role-play as certain characters, we will also be able to filter our body language to mimic the physical idiolects of Egyptians, Native Americans, Sicilians, four-year-old Chinese girls, and 90-year-old Ethiopian men.
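
As a thought experiment only – no such system existed in the worlds described above – a word-parsing puppeteer might reduce to a lookup from words to emblems, restyled by an idiolect filter. All names below are hypothetical:

```python
# Hypothetical sketch of a word-driven gesture layer, sitting on top of the
# amplitude-driven gesticulation above. Emblems are culturally coded gestures
# with specific meanings; the "idiolect" style restyles how they are performed.

EMBLEMS = {
    "hello": "wave",
    "yes": "nod",
    "no": "head_shake",
    "huge": "arms_wide",
    "me": "hand_to_chest",
}

IDIOLECT_STYLES = {
    "sicilian": {"amplitude": 1.4, "hand_bias": 1.6},
    "reserved": {"amplitude": 0.6, "hand_bias": 0.5},
}

def gestures_for_utterance(words, idiolect="reserved"):
    """Parse a list of spoken words into styled gesture emblems."""
    style = IDIOLECT_STYLES[idiolect]
    gestures = []
    for word in words:
        emblem = EMBLEMS.get(word.lower())
        if emblem:
            gestures.append({
                "emblem": emblem,
                "scale": style["amplitude"],
                "hand_energy": style["hand_bias"],
            })
    return gestures

# e.g. gestures_for_utterance("yes that idea is huge".split(), idiolect="sicilian")
```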

Digital-alphabetic-technological humanity reaches down to the gestural underbelly and invokes the primal energy of communication. It’s a reversal of the gesture-to-words vector of Gestural Theory.

And it’s the only choice we have for transmitting natural language over the geocortex, because we are sitting on top of a heap of alphabetic evolution thousands of years in the making.


Seven Hundred Puppet Strings

March 31, 2012

(This blog post is re-published from an earlier blog of mine called “avatar puppetry” – the nonverbal internet. I’ll be phasing out that earlier blog, so I’m migrating a few of those earlier posts here before I trash it).

———————–
The human body has about seven hundred muscles. Some of them are in the digestive tract, and make their living by pushing food along from sphincter to sphincter. Yum! These muscles are controlled by the autonomic nervous system.

Other muscles are in charge of holding the head upright while walking. Others are in charge of furrowing the brow when a situation calls for worry. The majority of these muscles are controlled without conscious effort. Even when we do make a conscious movement (like waving a hand at Bonnie), the many arm muscles involved just do the right thing without our having to think about what each muscle is doing. The command region of the brain says, “wave at Bonnie”, and everything just happens like magic. Unless Bonnie scowls and looks the other way, in which case, the brow furrows, and is sometimes accompanied by grumbling vocalizations.

The avatar equivalent of unconscious muscle control is a pile of procedural software and animation scripts that are designed to “do the right thing” when the human avatar controller makes a high-level command, like <walk>, or <do_the_coy_shoulder_move>, or <wave_at, “Bonnie”>. Sometimes, an avatar controller might want to get a little more nuanced: <walk_like, “Alfred Hitchcock”>; <wave_wildly_at, “Bonnie”>. I have pontificated about the art of puppeteering avatars in the following two web sites:

www.Avatology.com
www.AvatarPuppeteering.com

Also this interview with me by Andrea Romeo discusses some of the ideas about avatar puppetry that he and I have been bantering around for about a year now.
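
To make that division of labor concrete, here is a toy sketch of the command layer described above – hypothetical names, not the actual There.com or Second Life controller – in which high-level commands like <walk> and <wave_at, "Bonnie"> fan out to procedural animation:

```python
class AvatarController:
    """Toy sketch: high-level commands fan out to procedural animation, so the
    human puppeteer never has to think about individual muscles."""

    def walk(self, style=None):
        # Procedural locomotion: gait cycle, foot placement, balance.
        self._play("walk_cycle", style=style)          # e.g. style="Alfred Hitchcock"

    def wave_at(self, target, intensity=1.0):
        # Orient toward the target, then layer the wave onto the upper body.
        self._look_at(target)
        self._play("wave", intensity=intensity)        # intensity > 1.0 ~ "wave wildly"

    def do_the_coy_shoulder_move(self):
        self._play("coy_shoulder")

    # --- low-level machinery the user never sees ---------------------------
    def _look_at(self, target):
        ...

    def _play(self, animation, **params):
        ...

# avatar.wave_at("Bonnie", intensity=2.0)   # <wave_wildly_at, "Bonnie">
```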

The question of how much control to apply on your virtual self has been rolling around in my head ever since I started writing avatar code for There.com and Second Life. Avatar control code is like a complex marionette system, where every “muscle” of the avatar has a string attached to it. But instead of all strings having equal importance, these strings are arranged in a hierarchical structure.

The avatar controller may not necessarily want or need to have access to every muscle’s puppet string. The question is: which puppet strings does the avatar controller want to control at any given time, and…how?

I’ve been thinking about how to make a system that allows a user to shift up and down the hierarchy, in the same way that our brains shift focus among different motion regimes.
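
One way to picture that hierarchy is as a tree of strings, where the controller can grab a single string, a cluster, or the whole body, depending on the current level of focus. A minimal sketch, with a made-up skeleton:

```python
# Hypothetical sketch of hierarchical puppet strings: the controller can grab
# one string ("left_brow"), a cluster ("face"), or the whole body.

PUPPET_TREE = {
    "body": {
        "face": ["left_brow", "right_brow", "jaw", "eyelids"],
        "head": ["neck_yaw", "neck_pitch"],
        "arms": {
            "left_arm": ["l_shoulder", "l_elbow", "l_wrist"],
            "right_arm": ["r_shoulder", "r_elbow", "r_wrist"],
        },
    }
}

def strings_under(node):
    """Flatten a node of the tree into the individual strings it controls."""
    if isinstance(node, list):
        return list(node)
    return [s for child in node.values() for s in strings_under(child)]

def grab(path):
    """Shift focus to one level of the hierarchy, e.g. grab(["body", "face"])."""
    node = PUPPET_TREE
    for key in path:
        node = node[key]
    return strings_under(node)

# grab(["body", "arms", "left_arm"]) -> ["l_shoulder", "l_elbow", "l_wrist"]
```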

MOTION-CAPTURE ALONE WILL NOT PROVIDE THE NECESSARY INPUTS FOR VIRTUAL BODY LANGUAGE.

The movements – communicative and otherwise – that our future avatars make in virtual spaces may be partially generated through live motion-capture, but in most cases, there will be substitutions, modifications, and deconstructions of direct motion capture. Brian Rotman sez:

“Motion capture technology, then, allows the communicational, instrumental, and affective traffic of the body in all its movements, openings, tensings, foldings, and rhythms into the orbit of ‘writing’.”

Becoming Beside Ourselves, page 47

Thus, body language will be alphabetized and textified for efficient traversal across the geocortex. This will give us the semantic knobs needed to puppeteer our virtual selves – at a distance. And to engage the semiotic process.

If I need my avatar to run up a hill to watch out for a hovercraft, or to walk into the next room to attend another business meeting, I don’t want to have to literally ambulate here in my tiny apartment to generate this movement in my avatar. I would be slamming myself against the walls and waking up the neighbors. The answer to generating the full repertoire of avatar behavior is hierarchical puppeteering. And on many levels. I may want my facial expressions, head movements, and hand movements to be captured while explaining something to my colleagues in remote places, but when I have to take a bio-break, or cough, or sneeze, I’ll not want that to be broadcast over the geocortex.
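
In code, that kind of selective capture could reduce to a per-channel gate plus a veto list for events that should never leave the room. A hypothetical sketch:

```python
# Hypothetical sketch of selective mocap broadcast: only whitelisted body
# channels go out over the network, and certain events are always vetoed.

BROADCAST_CHANNELS = {"face", "head", "hands"}        # captured and transmitted
SUPPRESSED_EVENTS = {"cough", "sneeze", "bio_break"}  # never transmitted

def filter_frame(mocap_frame, detected_events):
    """Strip a motion-capture frame down to what the user agreed to share."""
    if SUPPRESSED_EVENTS & set(detected_events):
        # Drop the whole frame; let procedural animation (breathing, idle)
        # fill in on the receiving end.
        return None
    return {channel: data
            for channel, data in mocap_frame.items()
            if channel in BROADCAST_CHANNELS}
```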

And I expect the avatar code to do my virtual breathing for me.

And when my avatar eats ravioli, I will want its virtual digestive tract to just do its thing, and make a little avatar poop when it’s done digesting. These autonomic inner workings are best left to code. Everything else should have a string, and these strings should be clustered in many combinations for me to tug at many different semantic levels. I call this Hierarchical Puppetry.

Here’s a journal article I wrote called Hierarchical Puppetry.


Using Kinect to Puppeteer my Avatar? My Arms are Getting Tired Just Thinking About It

March 24, 2011

Wagner James Au pointed me to his New World Notes blog post about Microsoft’s Kinect – hooked up to Second Life. It highlights a video made by Thai Phan of ICT showing a way to control SL avatars using Kinect.

The Kinect offers huge potential for revolutionizing user interaction design. Watch this video and you will agree. But I would warn readers against the knee-jerk conclusion that the ultimate goal of gestural interfaces is to allow us to just be ourselves. Let me explain.

Multi-Level Puppeteering

Either because of the clunky interfaces to the Second Life avatar, or because Thai is smart about virtual body language messaging (or both), he has rigged up the system to recognize gestures – emblems, if you will. These are interpreted into semantic units that trigger common avatar animations. One nice procedural piece he added was the ability to start an animation, hold it in place, and then stop it, based on the user’s movements.
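
Judging only from the video, the control loop appears to have roughly this shape – my reconstruction, not Thai Phan’s actual code, and all names are hypothetical:

```python
# Rough sketch of the Kinect-to-SL loop described above: a recognized gesture
# emblem starts an avatar animation, holding the pose keeps it playing, and
# relaxing it stops the animation.

GESTURE_TO_ANIMATION = {
    "arms_raised": "cheer",
    "hand_on_hip": "impatient",
    "point_forward": "point",
}

active_animation = None

def on_kinect_frame(recognized_gesture, avatar):
    """Called once per Kinect frame with the gesture classifier's output (or None)."""
    global active_animation
    animation = GESTURE_TO_ANIMATION.get(recognized_gesture)

    if animation and animation != active_animation:
        if active_animation:
            avatar.stop_animation(active_animation)
        avatar.start_animation(animation)          # emblem recognized: trigger it
        active_animation = animation
    elif animation is None and active_animation:
        avatar.stop_animation(active_animation)    # user relaxed: release the pose
        active_animation = None
    # If the same gesture is still held, the animation simply keeps playing.
```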

Wagner James Au points out the powerful effect and utility that would come about if we had a more direct-manipulation approach, whereby all the body’s motions are mapped onto the avatar. He makes an open call for “different variations of Kinect-to-SL interaction, experimenting with the most natural body-to-avatar UI”. He cites the avatar puppeteering work I did while I was at Linden Lab. He also cites the continuing thread among residents hoping to have puppeteering revived. We are all of similar minds as to the power of direct-manipulation avatar animation in SL.

But, as my book points out, direct gestural interfaces are not for everyone, and … not all the time! Also, some people have physical disabilities, and so they cannot “be themselves” gesturally. They have no choice but to use virtual body language to control their avatar expressions.

And for people like Stephen Hawking, the Kinect is useless. Personally, I would LOVE it if someone could invent a virtual body language interface as an extension to Hawking’s speech synthesizer, to drive a Hawking avatar. Wouldn’t it be cool to witness the excitement in Hawking’s whole being while describing the Big Bang?

Throttling the Gestural Pipeline

But, getting back to the subject of those of us who are able to move our bodies…

The question I have is…WHEN is whole-body gestural input a good thing, and WHEN is it unnecessary and cumbersome? Or moot?

Here’s a prediction: eventually we will have Kinect-like devices installed everywhere – in our homes, our business offices, even our cars. Public environments will be outfitted with the equivalent of a Vicon motion capture studio. Natural body language will be continually sucked into multiple ubiquitous computer input devices. They will watch our every move.

With the likelihood of large screens or augmented reality displays showing our avatars among remote users, we will be able to have our motions mapped onto our avatars.

Or not.

And that’s the point: we will want to be able to control when, and to what degree, our movements get broadcast to the cloud. Ultimately, WE need to have the ability to turn on or off the distribution of our direct-manipulation body language. The Design Challenge is how to provide intuitive controls.

The Homuncular Kinection

I have a crew of puppeteers in my brain: body maps, homunculi, controllers in my prefrontal cortex, mirror neurons, and other neural structures. So that my arms don’t get tired from having to do all my avatar’s gesturing, my neural puppeteers are activated when I want to evoke mediated body language. I do it in many ways, and across many media (including text – e.g., emoticons).

Stephen Hawking also has homuncular puppeteers in his brain.

Ultimately, I want several layers of control permitting me to provide small-motion substitutions for large motions, or complex bodily expressions. Virtual Body Language will permit an infinite variety of ways for me to control my avatar…including wiggling my index finger to make my avatar nod “yes”.
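
The finger-wiggle-to-nod mapping is about the simplest possible case of that substitution layer. A hypothetical sketch:

```python
# Hypothetical sketch of small-motion substitution: a tiny input motion
# (an index-finger wiggle) is remapped to a much larger avatar motion (a nod).

SUBSTITUTIONS = {
    "index_finger_wiggle": ("head_nod", 3.0),   # micro-gesture -> (avatar action, gain)
    "thumb_twitch":        ("wave", 2.0),
    "shoulder_shrug":      ("full_body_shrug", 1.0),
}

def puppeteer(detected_micro_gesture, strength, avatar):
    """Map a detected micro-gesture onto an amplified avatar motion."""
    mapping = SUBSTITUTIONS.get(detected_micro_gesture)
    if mapping is None:
        return
    action, gain = mapping
    avatar.play(action, amplitude=min(1.0, strength * gain))
```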

THIS is where the magic will happen for the future of Kinect interfaces for avatar puppeteering.