Voice as Puppeteer

May 5, 2012

(This blog post is re-published from an earlier blog of mine called “avatar puppetry” – the nonverbal internet. I’ll be phasing out that earlier blog, so I’m migrating a few of those earlier posts here before I trash it).


According to Gestural Theory, verbal language emerged from the primal energy of the body, from physical and vocal gestures.


The human mind is at home in a world of abstract symbols – a virtual world separated from the gestural origins of those symbols. This evolution from the analog to the digital continues today with the flood of the internet over Earth’s geocortex. Our thoughts are awash in the alphabet: a digital artifact that arose from a gestural past. It’s hard to imagine that the mind created the concepts of Self, God, Logic, and Math: belief structures so deep in our wiring, generated over millions of years of genetic, cultural, and neural evolution. I’m not even sure I fully believe that these structures are non-eternal and human-fabricated. The Copernican Revolution yanked humans out from the center of the universe, and science continues to progressively kick down the pedestals of hubris. But, being human, we cannot stop this trajectory of virtuality, even as we become more aware of it as such.

I’ve observed something about the birth of online virtual worlds, and the foundational technologies involved. One of the earliest online virtual worlds was Onlive Traveler, which used realtime voice.


My colleague Steve DiPaola invented some techniques for Traveler that caused the voice to animate the floating faces that served as avatars.

But as online virtual worlds started to proliferate, they incorporated the technology of chat rooms – textual conversations. One quirky side-effect of this was the collision of computergraphical humanoid 3D models with text-chat. These are strange bedfellows indeed – occupying vastly different cognitive dimensions.


Many of us worked our craft to make these bedfellows not so strange, such as the techniques that I invented with Chuck Clanton at There.com, called Avatar Centric Communication.

Later, voice was introduced to There.com. I invented a technique for There.com voice chat, and later re-implemented a variation for Second Life, for voice-triggered gesticulation.

Imagine the uncanny valley of hearing real voices coming from avatars with no associated animation. When I first witnessed this in a demo, the avatars came across as propped-up corpses with telephone speakers attached to their heads. Being so tuned-in to body language as I am, I got up on the gesticulation soap box and started a campaign to add voice-triggered animation. As an added visual aid, I created the sound wave animation that appears above avatar heads for both There and SL…


Gesticulation is the physical-visual counterpart to vocal energy – we gesticulate when we speak – moving our eyebrows, head, hands, etc. – and it’s almost entirely unconscious. Since humans are so verbally-oriented, and since we expect our bodies to produce natural body language to correspond to our spoken communications, we should expect the same of our avatars. This is the rationale for avatar gesticulation.
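For the curious, the amplitude-triggered flavor of gesticulation can be sketched in a few lines of Python. This is a toy reconstruction of the general idea, not the code we actually shipped: it tracks a smoothed voice-energy envelope, and the resulting intensity value is what an animation layer could use to scale head bobs, brow raises, and hand movement. The threshold and smoothing rates are illustrative guesses.

```python
import math

def rms_amplitude(frame):
    """Root-mean-square energy of one buffer of audio samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

class Gesticulator:
    """Tracks a smoothed voice-energy envelope for driving gesticulation.

    'intensity' runs from 0 (at rest) to 1 (full animation).
    """
    def __init__(self, threshold=0.05, attack=0.5, decay=0.1):
        self.threshold = threshold  # ignore levels below this (background noise)
        self.attack = attack        # how fast intensity rises while speaking
        self.decay = decay          # how fast it relaxes during silence
        self.intensity = 0.0

    def update(self, frame):
        """Feed one audio buffer; returns the new gesticulation intensity."""
        target = 1.0 if rms_amplitude(frame) > self.threshold else 0.0
        rate = self.attack if target > self.intensity else self.decay
        self.intensity += rate * (target - self.intensity)
        return self.intensity
```

The asymmetric attack/decay is the important design choice: the avatar perks up quickly when you start talking, but winds down gradually, so it doesn’t twitch on every pause between words.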

I think that a new form of puppeteering is on the horizon. It will use the voice. And it won’t just take sound signal amplitudes as input, as I did with voice-triggered gesticulation. It will parse the actual words and generate gestural emblems as well as gesticulations. And just as we will be able to layer filters onto our voices to mask our identities or role-play as certain characters, we will also be able to filter our body language to mimic the physical idiolects of Egyptians, Native Americans, Sicilians, four-year-old Chinese girls, and 90-year-old Ethiopian men.
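Here is a toy Python sketch of the word-parsing idea: scan the utterance for words that conventionally carry a gesture, emit emblems, and remap them through an idiolect filter. The emblem lexicon and the filter names are entirely invented for illustration – no real system’s vocabulary is being quoted here.

```python
# Hypothetical emblem lexicon: words that conventionally carry a gesture.
EMBLEMS = {
    "hello": "wave",
    "yes": "nod",
    "no": "head_shake",
    "huge": "arms_wide",
    "me": "point_to_self",
}

# Hypothetical idiolect filters: remap gestures to match a persona's
# physical "accent".
IDIOLECTS = {
    "reserved": {"arms_wide": "small_hand_spread"},
    "exuberant": {"nod": "enthusiastic_nod", "wave": "big_two_hand_wave"},
}

def gestures_for(utterance, idiolect=None):
    """Scan an utterance word by word and emit a stream of gesture emblems."""
    remap = IDIOLECTS.get(idiolect, {})
    out = []
    for raw in utterance.lower().split():
        word = raw.strip(".,!?")      # peel punctuation off each word
        if word in EMBLEMS:
            g = EMBLEMS[word]
            out.append(remap.get(g, g))  # apply the idiolect filter, if any
    return out
```

A real version would of course need prosody, timing, and grammar – emblems land on particular syllables, not just particular words – but the pipeline shape (words in, filtered gesture stream out) is the point.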

Digital-alphabetic-technological humanity reaches down to the gestural underbelly and invokes the primal energy of communication. It’s a reversal of the gesture-to-words vector of Gestural Theory.

And it’s the only choice we have for transmitting natural language over the geocortex, because we are sitting on top of a heap of alphabetic evolution thousands of years deep.

A Future Man Experiences Sex as a Female

April 20, 2012

I am a heterosexual male, happily married, and by most accounts, normal and healthy. This blog post is a what-if, extrapolating upon the idea of having a virtual body…..


Frank Zappa said that the dirtiest part of your body is your mind. It is hard to disagree with this. Your mind is capable of generating some serious filth (unless you never bathe, in which case, it is possible that parts of your body may actually be dirtier than your mind).

Obviously, the body has something to do with sex. But there is indeed a psychological, cognitive, emotional, imaginative dimension. It seems that these mental aspects of sex become more important as we get older. One obvious reason: aging. Entropy! Deteriorating, wrinkling, flabbifying, and weakening our bodies. But our aging minds are often as sharp as ever, and capable of higher dimensions of love and romance (and filth). It’s a shame that youth must be wasted on the young. I am referring to us in our earlier years when we had great bodies and great physical strength…but OH how immature we were.

Ray Kurzweil and other futurists suggest that virtual reality will be fully-integrated into our lives in the future. One could also assume that virtual sex will continue from its current occasional manifestations of phone sex, sexting, and avatar play in virtual worlds. There are already non-technological forms of virtual reality such as imaginative play, role-playing, etc. It’s only recently that technology has evolved enough to enhance the experience (or ruin it…depending on your vantage point).

Fantastic Sex at Age 100

The difference between mortality and immortality will become fuzzier in the future. Humans may achieve a certain kind of immortality by having their brains uploaded into a virtual reality when they are physically dead (or transformed into a cyborg, whichever comes first). This of course is based on the assumption that one can still experience a continuous life, having nothing left but a brain, and that this brain can be uploaded to some renewable medium…highly debatable at this early juncture. But let’s roll with it anyway. I can imagine that a 100-year-old future human might engage in sex with all the vigor and muscle tone associated with youth (think Jake Sully in Avatar, who got his legs back as a Na’vi). Think of this youthful sex…but with the imagination, wisdom, and capacity for love that only a 100-year-old could possess.

I’m a software guy, not a hardware guy, so I can’t say much about nanobots and teledildonics and other technological enhancements of human physicality. But I can imagine that given the appropriate virtual reality enhancements, I could experience something akin to being a female. If nanobots are indeed a part of our future, they might be able to stimulate the brain chemistry and bodily sensation associated with female thoughts and feelings.

Is this a good thing? It is a bit creepy. But I say it is a good thing. Here’s why: human imagination has no limits. Human creativity knows no bounds. The desire to understand how others experience the world is based on empathy and natural social bonding. Technology can be used for this purpose.

An earlier blog post I wrote explores the question of how we might experience non-human embodiment, and body language, through future virtual reality technology. Within the realm of human society, there are still a lot of experiences and perspectives that can be shared. It might help us understand each other a bit better. Empathy could be technologically-enhanced; generated through simulation and virtuality.

And it might make for some awesome sex.

One can only imagine. (That’ll have to do for now).

Here’s a piece by Robert Weiss about the pros and cons of virtual sex.

Seven Hundred Puppet Strings

March 31, 2012

(This blog post is re-published from an earlier blog of mine called “avatar puppetry” – the nonverbal internet. I’ll be phasing out that earlier blog, so I’m migrating a few of those earlier posts here before I trash it).

The human body has about seven hundred muscles. Some of them are in the digestive tract, and make their living by pushing food along from sphincter to sphincter. Yum! These muscles are part of the autonomic nervous system.

Other muscles are in charge of holding the head upright while walking. Others are in charge of furrowing the brow when a situation calls for worry. The majority of these muscles are controlled without conscious effort. Even when we do make a conscious movement (like waving a hand at Bonnie), the many arm muscles involved just do the right thing without our having to think about what each muscle is doing. The command region of the brain says, “wave at Bonnie”, and everything just happens like magic. Unless Bonnie scowls and looks the other way, in which case the brow furrows, sometimes accompanied by grumbling vocalizations.

The avatar equivalent of unconscious muscle control is a pile of procedural software and animation scripts that are designed to “do the right thing” when the human avatar controller makes a high-level command, like <walk>, or <do_the_coy_shoulder_move>, or <wave_at, “Bonnie”>. Sometimes, an avatar controller might want to get a little more nuanced: <walk_like, “Alfred Hitchcock”>; <wave_wildly_at, “Bonnie”>. I have pontificated about the art of puppeteering avatars in the following two web sites:


Also this interview with me by Andrea Romeo discusses some of the ideas about avatar puppetry that he and I have been bantering around for about a year now.
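To make the “do the right thing” layer concrete, here is a hypothetical Python sketch of how a high-level puppet command might expand into low-level joint actions, with a style modifier along the lines of <walk_like, “Alfred Hitchcock”>. Every command, joint, and style name below is made up for illustration – this is the shape of the idea, not anyone’s shipping code.

```python
# Illustrative animation scripts: each high-level verb expands into a
# bundle of low-level joint actions that "just do the right thing".
ANIMATIONS = {
    "walk": ["hips_sway", "legs_cycle", "arms_swing"],
    "wave": ["shoulder_raise", "elbow_bend", "wrist_oscillate"],
}

# Illustrative style modifiers, applied uniformly to the whole bundle.
STYLES = {
    "Alfred Hitchcock": {"tempo": 0.6, "amplitude": 0.8},
    "wildly":           {"tempo": 1.8, "amplitude": 1.6},
}

def command(verb, target=None, style=None):
    """Expand one high-level puppet command into low-level joint actions."""
    mods = STYLES.get(style, {"tempo": 1.0, "amplitude": 1.0})
    actions = [{"joint_action": a, **mods} for a in ANIMATIONS[verb]]
    if target:
        # Orient toward the target first, e.g. <wave_at, "Bonnie">.
        actions.insert(0, {"joint_action": "orient_toward",
                           "target": target, **mods})
    return actions
```

So `command("wave", target="Bonnie", style="wildly")` yields an orient-toward-Bonnie action followed by the wave bundle, all sped up and exaggerated – the human says one thing, the code pulls many strings.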

The question of how much control to apply on your virtual self has been rolling around in my head ever since I started writing avatar code for There.com and Second Life. Avatar control code is like a complex marionette system, where every “muscle” of the avatar has a string attached to it. But instead of all strings having equal importance, these strings are arranged in a hierarchical structure.

The avatar controller may not necessarily want or need to have access to every muscle’s puppet string. The question is: which puppet strings does the avatar controller want to control at any given time, and…how?

I’ve been thinking about how to make a system that allows a user to shift up and down the hierarchy, in the same way that our brains shift focus among different motion regimes.


The movements – communicative and otherwise – that our future avatars make in virtual spaces may be partially generated through live motion-capture, but in most cases, there will be substitutions, modifications, and deconstructions of direct motion capture. Brian Rotman sez:

“Motion capture technology, then, allows the communicational, instrumental, and affective traffic of the body in all its movements, openings, tensings, foldings, and rhythms into the orbit of ‘writing’.”

Becoming Beside Ourselves, page 47

Thus, body language will be alphabetized and textified for efficient traversal across the geocortex. This will give us the semantic knobs needed to puppeteer our virtual selves – at a distance. And to engage the semiotic process.

If I need my avatar to run up a hill to watch out for a hovercraft, or to walk into the next room to attend another business meeting, I don’t want to have to literally ambulate here in my tiny apartment to generate this movement in my avatar. I would be slamming myself against the walls and waking up the neighbors. The answer to generating the full repertoire of avatar behavior is hierarchical puppeteering – on many levels. I may want my facial expressions, head movements, and hand movements to be captured while explaining something to my colleagues in remote places, but when I have to take a bio-break, or cough, or sneeze, I’ll not want that to be broadcast over the geocortex.

And I expect the avatar code to do my virtual breathing for me.

And when my avatar eats ravioli, I will want its virtual digestive tract to just do its thing, and make a little avatar poop when it’s done digesting. These autonomic inner workings are best left to code. Everything else should have a string, and these strings should be clustered in many combinations for me to tug at many different semantic levels. I call this Hierarchical Puppetry.
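Here’s a toy Python sketch of Hierarchical Puppetry as described above: the strings hang off a tree of named clusters; the controller grabs a node at any level; everything at or below the grabbed node follows the human, and everything else (the autonomic branch included) stays with the procedural code. The rig below is invented for illustration, not an actual avatar skeleton.

```python
# An illustrative cluster tree: leaves are individual "muscle" strings,
# interior nodes are semantic clusters a puppeteer can grab as a unit.
HIERARCHY = {
    "body": {
        "face": {"brows": {}, "eyelids": {}, "mouth": {}},
        "arms": {"left_arm": {}, "right_arm": {}},
        "autonomic": {"breathing": {}, "digestion": {}},
    }
}

def strings_under(tree, name):
    """All leaf strings at or below the named cluster node."""
    def leaves(node, label):
        if not node:
            return [label]
        return [s for k, v in node.items() for s in leaves(v, k)]
    def find(node):
        for k, v in node.items():
            if k == name:
                return leaves(v, k)
            found = find(v)
            if found is not None:
                return found
        return None
    return find(tree)

class Puppeteer:
    def __init__(self, tree):
        self.tree = tree
        self.user_held = set()   # strings the human is pulling directly

    def grab(self, cluster):
        """Take direct control of every string under a cluster node."""
        self.user_held |= set(strings_under(self.tree, cluster))

    def controller_of(self, string):
        return "user" if string in self.user_held else "procedural_code"
```

Grabbing “face” puts the brows, eyelids, and mouth under direct control while breathing and digestion carry on by themselves – which is exactly the division of labor argued for above.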

Here’s a journal article I wrote called Hierarchical Puppetry.

Screensharing: Don’t Look at Me

January 11, 2012

Imagine discussing a project you are doing with a small group: a web site, a drawing, a contraption you are building; whatever. You would not expect the others to be looking at your face the whole time. Much of the time you will all be gazing around at different parts of the project. You may be pointing your fingers around, using terms like “this”, “that”, “here” and “there”.

When people have their focus on something separate from their own bodies, that thing becomes an extension of their bodies. Bodymind is not bound by skin. And collaborating, communicating bodyminds meld on an object of common interest.


The internet is dispersing our workspaces globally, and the same is happening to our bodies.

The anthropologist Ray Birdwhistell coined the term “kinesics”, referring to the interpretation, science, or study of body language.

I invented a word: “telekinesics”. I define it as, “the science of body language as conducted over remote distances via some medium, including the internet” (ref)

My primary interest is the creation of body language using remote manifestations of ourselves, such as avatars and other visual-interactive forms. I don’t consider video conferencing to be a form of virtual body language, because it is essentially a re-creation of one’s literal appearance and sound. It is an extension of telephony.

But it is virtual in one sense: it is remote from your real body.

Video conferencing applications like Skype are extremely useful. I use Skype all the time to chat with friends or colleagues. Seeing my collaborator’s face helps tremendously to fill in the missing nonverbal signals of telephony. But if the subject of conversation is a project we are working on, then “face-time” is not helpful. We need to enter into, and embody, the space of our collaboration.

Screen Sharing

This is why screen sharing is so useful. Screen sharing happens when you flip a switch on your Skype (or whatever) application that changes the output signal from your camera to your computer screen. Your mouse cursor becomes a tiny Vanna White – annotating, referencing, directing people’s gazes.

Michael Braun, in the blog post: Screen Sharing for Face Time, says that seeing your chat partner is not always helpful, while screen sharing “has been shown to increase productivity. When remote participants had access to a shared workspace (for example, seeing the same spreadsheet or computer program), then their productivity improved. This is not especially surprising to anyone who has tried to give someone computer help over the phone. Not being able to see that person’s screen can be maddening, because the person needing help has to describe everything and the person giving help has to reconstruct the problem in her mind.”

Many software applications include cute features like collaborative drawing spaces, intended for co-collaborators to co-create, co-communicate, and co-mess up each other’s co-work. The interaction design (from what I’ve seen) is generally awkward. But more to the point: we don’t yet have a good sense of how people can and should interact in such collaborative virtual spaces. The technology is still frothing like tadpole eggs.

Some proponents of gestural theory believe that one reason speech emerged out of gestural communication was because it freed up the “talking hands” so that they could do physical work – so our mouths started to do the talking. Result: we can put our hands to work, look at our work, and talk about it, all at the same time.

Screen sharing may be a natural evolutionary trend – a continuing thread of this ancient activity – as manifested in the virtual world of internet communications.



Can You Trust Email Body Language?

December 2, 2011

Steve Tobak wrote an article in CBSNews.com called, How to Read Virtual Body Language in Email.

Steve makes some interesting observations. But, like so many attempts at teaching us “how to read” body language, Steve makes several assumptions that miss the highly contextual, and highly tenuous nature of interpreting emotion via email.

In fact, email is often used by people as a way to avoid emotion or intimacy. It’s an example of asynchronous communication: an email message could take an arbitrary amount of time to compose, and it could be sent at an arbitrary time after writing it. Thus, email is not a reliable medium for reading another person’s emotions. It’s hard to lie with your body. It’s much easier to lie with a virtual body. With email, you don’t even have a body.

Damn That Send Button

Actually, I wish I could say that I have always used email in a premeditated, calculated way. I have been guilty of sending email messages in the heat of an emotional moment. A few too many of those emails have led me to believe that the SEND button should be kept in a locked box in a governmental facility. And the box should have a big sign that says, Are You Sure?

People often make the mistake of assuming that a given communication medium provides a transparent channel for human expression. Oddly enough: email can bring out certain negative qualities in people who may not be negative in normal face-to-face encounters.

People don’t take into account the McLuhan effect, and assume the message is determined only by the communicators. Steve says this about flame mail:

“You probably don’t need me to tell you this, but when you receive what we affectionately call flame mail – where someone lets loose on you in a big, ugly way – that’s aggressive behavior. In other words, they’re acting out like a child throwing a temper tantrum and it’s not about you, it’s about them. I know it’s tempting to think it’s just a misunderstanding, but ask yourself, why did they assume the worst?”

But it’s not just “about them”. It’s also about the medium – an awkward, body-language-challenged medium.

Also, people can feel “safe” behind the email wall (meaning they know they won’t get punched in the face – at least not immediately). There’s something about the medium that can cause people to flame – EVEN if they are not normally flame-throwers. Jaron Lanier in You Are Not a Gadget gives a good explanation for how and why this phenomenon occurs. Read the book, even if you don’t always take Jaron seriously. He is brave and bold, and he challenges many assumptions about internet culture.

Everyone has stories about email messages they wish they had never written, or email messages they wish they had never read.

It’s wise to understand how media mediates our interactions with each other. That is an important kind of literacy: a literacy of understanding media effects.

Virtual Sentience Requires a Gaze

November 28, 2011

(This blog post is re-published from an earlier blog of mine called “avatar puppetry” – the nonverbal internet.  I originally wrote it in September of 2009. I’ll be phasing out that earlier blog, so I’m migrating a few of those earlier posts here before I trash it).


I was speaking with my colleague Michael Nixon at the School of Interactive Art and Technology. We were talking about body language in non-human animated characters. He commented that before you can imbue a virtual character with apparent sentience, it has to have the ability to GAZE – in other words, to look at something. Which means it has a head with eyes. Or maybe just a head. Or… a “head”.

Here’s the thing about gaze: it pokes out of the local (“lonely”) coordinate system of the character and into the global (“social”) coordinate system of the world and other sentient beings. Gaze is the psychic vector that connects a character with the world. The character “places its gaze upon the world”. Luxo Jr. is a great example of imbuing an otherwise inanimate object with sentience (and lots of personality besides) by using body language such as gaze.

I have observed something missing in video conferencing. Gaze. Notice in this set of four images how the video chat participants cannot make eye-contact with each other. This is because they are not sharing the same physical 3D space. Nor are they sharing the same virtual 3D space!

Gaze is one of the most powerful communicative elements of natural language, along with the musicality of speech, and of course facial and bodily gesture. This is especially true among groups of young single people in which hormones are flying, and flirtation, coyness, and jealousy create a symphony of psychic vectors…

At There.com, I designed the initial avatar gaze system. With the help of Chuck Clanton, I created an “intimacam”, which aimed perpendicular to the consensual gaze of the avatars, and zoomed-in closer when the avatar heads came closer to each other.
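For the geometrically inclined, here is a simplified 2D Python sketch of an intimacam-style camera: it sits on the perpendicular to the line between the two heads, looks at their midpoint, and dollies in as the heads approach each other. This is a toy on the ground plane with made-up constants, not the code we shipped.

```python
import math

def intimacam(head_a, head_b, base_distance=4.0, min_distance=1.0):
    """Camera position and look-at point for framing two avatar heads.

    head_a, head_b: (x, y) positions on the ground plane.
    """
    ax, ay = head_a
    bx, by = head_b
    mid = ((ax + bx) / 2.0, (ay + by) / 2.0)   # look-at point between heads
    dx, dy = bx - ax, by - ay
    gap = math.hypot(dx, dy)                   # distance between the heads
    # Unit vector perpendicular to the head-to-head line.
    perp = (-dy / gap, dx / gap)
    # Closer heads -> closer camera, clamped so it never clips through them.
    dist = max(min_distance, base_distance * gap / (gap + 1.0))
    cam = (mid[0] + perp[0] * dist, mid[1] + perp[1] * dist)
    return cam, mid
```

The perpendicular placement is what gives the two-shot its intimacy: both faces stay in profile toward the lens, and the zoom quietly rewards the avatars for leaning in.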

The greatest animators have known about the power of gaze for as long as the craft has existed. This highly-social component of body language has a mathematical manifestation in the virtual spaces of cartoons, computer games, and virtual worlds. And it is one of the many elements that will become refined and codified and included into the virtual body language of the internet.

Human communication is migrating over to the internet – the geo-cortex of posthumanity. Text is leading the way. Body language has some catching up to do. Brian Rotman has some interesting things to say along these lines in his book, Becoming Beside Ourselves.

We can learn a lot from Pixar animators, as well as psychologists and actors, as we develop virtual worlds and collaborative workspaces.


In response to my earlier post, Laban-for-animators expert Leslie Bishko made this comment:

“My .2c – breath promotes the illusion of sentience, gaze promotes the illusion of interaction and relationship!”

“Consider Including”: Google Stupidity and Arrogance

November 12, 2011

A little off-topic here, but I just can’t resist taking another jab at The Google.

I am a gmail user, but more recently I have considered switching.

Every so often, I notice a new gmail feature. Google is usually kind enough to let me know that a new feature has been introduced, such as offering me the option to try the “new look”. But even after I say “no thank you” – which I always do – I keep getting notifications to try the “new look”. Thanks Google, but please STOP TELLING ME ABOUT YOUR “NEW LOOK”.

And then there is the little yellow “Important” symbol that one day magically appeared next to some of my messages. When I roll over the symbol I see the text, “Important mainly because of the people in the conversation”.

Yo Google: how ’bout if I decide what’s important.

One person in the Google forums complained about gmail tagging her message as: “Important mainly because of the words in the message”. She says, “Can we stop with the idiotic messages from Google, as if our paternalistic uncle was looking out for us?”


But that’s not what I want to talk about: I want to talk about a feature which is the ultimate example of Google developers trying to be oh so clever but just coming across as stupid. I’m talking about the text that appears when I’m composing an email to someone, which says, “Consider including: John, Rebecca…” And so on.

Peter Thomas, one of the many bloggers who has complained about this ridiculous feature, summarizes it:

“When you type an e-mail, Gmail comes up with a list of people that you may like to also copy it to. Let’s pause and just think about this. You are writing an e-mail, generally the first thing that you do is to type in the address of the person (or people) you are writing to. Gmail has a useful feature that scans your previous mails, so typing “Pe” will bring up “Peter Thomas” as an option. So far so good….”

…but then, gmail offers a list of people that you may consider including as recipients of your email, based on simple association. Hello? What if I am emailing a colleague to complain about the boss? I certainly don’t want to include the boss, and it scares me that his name is sitting up there, a mouse-click away from disaster. Or what if I am plotting a surprise birthday party for Beth? Including Beth is specifically NOT what I want to do.

And…what if the person is DEAD?

I found this on the Google forums:

“I deleted my dead friend as a contact which was traumatic enough, but having google STILL suggesting I include her when there’s honestly nothing I’d like better than to be able to include her BECAUSE SHE’S DEAD. How do I make this stop?!?!?!”

Note to Google:

Please get out of the business of reading our minds. You suck at it.

Peter Thomas concludes: “This “feature” is bad enough to have merited me writing to Google asking them to remove it, or at least make it optional. Their support forums are full of people saying the same. It will be interesting to see whether or not they listen.”

Do a search for “consider including”, and you’ll come across several people railing against this act of stupidity from Google. My blog post is not original. Yet I feel compelled to add another voice to the chorus.

Do I have any conclusions or insights? Not really, other than my opinion that any good thing can turn bad when it gets too big and too powerful. Google is generally a good thing. But I think Google is getting too big and too powerful. And I am getting smaller and less powerful, in relative terms. I want to be completely in charge of how I communicate with my friends and colleagues.

The fact that Google is brimming with young, clever, cocky geeks does not make for an agreeable form of world domination.