Voice as Puppeteer

May 5, 2012

(This blog post is re-published from an earlier blog of mine called “avatar puppetry” – the nonverbal internet. I’ll be phasing out that earlier blog, so I’m migrating a few of those earlier posts here before I trash it).

———————–

According to Gestural Theory, verbal language emerged from the primal energy of the body, from physical and vocal gestures.


The human mind is at home in a world of abstract symbols – a virtual world separated from the gestural origins of those symbols. The evolution from analog to digital continues today with the flood of the internet over Earth’s geocortex. Our thoughts are awash in the alphabet: a digital artifact that arose from a gestural past. It’s hard to imagine that the mind could have created the concepts of Self, God, Logic, and Math: belief structures so deep in our wiring – generated over millions of years of genetic, cultural, and neural evolution. I’m not even sure I fully believe that these structures are non-eternal and human-fabricated. Ever since the Copernican Revolution yanked humans out from the center of the universe, science has continued to kick down the pedestals of hubris. But, being human, we cannot stop this trajectory of virtuality, even as we become more aware of it as such.

I’ve observed something about the birth of online virtual worlds, and the foundational technologies involved. One of the earliest online virtual worlds was Onlive Traveler, which used realtime voice.


My colleague Steve DiPaola invented techniques for Traveler that used the voice to animate the floating faces that served as avatars.

But as online virtual worlds started to proliferate, they incorporated the technology of chat rooms – textual conversations. One quirky side-effect of this was the collision of computergraphical humanoid 3D models with text-chat. These are strange bedfellows indeed – occupying vastly different cognitive dimensions.


Many of us worked our craft to make these bedfellows not so strange, such as the techniques that I invented with Chuck Clanton at There.com, called Avatar Centric Communication.

Later, voice was introduced to There.com. I invented a technique for voice-triggered gesticulation in There.com voice chat, and later re-implemented a variation of it for Second Life.

Imagine the uncanny valley of hearing real voices coming from avatars with no associated animation. When I first witnessed this in a demo, the avatars came across as propped-up corpses with telephone speakers attached to their heads. Being as tuned-in to body language as I am, I got up on the gesticulation soap box and started a campaign to add voice-triggered animation. As an added visual aid, I created the sound wave animation that appears above avatar heads for both There and SL.


Gesticulation is the physical-visual counterpart to vocal energy – we gesticulate when we speak – moving our eyebrows, head, hands, etc. – and it’s almost entirely unconscious. Since humans are so verbally-oriented, and since we expect our bodies to produce natural body language to correspond to our spoken communications, we should expect the same of our avatars. This is the rationale for avatar gesticulation.
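To make the rationale concrete, here is a minimal sketch of what amplitude-triggered gesticulation can look like. This is not the actual There.com or Second Life code; the avatar methods (play_gesticulation, show_sound_wave, relax_to_idle) are hypothetical stand-ins. The essential idea is simple: measure the energy of the voice signal, smooth it, and drive animation intensity from it.

    import math

    def rms_amplitude(samples):
        # Root-mean-square energy of one buffer of microphone samples.
        return math.sqrt(sum(s * s for s in samples) / len(samples))

    class Gesticulator:
        def __init__(self, threshold=0.05, smoothing=0.8):
            self.threshold = threshold  # ignore background noise below this
            self.smoothing = smoothing  # low-pass filter so motion isn't jittery
            self.energy = 0.0

        def update(self, samples, avatar):
            raw = rms_amplitude(samples)
            # Smooth the envelope; raw speech energy is too bursty to animate from.
            self.energy = self.smoothing * self.energy + (1.0 - self.smoothing) * raw
            if self.energy > self.threshold:
                # Scale head nods, brow raises, and hand beats by vocal energy.
                intensity = min(1.0, self.energy / (self.threshold * 4))
                avatar.play_gesticulation(intensity)
                avatar.show_sound_wave(intensity)  # the indicator above the head
            else:
                avatar.relax_to_idle()

Note that this is amplitude only: the avatar moves when you speak, and in proportion to how energetically you speak, but with no regard for what you say.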

I think that a new form of puppeteering is on the horizon. It will use the voice. And it won’t just take sound signal amplitudes as input, as I did with voice-triggered gesticulation. It will parse the actual words and generate gestural emblems as well as gesticulations. And just as we will be able to layer filters onto our voices to mask our identities or role-play as certain characters, we will also be able to filter our body language to mimic the physical idiolects of Egyptians, Native Americans, Sicilians, four-year-old Chinese girls, and 90-year-old Ethiopian men.
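As a speculative sketch (my guess at how such a system might be organized, not a description of any existing software), the word-parsing layer could start as a simple lookup from recognized words to emblem animations, with a style filter layered on top:

    # Hypothetical mapping from recognized words to gestural emblems.
    EMBLEMS = {
        "hello": "wave",
        "yes":   "nod",
        "no":    "head_shake",
        "huge":  "arms_wide",
        "me":    "point_to_self",
    }

    def gestures_from_transcript(words, style_filter=None):
        # 'words' would come from a speech-to-text engine; 'style_filter'
        # would remap or re-time emblems to a chosen physical idiolect.
        for word in words:
            emblem = EMBLEMS.get(word.lower())
            if emblem is not None:
                yield style_filter(emblem) if style_filter else emblem

    # e.g. list(gestures_from_transcript("yes that is huge".split()))
    # -> ["nod", "arms_wide"]

The interesting part would be the style filters – the dictionaries of physical idiolects – not the lookup itself.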

Digital-alphabetic-technological humanity reaches down to the gestural underbelly and invokes the primal energy of communication. It’s a reversal of the gesture-to-words vector of Gestural Theory.

And it’s the only choice we have for transmitting natural language over the geocortex, because we are sitting on top of a heap of alphabetic evolution thousands of years old.


Seven Hundred Puppet Strings

March 31, 2012

(This blog post is re-published from an earlier blog of mine called “avatar puppetry” – the nonverbal internet. I’ll be phasing out that earlier blog, so I’m migrating a few of those earlier posts here before I trash it).

———————–

The human body has about seven hundred muscles. Some of them are in the digestive tract, and make their living by pushing food along from sphincter to sphincter. Yum! These muscles are part of the autonomic nervous system.

Other muscles are in charge of holding the head upright while walking. Others are in charge of furrowing the brow when a situation calls for worry. The majority of these muscles are controlled without conscious effort. Even when we do make a conscious movement (like waving a hand at Bonnie), the many arm muscles involved just do the right thing without our having to think about what each muscle is doing. The command region of the brain says, “wave at Bonnie”, and everything just happens like magic. Unless Bonnie scowls and looks the other way, in which case the brow furrows, sometimes accompanied by grumbling vocalizations.

The avatar equivalent of unconscious muscle control is a pile of procedural software and animation scripts designed to “do the right thing” when the human avatar controller issues a high-level command, like <walk>, or <do_the_coy_shoulder_move>, or <wave_at, “Bonnie”>. Sometimes an avatar controller might want to get a little more nuanced: <walk_like, “Alfred Hitchcock”>; <wave_wildly_at, “Bonnie”>. I have pontificated about the art of puppeteering avatars on the following two websites:

www.Avatology.com
www.AvatarPuppeteering.com
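As a hypothetical illustration of the command layer described above (the names are mine, not the actual There.com or Second Life API), the controller’s side of the interface might look like this, with all the muscle-level detail buried beneath it:

    class AvatarCommands:
        def walk(self, style=None):
            # Procedural locomotion: foot placement, balance, arm swing,
            # hundreds of muscle strings handled below this interface.
            self._play("walk_cycle", style=style)  # e.g. style="Alfred Hitchcock"

        def wave_at(self, target, wildly=False):
            self._look_at(target)  # orient head and gaze toward the target first
            self._play("wave_wild" if wildly else "wave")

        def _look_at(self, target):
            pass  # turn head and eyes toward the named avatar

        def _play(self, clip, style=None):
            pass  # blend an animation clip with procedural adjustments

    avatar = AvatarCommands()
    avatar.wave_at("Bonnie", wildly=True)  # <wave_wildly_at, "Bonnie">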

Also, this interview with me by Andrea Romeo discusses some of the ideas about avatar puppetry that he and I have been batting around for about a year now.

The question of how much control to apply on your virtual self has been rolling around in my head ever since I started writing avatar code for There.com and Second Life. Avatar control code is like a complex marionette system, where every “muscle” of the avatar has a string attached to it. But instead of all strings having equal importance, these strings are arranged in a hierarchical structure.

The avatar controller may not necessarily want or need access to every muscle’s puppet string. The question is: which puppet strings does the avatar controller want to control at any given time, and…how?

I’ve been thinking about how to make a system that allows a user to shift up and down the hierarchy, in the same way that our brains shift focus among different motion regimes.

MOTION-CAPTURE ALONE WILL NOT PROVIDE THE NECESSARY INPUTS FOR VIRTUAL BODY LANGUAGE.

The movements – communicative and otherwise – that our future avatars make in virtual spaces may be partially generated through live motion-capture, but in most cases, there will be substitutions, modifications, and deconstructions of direct motion capture. Brian Rotman sez:

“Motion capture technology, then, allows the communicational, instrumental, and affective traffic of the body in all its movements, openings, tensings, foldings, and rhythms into the orbit of ‘writing’.”

Becoming Beside Ourselves, page 47

Thus, body language will be alphabetized and textified for efficient traversal across the geocortex. This will give us the semantic knobs needed to puppeteer our virtual selves – at a distance. And to engage the semiotic process.

If I need my avatar to run up a hill to watch out for a hovercraft, or to walk into the next room to attend another business meeting, I don’t want to have to literally ambulate here in my tiny apartment to generate this movement in my avatar. I would be slamming myself against the walls and waking up the neighbors. The answer to generating the full repertoire of avatar behavior is hierarchical puppeteering. And on many levels. I may want my facial expressions, head movements, and hand movements to be captured while explaining something to my colleagues in remote places, but when I have to take a bio-break, or cough, or sneeze, I’ll not want that to be broadcast over the geocortex.

And I expect the avatar code to do my virtual breathing for me.

And when my avatar eats ravioli, I will want its virtual digestive tract to just do its thing, and make a little avatar poop when it’s done digesting. These autonomic inner workings are best left to code. Everything else should have a string, and these strings should be clustered in many combinations for me to tug at many different semantic levels. I call this Hierarchical Puppetry.
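Here is a minimal sketch of how hierarchical puppetry might be structured. It is a toy model with illustrative names, not an implementation from any product: every string lives in a tree, grabbing a string high in the tree drives everything beneath it, and autonomic channels are marked so they are never broadcast.

    class StringNode:
        def __init__(self, name, children=(), broadcast=True):
            self.name = name
            self.children = list(children)
            self.controlled = False     # is the human holding this string?
            self.broadcast = broadcast  # autonomic channels stay local

        def drivers(self, inherited=False):
            # A string held higher in the tree drives everything beneath it;
            # everything else falls back to procedural code.
            driven = inherited or self.controlled
            source = "puppeteer" if driven else "procedural code"
            yield self.name, source, self.broadcast
            for child in self.children:
                yield from child.drivers(driven)

    # A (very abbreviated) body tree.
    face = StringNode("face", [StringNode("brow"), StringNode("mouth")])
    body = StringNode("body", [face, StringNode("hands"),
                               StringNode("gut", broadcast=False)])

    face.controlled = True  # capture my facial expressions...
    for name, source, public in body.drivers():
        # ...while walking, breathing, and digestion are left to code.
        print(f"{name:<6} <- {source} ({'broadcast' if public else 'kept local'})")

Shifting focus among motion regimes then amounts to setting and clearing the controlled flag at different nodes of the tree.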

Here’s a journal article I wrote called Hierarchical Puppetry.


Screensharing: Don’t Look at Me

January 11, 2012

Imagine discussing a project you are doing with a small group: a web site, a drawing, a contraption you are building, whatever. You would not expect people to be looking at your face the whole time. Much of the time you will all be gazing around at different parts of the project. You may be pointing your fingers around, using terms like “this”, “that”, “here”, and “there”.

When people have their focus on something separate from their own bodies, that thing becomes an extension of their bodies. Bodymind is not bound by skin. And collaborating, communicating bodyminds meld on an object of common interest.

TeleKinesics

The internet is dispersing our workspaces globally, and the same is happening to our bodies.

The anthropologist Ray Birdwhistell coined the term “kinesics”, referring to the interpretation, science, or study of body language.

I invented a word: “telekinesics”. I define it as “the science of body language as conducted over remote distances via some medium, including the internet” (ref).

My primary interest is the creation of body language using remote manifestations of ourselves, such as avatars and other visual-interactive forms. I don’t consider video conferencing a form of virtual body language, because it is essentially a re-creation of one’s literal appearance and sounds. It is an extension of telephony.

But it is virtual in one sense: it is remote from your real body.

Video conferencing and applications like Skype are extremely useful. I use Skype all the time to chat with friends or colleagues. Seeing my collaborator’s face helps tremendously to fill in the missing nonverbal signals of telephony. But if the subject of conversation is a project we are working on, then “face-time” is not helpful. We need to enter into, and embody, the space of our collaboration.

Screen Sharing

This is why screen sharing is so useful. Screen sharing happens when you flip a switch on your Skype (or whatever) application that changes the output signal from your camera to your computer screen. Your mouse cursor becomes a tiny Vanna White – annotating, referencing, directing people’s gazes.

Michael Braun, in the blog post Screen Sharing for Face Time, says that seeing your chat partner is not always helpful, while screen sharing “has been shown to increase productivity. When remote participants had access to a shared workspace (for example, seeing the same spreadsheet or computer program), then their productivity improved. This is not especially surprising to anyone who has tried to give someone computer help over the phone. Not being able to see that person’s screen can be maddening, because the person needing help has to describe everything and the person giving help has to reconstruct the problem in her mind.”

Many software applications include cute features like collaborative drawing spaces, intended for co-collaborators to co-create, co-communicate, and co-mess up each other’s co-work. The interaction design (from what I’ve seen) is generally awkward. But more to the point: we don’t yet have a good sense of how people can and should interact in such collaborative virtual spaces. The technology is still frothing like tadpole eggs.

Some proponents of gestural theory believe that one reason speech emerged out of gestural communication was because it freed up the “talking hands” so that they could do physical work – so our mouths started to do the talking. Result: we can put our hands to work, look at our work, and talk about it, all at the same time.

Screen sharing may be a natural evolutionary trend – a continuing thread of this ancient activity – as manifested in the virtual world of internet communications.


Can You Trust Email Body Language?

December 2, 2011

Steve Tobak wrote an article in CBSNews.com called, How to Read Virtual Body Language in Email.

Steve makes some interesting observations. But, like so many attempts at teaching us “how to read” body language, Steve makes several assumptions that miss the highly contextual and highly tenuous nature of interpreting emotion via email.

In fact, email is often used as a way to avoid emotion or intimacy. It’s an example of asynchronous communication: an email message can take an arbitrary amount of time to compose, and it can be sent at an arbitrary time after it is written. Thus, email is not a reliable medium for reading someone’s emotions. It’s hard to lie with your body. It’s much easier to lie with a virtual body. With email, you don’t even have a body.

Damn That Send Button

Actually, I wish I could say that I have always used email in a premeditated, calculated way. I have been guilty of sending email messages in the heat of an emotional moment. A few too many of those emails have led me to believe that the SEND button should be kept in a locked box in a governmental facility. And the box should have a big sign that says, Are You Sure?

People often make the mistake of assuming that a given communication medium provides a transparent channel for human expression. Oddly enough: email can bring out certain negative qualities in people who may not be negative in normal face-to-face encounters.

People don’t take into account the McLuhan effect (the medium is the message), and assume the message is determined only by the communicators. Steve says this about flame mail:

“You probably don’t need me to tell you this, but when you receive what we affectionately call flame mail – where someone lets loose on you in a big, ugly way – that’s aggressive behavior. In other words, they’re acting out like a child throwing a temper tantrum and it’s not about you, it’s about them. I know it’s tempting to think it’s just a misunderstanding, but ask yourself, why did they assume the worst?”

But it’s not just “about them”. It’s also about the medium – an awkward, body-language-challenged medium.

Also, people can feel “safe” behind the email wall (meaning they know they won’t get punched in the face – at least not immediately). There’s something about the medium that can cause people to flame – EVEN if they are not normally flame-throwers. Jaron Lanier, in You Are Not a Gadget, gives a good explanation for how and why this phenomenon occurs. Read the book, even if you don’t always take Jaron seriously. He is brave and bold, and he challenges many assumptions about internet culture.

Everyone has stories about email messages they wish they had never written, or email messages they wish they had never read.

It’s wise to understand how media mediates our interactions with each other. That is an important kind of literacy: a literacy of understanding media effects.