An Interview with Bob Moore

December 31, 2010

In the winter of 2009, I interviewed Bob Moore by email. Bob is a Senior Research Scientist at Yahoo! Labs. His research focuses on social interaction in online and offline environments. In the past, he designed virtual worlds at The Multiverse Network and studied them at the Palo Alto Research Center (PARC), where he started the PlayOn project. I met Bob when he was working at PARC. He has spent quite a bit of his life studying the ways people interact.

But first, I want to show you a picture that Bob gave me of a rather odd interaction….

Below are my questions to Bob.

Jeffrey Ventrella: What do you think about the difference between 3D virtual worlds that use text chat vs. virtual worlds that use voice? What are the pros and cons? Can the two get along in a single virtual world?

Bob Moore: I think voice and text definitely can get along in the same virtual world just as they do in the real world. When I’m in a meeting at work, I text my wife rather than call her. It’s non-intrusive in that situation. Or when I’m watching a movie, I might tweet my friends or the world and see what others think of the movie. It’s more convenient than phoning them all!

Similarly with virtual worlds, there are plenty of situations in which text is better than voice: when I want to throw out a clever comment to the whole room (or server) and see who else thinks it’s clever; when I’m sitting with my wife on the couch and she’s watching TV; when I’m juggling my virtual conversation with real life activities like watching the kids (because the turns of the conversation persist); when I’m approaching strangers online in an attempt to make friends; when I don’t want my virtual friend to know that I’m older or more male than my character; and the list goes on. Text chat is really useful sometimes. The fact that users did not abandon text chat when voice was integrated into worlds like Second Life and World of Warcraft demonstrates this.

In contrast, voice is good for socializing with friends, especially when you already know them; for coordinating large groups, such as raids; for giving and attending lectures or panel discussions; and more.

However, the fact that users can sometimes switch to voice when text chat is inconvenient does not mean that developers should stop trying to improve text chat. Currently in most virtual worlds, it is difficult to coordinate one’s voice-as-text with one’s avatar and with others’ avatars. The avatars move in real time, but the typed words do not. Speaking-by-typing is hidden from public view, with utterances emerging fully formed. The interactional result is that one’s friend will walk away while one is still speaking, or one’s teammate will attack while one is still discussing tactics [4][3][2]. To overcome these kinds of coordination problems, players have developed workarounds like the “readiness check” [2] and the adoption of third-party voice applications. However, a better solution has been demonstrated by There since its inception: post text chat on a word-by-word basis, so that text utterances can be monitored by others as they unfold in real time. More recently, Google Wave has implemented text chat in this way too, no doubt because its users must coordinate their chat with real-time activities such as editing documents or viewing videos.
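To make the mechanism concrete, here is a minimal sketch of word-by-word chat posting in the spirit of There’s approach. Everything here (ChatClient, the message fields, the transport’s send method) is illustrative rather than any real API: the sender publishes each completed word immediately, and receiving clients grow the speaker’s in-progress utterance as the words arrive.

```python
class ChatClient:
    def __init__(self, network, speaker_id):
        self.network = network      # assumed transport exposing send(dict)
        self.speaker_id = speaker_id
        self.buffer = ""            # characters typed since the last word break

    def on_keystroke(self, ch):
        if ch in (" ", "\n"):
            if self.buffer:
                # Publish each completed word immediately, so others can
                # monitor the utterance as it unfolds in real time.
                self.network.send({
                    "type": "chat_fragment",
                    "speaker": self.speaker_id,
                    "word": self.buffer,
                    "final": ch == "\n",  # newline ends the utterance
                })
                self.buffer = ""
        else:
            self.buffer += ch


def on_chat_fragment(msg, open_utterances):
    """Receiver side: grow the speaker's in-progress chat line word by word."""
    words = open_utterances.setdefault(msg["speaker"], [])
    words.append(msg["word"])
    if msg["final"]:
        print(msg["speaker"] + ": " + " ".join(words))  # completed utterance
        del open_utterances[msg["speaker"]]
```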

Jeffrey Ventrella: In your paper, “When Names Fail,” you analyze small, subtle gestures and short-lived referential movements. In order for a virtual world to make behavior legible at this resolution, what kinds of animation frame rates are needed? Do you have any thoughts on frame rate and body language?

Bob Moore: It’s not a matter of frame rates or resolution. It’s a matter of degrees of freedom and control. Currently virtual worlds implement certain kinds of gestures decently but have not implemented others at all. Stand-alone gestures (or “emblems”), such as a wave, work fairly well because you do not need to coordinate them with your utterances or with other objects, although you do need to coordinate them with the attention of the recipient. You activate the wave command, your avatar performs a canned wave, and your recipient hopefully can see it and infer that you have just greeted him or bid him goodbye.

However, pointing (or “deictic”) gestures are trickier. If you activate the “point” command, your avatar performs a canned point, and your recipient wonders what you are pointing at and why. In order for the point to be meaningful, it needs to be coordinated not only with the recipient’s attention but also with the object pointed to and the utterance about it. For example, if I say, “let’s talk to her,” I need to time the pointing gesture reasonably with the word “her” and point my avatar’s finger in her direction, as well as make sure that you saw the point and its referent. This is currently very difficult in virtual worlds. So users don’t point. They use names instead.
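A hypothetical sketch of the coordination Bob describes: the client emits the utterance word by word and, as the trigger word goes out, aims a point gesture at the referent. The Avatar class and the function names are invented for illustration only, not taken from any engine.

```python
import math

class Avatar:
    """Toy stand-in for an engine avatar; only what the sketch needs."""
    def __init__(self, name, x, z):
        self.name, self.x, self.z = name, x, z

    def play_gesture(self, gesture, arm_yaw):
        print(f"{self.name} plays '{gesture}' at yaw {arm_yaw:.2f} rad")

def speak_with_point(avatar, words, referent, trigger_word):
    """Emit an utterance word by word; as the trigger word goes out,
    aim the point at the referent so listeners see word, gesture,
    and target together."""
    for word in words:
        print(f"{avatar.name} says: {word}")  # stands in for word-by-word chat
        if word == trigger_word:
            yaw = math.atan2(referent.x - avatar.x, referent.z - avatar.z)
            avatar.play_gesture("point", arm_yaw=yaw)

# speak_with_point(Avatar("Me", 0, 0), ["let's", "talk", "to", "her"],
#                  Avatar("Her", 3, 4), "her")
```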

Finally, depictive (or “iconic”) gestures are currently not even possible in virtual worlds. Not only do they involve the same kinds of coordination issues as pointing gestures, they cannot even be canned. For example, if I say, “I talked to the girl with the hair,” and as I say “hair” I hold two cupped hands several inches over both sides of the top of my head, I am using an idiosyncratic configuration of my body to depict a hairdo with two large buns. Currently there is no way to make your avatar do this because gestures are canned. This kind of idiosyncratic, creative, free-form use of the body for depicting things requires free gesticulation: the ability to move the avatar in any way you want, on the fly [4][3][2].

Four Gesture Types

Imagine if you could only chat with other users by selecting canned messages from a list. That is where virtual worlds currently are with gestures. It will not be until free gesticulation is implemented that avatars will be fully expressive with gestures. Some companies are experimenting with this kind of “real-time motion capture” for avatar control using cameras and infrared sensors.

Jeffrey: You’ve researched many online multiplayer games, like WoW. How do you see virtual world users generating social signaling within an environment that was not initially built for embodied communication? What creative solutions have users come up with?

Bob: In MMORPGs, players have evolved shared practices to compensate for clunky avatar systems. For example, it is difficult to “point” to things, such as the opponent, or “mob,” you intend to attack. So in EverQuest players evolved the practice of “hailing” opponents. A player targets the opponent he or she intends to attack and hits the “h” key. This produces a text message reading something like “Aracorn hails an orc taskmaster!”, which one’s teammates can see. Now an experienced teammate will not interpret this as Aracorn greeting the computer-controlled orc taskmaster, perhaps as a joke. Rather, he or she will interpret the hail as a proposal for the group to attack that particular orc next. This is much easier to accomplish in these virtual worlds than pointing and saying, “let’s attack him!” Pointing-by-hailing is part of EverQuest’s learned player culture.
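A minimal sketch of pointing-by-hailing, with invented types standing in for EverQuest’s actual systems: targeting an opponent and pressing the hail key broadcasts a public message that doubles as a point.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Mob:
    name: str

@dataclass
class Player:
    name: str
    current_target: Optional[Mob] = None
    inbox: list = field(default_factory=list)

    def receive_chat(self, message):
        self.inbox.append(message)

def on_hail_key(player, group):
    """Pressing 'h' with an opponent targeted broadcasts a hail that
    experienced teammates read as 'attack this one next'."""
    if player.current_target is None:
        return
    message = f"{player.name} hails {player.current_target.name}!"
    for teammate in group:
        teammate.receive_chat(message)  # public, so it doubles as a point

# aracorn = Player("Aracorn", current_target=Mob("an orc taskmaster"))
# on_hail_key(aracorn, group=[aracorn, Player("Sidekick")])
```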

A similar practice in MMORPGs might be called a “readiness check” [2]. Again, when a team of players is coordinating an attack on an opponent, one part of the attacker’s, or “tank’s,” job is to wait until his or her fellow teammates are ready before initiating the attack. But this is not so easy to determine. Because these virtual worlds tend to lack public cues for a whole host of activities, such as consulting a map, browsing an inventory, composing a turn-at-chat, etc., it is impossible to tell if one’s teammates are ready simply by looking at them. So to work around this problem, the attacking player typically says to the group, “ready?” or simply “rdy?” and waits for the others to respond with “rdy”, “yes” or sometimes just “y”. While this solution works, it is cumbersome, and in fact players tend to use it only when the impending battle is likely to be a tough one. However, simple nonverbal cues could be used to display a player’s readiness to his or her teammates with no extra work, and in a way that is more informative than a “readiness check” [2].
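The “no extra work” alternative Bob alludes to might look something like the following sketch: the client already knows which blocking activities a player is engaged in, so it can derive a public readiness cue automatically instead of requiring a manual “rdy?” exchange. The activity names are illustrative.

```python
# Blocking activities the client already tracks; names are illustrative.
BLOCKING_ACTIVITIES = {"map_open", "inventory_open", "typing_chat", "afk"}

def readiness_cue(current_activities):
    """Derive a cue other clients could render over the avatar, replacing
    the manual 'rdy?' / 'rdy' exchange."""
    busy = BLOCKING_ACTIVITIES & set(current_activities)
    if not busy:
        return "READY"                       # e.g. an alert, weapon-drawn posture
    return "BUSY: " + ", ".join(sorted(busy))  # e.g. a map icon over the head

# readiness_cue({"map_open"}) -> 'BUSY: map_open'
# readiness_cue(set())        -> 'READY'
```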

Jeffrey: You are exploring eye-tracking. Do you have any ideas on how saccades – shifts in gaze – could be used for social signaling in virtual worlds?

Bob: When eye-tracking becomes a standard input method on personal computers, it will be very interesting for virtual worlds. Players will be able to make their avatars gaze at their fellow users’ avatars and at objects in the virtual world in a natural way. However, the design challenge will be how to determine when the user is looking at the virtual 3D world, the 2D interface, or the 3D real world. For example, it will be counterproductive if a glance at the clock on one’s physical wall makes one’s avatar glance off into virtual space.
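One way to frame that design challenge is as a routing problem: each eye-tracker sample must be classified as real-world, 2D interface, or in-world before it is allowed to drive the avatar’s gaze. The sketch below assumes hypothetical ui_rects and raycast facilities; nothing here is a real eye-tracking API.

```python
def contains(rect, point):
    """rect is (left, top, width, height); point is (x, y)."""
    x, y = point
    left, top, width, height = rect
    return left <= x < left + width and top <= y < top + height

def route_gaze(gaze_xy, screen_rect, ui_rects, raycast):
    """Classify one eye-tracker sample before letting it animate the avatar."""
    if gaze_xy is None or not contains(screen_rect, gaze_xy):
        return ("real_world", None)   # e.g. a glance at the wall clock: ignore
    for rect in ui_rects:
        if contains(rect, gaze_xy):
            return ("ui", rect)       # looking at the map, inventory, chat box
    hit = raycast(gaze_xy)            # project through the camera into the scene
    return ("world", hit)             # only this case should turn the avatar's head
```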

Jeffrey: What are your opinions on “away-from-keyboard” affordances? What is the best way to show that a user is away from his/her avatar?

Bob: There are a lot of clever and effective cues out there for signaling that a user is “AFK.” I think the bigger problem is that there are currently not enough cues for showing what users are doing when they are “at the keyboard.” If I am AFK, my avatar may droop its head or go to sleep or become semi-transparent, but if I’m consulting my map or browsing my inventory or chatting with someone privately, it does nothing. All of these activities are important features of the social interactional context, but currently developers treat them as purely private affairs. For example, if I say, “Shall we visit the Lars’ homestead?” and you begin consulting your map, this is informative to me. It tells me that you heard me and understood that I suggested going someplace together. It further tells me that you aren’t sure how to get there from here and may like me to lead the way. It further tells me that you are not ready to leave yet because you’re still looking at the map so I shouldn’t jump on my speeder and leave just yet.

Unfortunately, virtual worlds tend not to attach public cues to commands like “open map,” and this causes unnecessary coordination problems in avatar interaction. Developers do not appear to recognize yet that in virtual worlds, user interface design is also always social interaction design. In a virtual world, when users are interacting with the system, you never know whether they are also interacting with other users at the same time. If they are, then system feedback is not only informative to that individual user; it is also informative to the other users on the scene. It makes the user’s actions more intelligible to others and better enables them to coordinate basic actions with each other like traveling, attacking, chatting, and much more [4][3][2][1].
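As a sketch of what attaching public cues to UI commands could look like (all names invented for illustration, including the network.broadcast call): the command still performs its private effect, but nearby clients are also told about it so they can render a visible cue on the avatar.

```python
# Map from UI commands to public cues; names invented for illustration.
PUBLIC_CUES = {
    "open_map": "consulting_map",       # e.g. avatar unrolls a map prop
    "open_inventory": "rummaging_bag",
    "start_typing": "composing_chat",   # e.g. a thought bubble with an ellipsis
}

def execute_command(command, user, nearby_users, network):
    do_private_ui_work(command, user)   # the usual, user-only effect
    cue = PUBLIC_CUES.get(command)
    if cue:
        # The same system feedback, now also informative to others on the scene.
        network.broadcast(nearby_users, {"avatar": user.id, "cue": cue})

def do_private_ui_work(command, user):
    pass  # placeholder for the engine's normal handling of the command
```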

Jeffrey: What is the most problematic aspect of reconstituting body language in virtual worlds? Which natural language modality seems the hardest to virtualize?

Bob: Probably “free gesticulation”.

Jeffrey: Any other thoughts that you want to share about avatars, virtual embodiment, or body language in general?

Bob: In virtual worlds, social interaction problems can be solved in multiple ways. For example, which opponent a fellow player is referring to could be determined through an improved system of avatar gesture and gaze, or through a 3D arrow that floats over the opponent’s head (as in EverQuest II), or through the arrangement of avatar portraits in the 2D user interface*, or through the practice of “pointing-by-hailing” in EverQuest. From a social interaction perspective, it doesn’t really matter how you determine who your teammate is looking at; it just matters that you do. However, from a design perspective, I tend to prefer solutions to social interaction problems that use the avatars, and use them in natural ways. This makes the avatars relevant and consequential, instead of merely pretty, and it makes the virtual experience feel more like a real face-to-face encounter [3][2].

——–

*Like in “PartyAims,” an add-on for World of Warcraft that my colleague Eric Nickell created.

References

1. Moore, Robert J. (2008). “Designing for Player Sociability.” Austin Game Developers Conference, Austin, TX, September 14-17, 2008. http://www.gamasutra.com/php-bin/news_index.php?story=20240

2. Moore, Robert J., Cabell Gathman, Nicolas Ducheneaut, and Eric Nickell (2007). “Coordinating Joint Activity in Avatar-Mediated Interaction.” In Proceedings of CHI 2007. New York: ACM, pp. 21-30.

3. Moore, Robert J., Nicolas Ducheneaut, and Eric Nickell (2007). “Doing Virtually Nothing: Awareness and Accountability in Massively Multiplayer Online Worlds.” Computer Supported Cooperative Work 16: 265-305.

4. Moore, Robert J., Nicolas Ducheneaut, and Eric Nickell (2005). “10 Things About Conversation in Virtual Worlds that Remind Me that I’m NOT in the Real World: Improving Interactional Realism in Massively Multiplayer Persistent Worlds.” Austin Game Conference, Austin, TX, October 28, 2005. http://blogs.parc.com/blog/2005/11/10-things-about-conversation-in-virtual-worlds/