Profile
I am currently a specially-appointed associate professor at Kyoto University, where I am involved in Moonshot, a long-term research project to transform society by 2050. The focus of my work is a semi-autonomous conversational agent which handles simple conversation tasks on its own, but can also recognize more difficult tasks and hand control to a remote human operator. A single operator should be able to control multiple avatars, enabling parallel conversations and, we hope, improving task efficiency.
I am also interested in making conversational agents more human-like through non-linguistic behaviors such as backchannels, turn-taking, and laughter. We have created behaviors for an attentive listening agent so that it can show empathy towards the user. This listener, along with the other dialogue systems I have built, runs on several robots and virtual agents.
Conversational roles
We develop robots for a number of human roles, each with different conversational requirements:
- Attentive listening: Listening to the user while producing backchannels and repeated responses.
- Job interviewer: Extracting keywords from the interviewee's answers to produce appropriate follow-up questions.
- Lab guide: A simple question-answering system that introduces users to our lab.
- Speed dating: Mixed-initiative conversation.
- Wikitalk: Users ask a robot about topics on Wikipedia.
Conversational models
We have created interaction models for human-like conversational abilities. These models are particularly needed for situated conversational robots, as opposed to chatbots or virtual assistants such as Google Home or Siri.
Shared laughter
We recently developed a shared-laughter model which predicts whether, when, and how a robot should laugh in response to a user's laugh. This model allows ERICA and other robots to laugh along with the user, and our paper was featured in several international media outlets and science magazines.
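As a rough sketch of this decision structure, the snippet below assumes the model decomposes into a binary stage (should the robot laugh at all?) and a laugh-type stage (a polite social laugh versus a mirthful one). The stub classifiers, feature names, and thresholds are illustrative placeholders, not our trained models.

```python
import random

def p_should_laugh(user_laugh_features: dict) -> float:
    """Stub for the stage-1 classifier: P(robot laughs | user laughed)."""
    return random.random()  # placeholder for a trained model's score

def p_mirthful(user_laugh_features: dict) -> float:
    """Stub for the stage-2 classifier: P(mirthful | robot laughs)."""
    return random.random()  # placeholder for a trained model's score

def shared_laughter_response(user_laugh_features: dict,
                             laugh_threshold: float = 0.5) -> str:
    """Map a detected user laugh to one of three robot actions."""
    if p_should_laugh(user_laugh_features) < laugh_threshold:
        return "no_laugh"        # not every user laugh should be shared
    if p_mirthful(user_laugh_features) >= 0.5:
        return "mirthful_laugh"  # an amused, higher-intensity laugh
    return "social_laugh"        # a polite, low-intensity laugh

print(shared_laughter_response({"pitch_mean": 220.0, "energy": 0.7}))
```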
Turn-taking
Turn-taking models predict whether the user has finished their conversational turn, using the acoustic signal of their speech together with its lexical content. We previously trained models that combine acoustic and word-level features to predict the end of a turn; these significantly outperform the baseline and improve further when used together with a finite-state turn-taking machine.
We are now experimenting with more powerful transformer models which only need a continuous acoustic signal.
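A minimal sketch of the fusion idea is below, assuming a simple weighted combination of the two single-modality scores gated by a toy finite-state machine; the weights, threshold, state names, and example scores are illustrative assumptions rather than the trained system:

```python
def fuse(p_acoustic: float, p_lexical: float, w: float = 0.6) -> float:
    """Late fusion of the acoustic and lexical end-of-turn scores."""
    return w * p_acoustic + (1.0 - w) * p_lexical

def step(state: str, p_acoustic: float, p_lexical: float,
         threshold: float = 0.7) -> str:
    """One decision frame of a toy finite-state turn-taking machine."""
    if state == "USER_SPEAKING":
        # Hold the user's turn until the fused detector fires.
        if fuse(p_acoustic, p_lexical) >= threshold:
            return "ROBOT_TURN"
        return "USER_SPEAKING"
    return state  # transitions for barge-in, silence timeouts, etc. omitted

# Example frame: strong acoustic pause cue, weaker lexical cue.
print(step("USER_SPEAKING", p_acoustic=0.9, p_lexical=0.4))  # -> ROBOT_TURN
```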
Engagement
Our engagement model recognizes several types of user behavior and produces a likelihood that the user is engaged in the conversation. With this information we can manage the robot's behavior, deciding whether it should continue with the current topic or move on to something else.
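As an illustration of how such a score might drive topic selection, here is a toy sketch assuming the model combines per-behavior detection scores logistically; the behavior set, weights, bias, and threshold are invented for the example, not the trained model's values:

```python
import math

def engagement_prob(behaviors: dict[str, float]) -> float:
    """Toy logistic combination of per-behavior detection scores."""
    weights = {"nodding": 1.2, "gaze_on_robot": 1.0,
               "backchannel": 0.8, "laughter": 0.6}
    z = sum(weights[name] * score for name, score in behaviors.items()) - 1.5
    return 1.0 / (1.0 + math.exp(-z))  # squash to a probability

def dialogue_policy(behaviors: dict[str, float],
                    stay_threshold: float = 0.5) -> str:
    """Keep the current topic while the user seems engaged, else move on."""
    if engagement_prob(behaviors) >= stay_threshold:
        return "continue_topic"
    return "change_topic"

print(dialogue_policy({"nodding": 0.9, "gaze_on_robot": 0.8,
                       "backchannel": 0.2, "laughter": 0.0}))
```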
Attentive listening
For realistic attentive listening, we created a backchannel model which identifies appropriate timing for Japanese backchannels (aizuchi). This model is used when ERICA is listening to the user, and she can also produce multimodal backchannels such as head nods. We also developed a statement-response model for attentive listening, which identifies a focus word in the user's speech and then produces an appropriate response using that word. Used together with the backchannel model, this lets ERICA take the role of an attentive listener and stimulate the user to continue talking. The system is domain-independent, so the user can talk about any topic.
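The statement-response idea can be sketched as below. The real system operates on Japanese and uses a trained focus-word detector; the English stopword heuristic and response templates here are purely illustrative:

```python
RESPONSE_TEMPLATES = [
    "{focus}?",                     # partial repeat, inviting elaboration
    "{focus}, I see.",              # acknowledging response
    "Tell me more about {focus}.",  # elaborating question
]

def find_focus_word(tokens: list[str], stopwords: set[str]) -> str | None:
    """Toy heuristic: treat the last content word as the focus."""
    for tok in reversed(tokens):
        if tok.lower() not in stopwords:
            return tok
    return None

def statement_response(utterance: str) -> str:
    """Echo the focus word back to the user, or fall back to a backchannel."""
    stopwords = {"i", "a", "the", "to", "was", "my", "went", "it"}
    focus = find_focus_word(utterance.rstrip(".!?").split(), stopwords)
    if focus is None:
        return "Mm-hm."  # plain backchannel when no focus word is found
    return RESPONSE_TEMPLATES[0].format(focus=focus)

print(statement_response("I went to Kyoto"))  # -> "Kyoto?"
```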
Our public symposium with ERICA has also been featured in the Japanese media.
lala@sap.ist.i.kyoto-u.ac.jp