Taras Kucherenko is a Ph.D. student at the Robotics, Perception and Learning lab at KTH Royal Institute of Technology in Stockholm. His current research focuses on generative models of non-verbal behavior that enable social agents to use body language as an additional communication tool. He received his MSc in machine learning from RWTH Aachen, with an emphasis on Natural Language Processing, and his BSc in applied mathematics from KPI, Kyiv, with an emphasis on cryptography.
Conversational agents in the form of virtual agents or social robots are rapidly becoming widespread. In human-human interaction, people use non-verbal behaviors to signal their intent, emotions, and attitudes. Conversational agents therefore need this ability as well to make interaction pleasant and efficient. An important part of non-verbal communication is gesticulation: gestures convey a large share of non-verbal content. Gesture generation has been studied extensively over the last few decades. Early methods were mostly rule-based, whereas recent state-of-the-art methods are data-driven, and we continue this line of research. Previous data-driven methods for gesture generation mainly used only one speech modality, either the audio or the text. Our model instead takes both speech acoustics and speech semantics into account and produces body motion as a sequence of joint-angle rotations. This makes it possible to generate arbitrary gestures: not only beat gestures, which align with the rhythm of speech, but also gestures that depend on semantic content. The resulting gestures can be applied to both virtual agents and humanoid robots. Subjective and objective evaluations confirm the success of our approach.