Tanvi Dinkar is a PhD student at Télécom Paris, Institut Polytechnique de Paris, and a Marie Curie ITN fellow at ANIMATAS. She is supervised by Prof. Chloé Clavel and co-supervised by Prof. Ioana Vasilescu and Prof. Catherine Pelachaud. Her PhD studies the representations of disfluencies for spoken language understanding (SLU). Her research interests include SLU, psycholinguistics, communicative strategies, and the discrepancies between how people speak and how they write. Prior to this, she was a dialogue engineer at Nuance (now Microsoft), coding dialogue systems for the automotive industry. She decided to pursue research in SLU when she saw from customer tickets that task-oriented dialogue systems are not robust to people speaking naturally. She has two master's degrees from the University of Edinburgh, one in Linguistics and one in Speech and Language Processing. Once upon a time, she completed an undergraduate degree in Journalism and Literature.
People rarely speak in the same manner that they write – they are generally disfluent. Disfluencies can be defined as interruptions in the regular flow of speech, such as pausing silently, repeating words, or interrupting oneself to correct something said previously. Fillers are a type of disfluency: sounds (e.g. “um” in English) that fill a pause in an utterance. Clark and Fox Tree (2002) proposed that speakers are able to utilise fillers as collateral signals in communication, in addition to the primary signal of the message (the lexical content, or “what was said (in essence)”). So far, research in Spoken Dialogue Systems (SDS) has focused on the primary message, where the goal of intent classification and slot detection is to reduce the input utterance into a semantic frame (Louvan and Magnini, 2020). Despite the rich linguistic literature showing their informativeness, fillers and other disfluencies are traditionally removed as noise from the output transcripts of speech recognition systems. The aim of my thesis is to study the representations of fillers and discourse markers in spoken language understanding (SLU), inspired by psycholinguistic models of listener comprehension. The focus will be on studying representations that have an impact both on broader tasks in SLU and on research relevant to SDS. In this presentation, I will discuss two recent works that studied the representations of fillers in state-of-the-art language models, and the role of fillers and discourse markers in mutual understanding.
Herbert H. Clark and Jean E. Fox Tree. 2002. Using uh and um in Spontaneous Speaking. Cognition 84(1):73–111. https://doi.org/10.1016/S0010-0277(02)00017-3.
Samuel Louvan and Bernardo Magnini. 2020. Recent neural methods on slot filling and intent classification for task-oriented dialogue systems: A survey. arXiv preprint arXiv:2011.00564.
Papers covered during the talk
- Tanvi Dinkar, Pierre Colombo, Matthieu Labeau, and Chloé Clavel. “The importance of fillers for text representations of speech transcripts.” arXiv preprint arXiv:2009.11340 (2020).
- Utku Norman, Tanvi Dinkar, Barbara Bruno, and Chloé Clavel. “Studying Alignment in Spontaneous Speech via Automatic Methods: How Do Children Use Task-specific Referents to Succeed in a Collaborative Learning Activity?” arXiv preprint arXiv:2104.04429 (2021).