Investigating the Adaptation of a Turn-Taking Prediction Model to Multiparty Spoken Dialogue Using Long Short-Term Memory Recurrent Neural Networks
Location: TEL, Room Auditorium 3 (20th floor), Ernst-Reuter-Platz 7, 10587 Berlin
Date/Time: 02.03.2020, 14:15-15:00
Speaker: Schokri Ben Mustapha
Abstract: Deciding when to respond to user input is a crucial task for spoken dialogue systems and conversational agents. Previous models have generally approached this problem by classifying non-speech regions as either turn-shifts or turn-holds. This work proposes a computational model of turn-taking that takes a more general and proactive approach: instead of training a classifier only on the context immediately preceding turn-taking events such as holds and shifts, the proposed model is trained on continuous dialogue data to predict the future speech activity of the participants within a specific time window. This approach was introduced in previous studies using Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) and achieved promising results on dyadic dialogue. This work investigates the adaptation of the model to multiparty dialogue data and analyses the usefulness of eye-gaze information for such a model. Experiments on two tasks, prediction at pauses and prediction at overlaps, show that this approach can achieve better results than a baseline. Individual models trained on eye-gaze and on acoustic information performed equally well on the dialogue data; however, combining both modalities did not improve performance.
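To make the prediction setup concrete, the continuous-prediction framing described above can be sketched as follows. This is a hypothetical illustration only, not the speaker's actual code: the function `make_training_pairs`, the feature dimensions, and the window lengths are all illustrative assumptions. Each training example pairs a window of past per-frame features (e.g. acoustic and eye-gaze features) with the future voice activity of every participant over a fixed prediction horizon, rather than a binary hold/shift label at a single point.

```python
import numpy as np

def make_training_pairs(features, speech_activity, history=20, horizon=10):
    """Slice continuous dialogue into (past window, future activity) pairs.

    features:        (T, F) array of per-frame input features
    speech_activity: (T, P) binary voice activity for each of P participants
    Returns X with shape (N, history, F) and Y with shape (N, horizon, P),
    where each Y[i] holds the speech activity of all participants in the
    `horizon` frames that follow the corresponding input window.
    """
    T = len(features)
    X, Y = [], []
    for t in range(history, T - horizon + 1):
        X.append(features[t - history:t])          # past context window
        Y.append(speech_activity[t:t + horizon])   # future activity target
    return np.stack(X), np.stack(Y)

# Toy data: 100 frames, 8 features, 3 participants (a multiparty setting)
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))
va = (rng.random(size=(100, 3)) > 0.7).astype(np.float32)

X, Y = make_training_pairs(feats, va)
print(X.shape, Y.shape)  # (71, 20, 8) (71, 10, 3)
```

An LSTM trained on such pairs outputs a speech-activity forecast for every participant at every frame, which can then be thresholded at pauses or overlaps to decide who should speak next.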