Investigating perceptual dimensions of social speaker characteristics in Neural TTS voices

Location: Zoom link (please ask Saman Zadtootaghaj for access)

Date/Time: 18.01.2021, 14:00-14:30 

Speaker: Abhinav Bhardwaj (TU Berlin)

Abstract: Synthesized speech is gaining traction in new technologies in fields such as healthcare, education, and telecommunication. It is therefore critical to understand whether such synthesized speech conveys social speaker characteristics. Despite rapid advances in speech synthesis, the evaluation of synthesizers developed for different application domains still requires extensive research. In this study, we investigate the perceptual dimensions that contribute to social speaker characteristics and attributes in speech generated by Text-to-Speech (TTS) systems. Given the wide range of TTS applications, we focus on two application domains: healthcare and customer service.

We perform a crowdsourcing-based evaluation of speech generated by selected TTS systems. A factor analysis of the ratings indicates that listeners perceive social speaker characteristics such as "warmth" and "competence", along with several other characteristics, in the synthesized voices. We also investigate how extra-linguistic features affect the perceptual dimensions of social speaker characteristics.



