Reviewed Conference Papers

Is Intelligibility Still the Main Problem? A Review of Perceptual Quality Dimensions of Synthetic Speech
Citation key hinterleitner2013a
Author Hinterleitner, Florian and Norrenbrock, Christoph and Möller, Sebastian
Title of Book 8th ISCA Speech Synthesis Workshop
Pages 167–171
Year 2013
Location Barcelona, Spain
Month August
Abstract In this paper, we present a comparative overview of 9 studies on perceptual quality dimensions of synthetic speech. Different subjective assessment techniques have been used to evaluate the text-to-speech (TTS) stimuli in each of these tests: in a semantic differential, the test participants rate every stimulus on a given set of rating scales, while in a paired comparison test, the subjects rate the similarity of pairs of stimuli. Perceptual quality dimensions can be derived from the results of both test methods, either by performing a factor analysis or via multidimensional scaling. We show that even though the 9 tests differ in terms of used synthesizer types, stimulus duration, language, and quality assessment methods, the resulting perceptual quality dimensions can be linked to 5 universal quality dimensions of synthetic speech: (i) naturalness of voice, (ii) prosodic quality, (iii) fluency and intelligibility, (iv) disturbances, and (v) calmness.
