
TU Berlin


Reviewed Journal Papers


Quality Prediction of Synthesized Speech based on Perceptual Quality Dimensions
Citation key norrenbrock2015a
Author Norrenbrock, Christoph Ritter; Hinterleitner, Florian; Heute, Ulrich; Möller, Sebastian
Pages 17–35
Year 2015
ISSN 0167-6393
DOI 10.1016/j.specom.2014.06.003
Address New York, USA
Journal Speech Communication
Volume 66
Month February
Note print/online
Publisher Elsevier
Abstract Instrumental speech-quality prediction for text-to-speech signals is explored in a twofold manner. First, the perceptual quality space of TTS is structured by means of three perceptual quality dimensions which are derived from multiple auditory tests. Second, quality-prediction models are evaluated for each dimension using prosodic and MFCC-based measurands. Linear and nonlinear model types are compared under cross-validation restrictions, giving detailed insight into model-generalizability aspects. Perceptually regularized properties, denoted as quality elements, are introduced in order to encode the quality-indicative effect of individual signal characteristics. These elements integrate a perceptual model reference which is derived in a semi-supervised fashion from natural and synthetic speech. The results highlight the feasibility of instrumental quality prediction for TTS signals provided that broad training material is employed. High prediction accuracy, however, requires nonlinear model structures.
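The evaluation described in the abstract — comparing linear and nonlinear quality-prediction models on signal-derived measurands under cross-validation — can be illustrated with a minimal sketch. This is not the paper's actual pipeline: the feature matrix below is synthetic random data standing in for the prosodic/MFCC-based measurands, and the specific estimators (ridge regression, RBF support-vector regression) are plausible stand-ins for the linear and nonlinear model types the abstract mentions.

```python
# Hedged sketch: linear vs. nonlinear regression of a per-dimension quality
# score from signal features, evaluated with cross-validation. Features and
# targets are simulated; real MFCC/prosodic extraction is out of scope here.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_signals, n_features = 200, 13            # e.g. 13 MFCC-derived measurands
X = rng.normal(size=(n_signals, n_features))
# Simulated quality scores with a mildly nonlinear feature dependence.
y = X[:, 0] + 0.5 * np.tanh(2 * X[:, 1]) + 0.1 * rng.normal(size=n_signals)

linear = Ridge(alpha=1.0)                  # linear model type
nonlinear = SVR(kernel="rbf", C=1.0)       # nonlinear model type

# 5-fold cross-validation, scored by R^2, mirroring the abstract's
# generalizability-oriented comparison of model types.
r2_linear = cross_val_score(linear, X, y, cv=5, scoring="r2").mean()
r2_nonlinear = cross_val_score(nonlinear, X, y, cv=5, scoring="r2").mean()
print(f"linear R^2: {r2_linear:.2f}, nonlinear R^2: {r2_nonlinear:.2f}")
```

On real TTS quality data the abstract reports that high prediction accuracy requires nonlinear model structures; in this toy setup the gap between the two scores depends on how strongly the simulated nonlinearity dominates.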

