TU Berlin

Quality and Usability Lab



Predicting the Quality of Synthetic Speech Using a Speech Recognizer

LOCATION: Auditorium 1, TEL, Ernst-Reuter-Platz 7, 20th floor

Date/Time: 04.05.2015, 14:15-15:00

SPEAKER: Steffen Zander (TU Berlin)


In this thesis, we investigated the use of automatic speech recognizers (Google Speech API and Sphinx Speech Recognizer) to predict the quality and intelligibility of synthetic speech. For four databases of rated synthetic speech samples, we analyzed the correlation between the word error rate (WER) the recognizer obtained for each sample and the ratings on 16 different attribute scales. Moderate correlations were observed for various quality aspects, including overall impression, naturalness, and intelligibility. In a fifth database, we additionally analyzed the correlation between the WER of human listeners, determined in a test with semantically unpredictable sentences, and the WER of the recognizer. The correlation between the human and recognizer WERs is .40 over all samples and .94 when averaged per TTS system.
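The analysis above rests on two computations: a per-sample WER and a correlation between WER values and ratings (or human WERs), optionally after averaging per TTS system. The following is a minimal sketch under stated assumptions, not the thesis's implementation: it assumes WER is the standard word-level Levenshtein distance divided by the number of reference words, that transcripts are compared after lowercasing and whitespace tokenization, and that the correlation is Pearson's r (the abstract does not name the coefficient). The function names `word_error_rate`, `pearson`, and `per_system_means` are hypothetical, and the demo values are made up.

```python
from collections import defaultdict
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference words).

    Assumption: simple lowercase + whitespace tokenization, no punctuation handling.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def per_system_means(systems: list[str], values: list[float]) -> dict[str, float]:
    """Average the per-sample values separately for each TTS system."""
    groups: dict[str, list[float]] = defaultdict(list)
    for system, value in zip(systems, values):
        groups[system].append(value)
    return {system: mean(vals) for system, vals in groups.items()}


if __name__ == "__main__":
    # Toy, made-up values for illustration only (not data from the thesis).
    systems = ["A", "A", "B", "B", "C", "C"]
    human_wer = [0.10, 0.30, 0.40, 0.20, 0.60, 0.50]
    asr_wer = [0.20, 0.10, 0.50, 0.40, 0.50, 0.70]
    print("per-sample r:", pearson(human_wer, asr_wer))
    h = per_system_means(systems, human_wer)
    a = per_system_means(systems, asr_wer)
    keys = sorted(h)
    print("per-system r:", pearson([h[k] for k in keys], [a[k] for k in keys]))
```

Averaging per system removes sample-level noise before correlating, which is one plausible reason a per-system correlation can be much higher than a per-sample one; note that it is then computed over only as many points as there are systems.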



