Predicting the Quality of Synthetic Speech Using a Speech Recognizer
LOCATION: Auditorium 1, TEL, Ernst-Reuter-Platz 7, 20th floor
DATE/TIME: 04.05.2015, 14:15-15:00
SPEAKER: Steffen Zander (TU Berlin)
In this thesis, we investigated the use of automatic speech recognizers (the Google Speech API and the Sphinx speech recognizer) for predicting the quality and intelligibility of synthetic speech. For four databases of rated synthetic speech samples, we analyzed the correlation between the word error rate (WER) obtained from the recognizer for each sample and the ratings on 16 different attribute scales. Moderate correlations were observed for various quality aspects, including overall impression, naturalness, and intelligibility. Moreover, in a fifth database, we analyzed the correlation between the intelligibility for human listeners, as determined in a test with semantically unpredictable sentences, and the WER of the recognizer. The correlation between the human listeners' WER and the recognizer's WER over all samples is .40, and .94 when averaged per TTS system.
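For readers unfamiliar with the two quantities involved, the following is a minimal sketch of how a word error rate and the two kinds of correlation (per sample and averaged per TTS system) could be computed. It is not the code used in the thesis: the function names `wer`, `pearson`, and `mean_by_system` are hypothetical, the text normalization (lowercasing and whitespace splitting) is an assumption, and the abstract does not state which correlation coefficient was used, so Pearson's r is assumed here.

```python
from collections import defaultdict
from math import sqrt


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed; the abstract does not name one)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def mean_by_system(systems, values):
    """Average per-sample values within each TTS system, ordered by system label."""
    groups = defaultdict(list)
    for system, value in zip(systems, values):
        groups[system].append(value)
    return [sum(groups[s]) / len(groups[s]) for s in sorted(groups)]
```

Under these assumptions, the per-sample correlation would be `pearson(human_wer, asr_wer)` over all samples, and the per-system correlation would be `pearson(mean_by_system(systems, human_wer), mean_by_system(systems, asr_wer))`. Averaging within each system before correlating removes much of the sample-level noise, which is one plausible reason a per-system value can be far higher than a per-sample one.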