Guest Lectures 2009

Improvements in voice conversion using prosodic models
Jan Schwarz

Monday, 6 July 2009, TU Hochhaus, Auditorium 1, 20th floor
The talk takes place as part of the Research Colloquium Usability.


Voice conversion (VC) aims to transform the voice of one speaker so that the converted speech sounds as if it were uttered by another speaker. The meaning and content of the speech are not changed. Today, many applications of VC exist. An important one is a customised text-to-speech (TTS) system, which makes it possible to build corporate identities quickly and inexpensively by modifying the underlying speech corpus of the TTS system and thus the sound of the voice. VC can also be used to create special characters’ voices for the movie industry or to “keep” the voice of an actor in different languages. The latter case aims to retain the speaker’s identity in speech-to-speech translation scenarios.
However, many VC systems suffer from poor naturalness and quality of the transformed voice. The transformed voice can only sound natural if it includes all characteristics relevant to the true target speaker. Within VC systems, a main problem is the mapping of the prosody, which is one of these essential features.
This talk reviews the basic concept of voice conversion from its beginnings in 1984 to the present. It points out the advantages and disadvantages of the different approaches and shows the improvements gained by modelling the prosody. Two approaches to modelling the prosody are presented and discussed with respect to the voice conversion task.

Short Biography

Jan Schwarz has been working as a research assistant (Wissenschaftlicher Mitarbeiter) at the Institute for Circuit and System Theory (LNS) at the Christian-Albrechts-University of Kiel, Germany, since September 2005. He studied Electrical Engineering and Information Science and received his diploma degree in July 2005 from the Ruhr-University Bochum, Germany. At the LNS, Jan is writing his PhD (Dr.-Ing.) thesis in the domain of digital speech signal processing, specifically "voice conversion". He is developing a speech synthesis system that uses harmonic coders to convert the voice of one person into the voice of another. The aim is to modify the voice without changing the meaning or content of the speech.


Georg Essl, georg.essl(AT)telekom.de

