Monday, 6 July
2009, TU Hochhaus, Auditorium 1, 20th floor
The talk takes place as part of the Research Colloquium Usability.
Voice conversion (VC) aims to transform the voice of one speaker in
such a way that the converted voice sounds as if it had been uttered by
another speaker. The meaning and content of the speech are not
changed. Many applications for VC exist today. An
important application is a customised text-to-speech (TTS) system,
which makes it possible to build corporate identities quickly and
inexpensively by modifying the underlying speech corpus of the
TTS system and thus the sound of the voice. VC can also be used to
create special character voices for the movie industry or to
“keep” the voice of an actor across different languages. The latter
case aims to retain the speaker’s identity in speech-to-speech
translation.
However, many VC systems suffer from poor naturalness and quality of the converted voice. The converted voice can only sound natural if it includes all characteristics relevant to the true target speaker. A main problem within VC systems is the mapping of prosody, which is one of the essential characteristics of a speaker's voice.
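The announcement does not say which prosody models the talk presents. As a point of reference only, a common baseline for mapping one prosodic feature, the fundamental frequency (F0), is a linear transformation in the log-F0 domain. It shifts and scales the source contour so that it matches the target speaker's mean and standard deviation. The Python sketch below assumes F0 contours given as per-frame values in Hz, with 0 marking unvoiced frames. The function names `log_f0_stats` and `convert_f0` are hypothetical, and this is not the method described in the talk.

```python
import math
from statistics import mean, pstdev


def log_f0_stats(f0_values):
    """Return (mean, standard deviation) of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    return mean(logs), pstdev(logs)


def convert_f0(source_f0, source_stats, target_stats):
    """Map a source F0 contour onto the target speaker's log-F0 range.

    Hypothetical baseline: each voiced frame is normalised with the source
    speaker's log-F0 statistics and de-normalised with the target speaker's.
    Unvoiced frames (F0 == 0) are passed through as 0.0.
    """
    mu_s, sigma_s = source_stats
    mu_t, sigma_t = target_stats
    converted = []
    for f in source_f0:
        if f > 0:
            z = (math.log(f) - mu_s) / sigma_s      # standardise in the log domain
            converted.append(math.exp(mu_t + sigma_t * z))
        else:
            converted.append(0.0)                   # keep unvoiced frames unvoiced
    return converted


# Usage with made-up contours: the "target" speaker is exactly one octave higher.
source_train = [100.0, 110.0, 0.0, 120.0, 90.0]
target_train = [2 * f for f in source_train]
print(convert_f0([100.0, 0.0, 120.0],
                 log_f0_stats(source_train),
                 log_f0_stats(target_train)))
```

This only transforms global pitch level and range. It leaves the shape of the intonation contour and other prosodic features such as duration and energy untouched, which is why more detailed prosody models are of interest for VC.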
This talk reviews the basic concept of voice conversion from its beginnings in 1984 until today. It points out the advantages and disadvantages of the different approaches and shows the improvements achieved by modelling prosody. Two approaches to modelling prosody are presented and discussed with respect to the voice conversion task.
Jan Schwarz has been working as a
research assistant (Wissenschaftlicher Mitarbeiter) at the Institute
for Circuit and System Theory (LNS) at the
Christian-Albrechts-University of Kiel, Germany, since September 2005.
He studied Electrical Engineering and Information Science and received
his diploma degree in July 2005 from the Ruhr-University Bochum,
Germany. At the LNS, Jan is writing his PhD (Dr.-Ing.) thesis in the
field of digital speech signal processing, specifically on voice
conversion. He is developing a speech synthesis system that uses
harmonic coders to convert the voice of one person into the voice of
another person. The aim is to modify the voice without changing the
meaning or content of the speech.