Jan Schwarz |
---|
Monday, 6 July 2009, TU Hochhaus, Auditorium 1, 20th floor |
The talk takes place as part of the Research Colloquium Usability. [1] |
Abstract
Voice conversion (VC) aims to transform the voice of one speaker so
that the converted speech sounds as if it were uttered by another
speaker. The meaning and content of the speech are not changed.
Many applications of VC exist today. An important one is a customised
text-to-speech (TTS) system, which makes it possible to build corporate
identities quickly and inexpensively by modifying the underlying speech
corpus of the TTS system and thus the sound of the voice. VC can also
be used to create special characters’ voices for the movie industry or
to “keep” the voice of an actor in different languages. The latter case
aims to retain the speaker’s identity in speech-to-speech translation
scenarios.
However, many VC systems suffer from poor naturalness and quality of
the transformed voice. The transformed voice can only sound natural if
it includes all characteristics relevant to the true target speaker.
Within VC systems, a main problem is the mapping of the prosody, which
is one of the essential features.
This talk reviews the basic concept of voice conversion from its
beginnings in 1984 to the present. It points out the advantages and
disadvantages of the different approaches and shows the improvements
achieved by modelling the prosody. Two approaches to modelling the
prosody are presented and discussed with respect to the voice
conversion task.
Short Biography
Jan Schwarz has been working as a research assistant
(Wissenschaftlicher Mitarbeiter) at the Institute for Circuit and
System Theory (LNS) at the Christian-Albrechts-University of Kiel,
Germany, since September 2005. He studied Electrical Engineering and
Information Science and received his diploma degree from the
Ruhr-University Bochum, Germany, in July 2005. At the LNS, Jan is
writing his PhD (Dr.-Ing.) thesis in the domain of digital speech
signal processing, specifically on voice conversion. He is developing a
speech-synthesis system that uses harmonic coders to convert the voice
of one person into the voice of another. The aim is to modify the
voice without changing the meaning or content of the speech.