Nautilus Speaker Characterization (NSC) Corpus
- Photo: Laura Fernández Gallardo in the speaker's position in the acoustically isolated Nautilus room
- Diagram: device connections between the Nautilus and Belafonte rooms used for the speech recordings
The Nautilus Speaker Characterization (NSC) Corpus comprises clean microphone recordings of conversational speech from 300 German speakers (126 males and 174 females) aged 18 to 35 years, with no marked dialect or accent. The recordings were made in 2016/2017 in the acoustically isolated room "Nautilus" (which gives the database its name) of the Quality and Usability Lab of the Technische Universität Berlin, Germany.
Four scripted and four semi-spontaneous dialogs, simulating telephone inquiries, were elicited from the speakers. In addition, the speakers produced spontaneous neutral and emotional (predominantly excited or frustrated) utterances and questions. The speaker and the interlocutor (a recording assistant) of each interaction are provided in separate mono files, accompanied by timestamps and tags that mark the speaker's turns. All speech is sampled at 48 kHz (1-channel, 16-bit audio/wav files). An AKG C 414B-XLS microphone was used to record the speakers (95.6 hours of speech) and a Sennheiser HMD 46 headset to record the interlocutor (59.5 hours of speech).
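The description above does not prescribe any tooling for reading the audio. As a minimal sketch, assuming the turn boundaries are available as start and end times in seconds (the actual layout of the timestamp and tag files is not described here), one turn can be cut from a mono NSC file with Python's standard `wave` module. The function name `read_turn` is hypothetical.

```python
import sys
import wave
from array import array


def read_turn(path: str, start_s: float, end_s: float) -> array:
    """Return the samples of one turn, given its start and end times in seconds.

    Assumes the format stated for NSC: 1 channel, 16-bit PCM, 48 kHz.
    The turn boundaries are assumed to come from the corpus timestamp
    files, whose exact layout is not shown here.
    """
    with wave.open(path, "rb") as wav:
        # Refuse files that do not match the documented format.
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 1-channel, 16-bit PCM audio")
        rate = wav.getframerate()  # 48000 Hz for NSC recordings
        first = round(start_s * rate)
        last = min(round(end_s * rate), wav.getnframes())  # clamp to file length
        if first < 0 or first > last:
            raise ValueError("invalid turn boundaries")
        wav.setpos(first)
        raw = wav.readframes(last - first)
    samples = array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV stores samples little-endian
    return samples
```

For example, `read_turn("speaker.wav", 12.3, 15.8)` (hypothetical file name and times) would return 168,000 samples, i.e. 3.5 s at 48 kHz.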
The speech from one of the semi-spontaneous dialogs was later rated by listeners, yielding 34 continuous numeric labels of perceived interpersonal speaker characteristics (such as likable, attractive, competent, and childish); each of the 300 speakers was rated by 15 different listeners on average. For a set of 20 selected "extreme" speakers, 34 naive voice descriptions (such as bright, creaky, articulate, and melodious) were additionally rated by 26 external raters.
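Because each speaker was rated by a varying number of listeners (15 on average), a common first step is to average the ratings per speaker and attribute. The sketch below assumes a hypothetical long-format layout of (speaker, listener, attribute, value) tuples, not the actual NSC label files, and `mean_ratings` is an illustrative name; plain averaging is only one possible aggregation and is not necessarily how the released labels were derived.

```python
from collections import defaultdict
from statistics import mean


def mean_ratings(ratings):
    """Average listener ratings per (speaker, attribute) pair.

    `ratings` is an iterable of (speaker_id, listener_id, attribute, value)
    tuples; this layout is a hypothetical one, not the NSC file format.
    Returns {(speaker_id, attribute): (mean_value, number_of_ratings)}.
    """
    grouped = defaultdict(list)
    for speaker, _listener, attribute, value in ratings:
        grouped[(speaker, attribute)].append(value)
    # One mean per speaker and attribute, plus how many ratings it is based on.
    return {key: (mean(values), len(values)) for key, values in grouped.items()}
```

Keeping the rating count next to each mean makes it easy to spot speakers whose averages rest on fewer listeners than usual.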
All labels are provided together with the speech recordings and the speakers' metadata (age, gender, place of birth, places of residence in chronological order with duration of stay, places of birth of the mother and of the father, self-assessed personality, etc.). The NSC data is publicly released under a license agreement that restricts its use to non-commercial scientific research and teaching.
The material provided in the NSC Corpus is expected to be of broad interest to phoneticians and speech scientists working on the perceptual and acoustic correlates of personal attributes. Speech and prosody production, as well as conversational behavior in human-human interactions, can be studied by analyzing the spontaneous-speech turns of the speakers and their interlocutor. The NSC data may also suit other speech-related research that requires high-quality, clean German recordings.
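As an illustration of the conversational analyses mentioned above, simple turn-taking measures can be derived from the turn timestamps. A minimal sketch, assuming each party's turns are available as (start, end) pairs in seconds on a shared time axis (the timestamp file format is not specified here): it totals each party's talk time and measures the gap at every change of speaker, where a negative gap means overlapping speech. The function name `turn_stats` and the input layout are hypothetical.

```python
def turn_stats(speaker_turns, interlocutor_turns):
    """Summarize turn-taking from two lists of (start_s, end_s) turns.

    Returns (talk_time, switch_gaps): total talk time per party in seconds,
    and the gap between the end of one party's turn and the start of the
    other party's next turn (negative values indicate overlap).
    """
    # Merge both parties' turns into one chronologically ordered list.
    events = sorted(
        [(start, end, "speaker") for start, end in speaker_turns]
        + [(start, end, "interlocutor") for start, end in interlocutor_turns]
    )
    talk_time = {"speaker": 0.0, "interlocutor": 0.0}
    for start, end, who in events:
        talk_time[who] += end - start
    switch_gaps = [
        next_start - end
        for (_start, end, who), (next_start, _next_end, next_who) in zip(events, events[1:])
        if next_who != who  # only count changes of speaker
    ]
    return talk_time, switch_gaps
```

For instance, speaker turns `[(0.0, 2.0), (5.0, 6.5)]` and an interlocutor turn `[(2.5, 4.75)]` give 3.5 s versus 2.25 s of talk time and switch gaps of 0.5 s and 0.25 s.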
The speech files, associated labels, and speakers' metadata have been made available to the scientific community for non-commercial research and teaching purposes only.
The full database can be downloaded from the CLARIN repository.
This resource also appears in the ELRA catalog.
This database is subject to the CLARIN ACA+BY+NC+NORED license (freely available for academia).
By using the corpus resources you agree:
- to use the NSC corpus for non-commercial research and teaching purposes only
- not to redistribute the NSC corpus or parts of it to third parties
- to cite the following source in any published work which is based on the corpus:
Fernández Gallardo, L. and Weiss, B., "The Nautilus Speaker Characterization Corpus: Speech Recordings and Labels of Speaker Characteristics and Voice Descriptions," in International Conference on Language Resources and Evaluation (LREC), 2018.
The speech material, recording procedure, and data labels are described in this document and summarized in these slides.
You can listen to some samples here.
On GitHub, you can follow my work with this corpus on statistical analysis of subjective perceptions and on machine learning for predictive modeling.
For more details, see the project's publications (below).
| Attribute | Value |
|---|---|
| Number of speakers; gender | 300 speakers; 126 males, 174 females |
| Speakers' age | 18 to 35 years |
| Language | German (mother tongue) |
| Speech type | conversational; scripted, semi-spontaneous, and spontaneous |
| Speech duration; size | 155 hours of speech; 50 GB |
| File format | 1-channel audio/wav, 48 kHz, 16 bit |
| Recording equipment | speakers: AKG C 414B-XLS microphone; interlocutor: Sennheiser HMD 46 headset |
| Metadata | demographic information and self-assessed personality |
| Labels | 34 interpersonal speaker attributions (continuous numeric); 34 voice descriptions for selected speakers (continuous numeric) |
| Author/data owner | Laura Fernández Gallardo |
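The listed duration and size can be cross-checked with a little arithmetic. A minimal sketch, assuming uncompressed PCM and ignoring WAV header overhead: 48,000 samples/s × 2 bytes × 1 channel is 96,000 bytes per second, or 345.6 MB per hour. Applied to the 95.6 h of speaker audio plus 59.5 h of interlocutor audio, this gives about 53.6 GB, i.e. about 49.9 GiB, which matches the listed 50 GB if that figure is read in binary units (an assumption; the unit convention is not stated). The helper name `pcm_bytes` is hypothetical.

```python
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 2  # 16-bit samples
CHANNELS = 1          # mono files


def pcm_bytes(hours: float) -> int:
    """Size in bytes of uncompressed PCM audio of the given duration (headers ignored)."""
    return round(hours * 3600 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


total_hours = 95.6 + 59.5  # speaker + interlocutor recordings
size = pcm_bytes(total_hours)
print(f"{total_hours:.1f} h -> {size / 2**30:.1f} GiB ({size / 1e9:.1f} GB)")
# prints: 155.1 h -> 49.9 GiB (53.6 GB)
```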
- Fernández Gallardo, L., "Effects of Transmitted Speech Bandwidth on Subjective Assessments of Speaker Characteristics," in Int. Conf. on Quality of Multimedia Experience (QoMEX), 2018.
- Fernández Gallardo, L., Mittag, G., Möller, S. and Beerends, J., "Variable Voice Likability Affecting Subjective Speech Quality Assessments," in Int. Conf. on Quality of Multimedia Experience (QoMEX), 2018.
- Fernández Gallardo, L. and Weiss, B., "The Nautilus Speaker Characterization Corpus: Speech Recordings and Labels of Speaker Characteristics and Voice Descriptions," in International Conference on Language Resources and Evaluation (LREC), 2018.
- Fernández Gallardo, L. and Weiss, B., "Perceived Interpersonal Speaker Attributes and their Acoustic Features," in 13. Tagung Phonetik und Phonologie im deutschsprachigen Raum, 2017.
- Fernández Gallardo, L. and Weiss, B., "Towards Speaker Characterization: Identifying and Predicting Dimensions of Person Attribution," in Interspeech, pp. 904-908, 2017.
- Fernández Gallardo, L., Zequeira Jiménez, R. and Möller, S., "Perceptual Ratings of Voice Likability Collected through In-Lab Listening Tests vs. Mobile-Based Crowdsourcing," in Interspeech, pp. 2233-2237, 2017.
- Zequeira Jiménez, R., Fernández Gallardo, L. and Möller, S., "Scoring Voice Likability using Pair-Comparison: Laboratory vs. Crowdsourcing Approach," in International Young Researcher Summit on Quality of Experience in Emerging Multimedia Services (QEEMS), 2017.
- Zequeira Jiménez, R., Fernández Gallardo, L. and Möller, S., "Scoring Voice Likability using Pair-Comparison: Laboratory vs. Crowdsourcing Approach," in Int. Conf. on Quality of Multimedia Experience (QoMEX), 2017.
- Fernández Gallardo, L., "Recording a High-Quality German Speech Database for the Study of Speaker Personality and Likability," in 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum, pp. 43-36, 2016.
- Fernández Gallardo, L., "A Paired-Comparison Listening Test for Collecting Voice Likability Scores," in Informationstechnische Gesellschaft im VDE (ITG) Conference on Speech Communication, pp. 185-189, 2016.
- Fernández Gallardo, L. and Weiss, B., "Speech Likability and Personality-based Social Relations: A Round-Robin Analysis over Communication Channels," in Interspeech, pp. 903-907, 2016.