- Speech Quality Assessment in Crowdsourcing
Next Generation Crowdsourcing 
Rafael Zequeira Jiménez received a degree in Telecommunication
Engineering (equivalent to a Master of Science) from the University of
Granada, Spain, in 2014.
From 2013 to 2014 he studied at Technische Universität Berlin
within the Erasmus program. During this time, he worked on his Master's
thesis, entitled “Secure multi protocol system based on a Resource
Model for the IoT and M2M services”. In December 2013, Rafael joined
the SNET department of the Deutsche Telekom Innovation Laboratories
(T-Labs), where he worked for 10 months as a student research
assistant in the TRESOR project. There he focused on designing and
implementing REST APIs for communication between different components.
In June 2015, Rafael joined the Quality and Usability Lab, led by Prof. Dr.-Ing. Sebastian Möller, as a Research Assistant in the “Next Generation Crowdsourcing” group, working specifically on the Crowdee project. Since 2016, he has been working towards his PhD on the topic “Analysis of Crowdsourcing Micro-Tasks for Speech Quality Assessment”.
Quality and Usability Lab
D-10587 Berlin, Germany
|Author||Zequeira Jiménez, Rafael and Llagostera, Anna and Naderi, Babak and Möller, Sebastian and Berger, Jens|
|Book title||Companion Proceedings of The 2019 World Wide Web Conference|
|Address||New York, NY, USA|
|Abstract||Crowdsourcing is a great tool for conducting subjective user studies with large numbers of users. Collecting reliable annotations about the quality of speech stimuli is challenging. The task itself is of high subjectivity and users in crowdsourcing work without supervision. This work investigates the intra- and inter-listener agreement within a subjective speech quality assessment task. To this end, a study has been conducted in the laboratory and in crowdsourcing in which listeners were requested to rate speech stimuli with respect to their overall quality. Ratings were collected on a 5-point scale in accordance with the ITU-T Rec. P.800 and P.808, respectively. The speech samples were taken from the database ITU-T Rec. P.501 Annex D, and were presented four times to the listeners. Finally, the crowdsourcing results were contrasted to the ratings collected in the laboratory. Strong and significant Spearman's correlation was achieved when contrasting the ratings collected in both environments. Our analysis shows that while the inter-rater agreement increased the more the listeners conducted the assessment task, the intra-rater reliability remained constant. Our study setup helped to overcome the subjectivity of the task and we found that disagreement can represent a source of information to some extent.|
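
The abstract contrasts laboratory and crowdsourcing ratings using Spearman's correlation. As a rough illustration only, the following minimal sketch shows how such a comparison could be computed. It assumes the correlation is taken over per-stimulus mean opinion scores, which the abstract does not specify. The ratings and all function names are hypothetical and are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mos(ratings_per_stimulus):
    """Mean opinion score per stimulus from lists of 5-point ratings."""
    return [mean(r) for r in ratings_per_stimulus]


# Hypothetical 5-point ratings for four stimuli (NOT data from the paper).
lab_ratings = [[1, 2, 1], [3, 3, 2], [4, 5, 4], [5, 5, 4]]
crowd_ratings = [[2, 1, 2], [2, 3, 3], [4, 4, 5], [5, 4, 5]]

rho = spearman(mos(lab_ratings), mos(crowd_ratings))
print(f"Spearman's rho between lab and crowd MOS: {rho:.3f}")
```

Spearman's rho depends only on rank order, so it suits ordinal 5-point ratings where the distances between scale points are not assumed to be equal. A significance test, as reported in the abstract, would need an additional step that this sketch omits.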