Towards a Standardized Methodology for Evaluating the Quality of Speech Services using Crowdsourcing
The quality of transmitted speech signals is of major importance for telecommunication network providers, as it is one of the main indicators used to evaluate their systems and services. Traditionally, subjective speech quality experiments have been conducted under controlled laboratory (Lab) conditions, with professional audio equipment and following international standards such as ITU-T Rec. P.800. Nowadays, crowdsourcing (CS) offers a fast and low-cost way to carry out user-centered tests on the Internet, reaching a wider and more diverse audience. However, the question remains whether the ratings collected via a CS platform are still valid and reliable, and how to design a CS experiment that yields results comparable to those of a P.800 Lab test. Challenges of both a technical and a conceptual nature need to be addressed to ensure valid and reliable results; among them are test design and presentation, workers' trustworthiness, and task length.
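In a P.800 listening test using the Absolute Category Rating (ACR) method, listeners rate each stimulus on a 5-point scale (1 = bad, 5 = excellent) and the votes are averaged into a Mean Opinion Score (MOS). A minimal sketch of that averaging step (the function name and example votes are illustrative):

```python
def mean_opinion_score(ratings):
    """Average ACR votes on the 5-point P.800 scale into a MOS."""
    if not ratings:
        raise ValueError("no ratings given")
    if not all(1 <= r <= 5 for r in ratings):
        raise ValueError("ACR ratings must lie on the 5-point scale")
    return sum(ratings) / len(ratings)

# Example: votes from eight listeners for one stimulus
votes = [4, 5, 3, 4, 4, 5, 3, 4]
print(round(mean_opinion_score(votes), 2))  # 4.0
```

The same averaging applies whether the votes come from a Lab panel or from CS workers; what differs between the two settings is how well the rating conditions can be controlled.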
The aim of this project is to determine the influence that the background noise of the environment in which a CS study is conducted, as well as participants' hearing impairments, have on the results of a speech quality assessment task.
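Such background noise could, for instance, be screened by estimating the level of a short microphone recording made on the worker's device before the listening task starts. A minimal sketch, assuming 16-bit PCM samples and a purely illustrative -50 dBFS threshold:

```python
import math

def rms_dbfs(samples, full_scale=32768.0):
    """RMS level of 16-bit PCM samples, in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / full_scale) if rms > 0 else float("-inf")

def quiet_enough(samples, threshold_dbfs=-50.0):
    """Crude screen: treat recordings above the (illustrative)
    threshold as too noisy for a listening task."""
    return rms_dbfs(samples) <= threshold_dbfs

# A faint constant hum at ~0.1% of full scale sits well below -50 dBFS
hum = [30] * 8000
print(quiet_enough(hum))  # True
```

A deployed check would of course need to distinguish noise from the worker's own speech and from microphone gain differences; the sketch only illustrates the basic level estimate.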
To reach this goal, a series of research questions has to be investigated as part of this research project:
- Participant influence factors: How do participants' hearing characteristics influence the results of a speech quality assessment experiment in Crowdsourcing? How can these characteristics be controlled?
- Environment influence factors: In which environments can speech quality evaluation tasks be carried out in order to achieve reliable results? What is the influence of background noise on the results? How can the environment be detected and classified from the Crowdsourcing workers' microphone recordings?
- Crowdsourcing vs. Laboratory: How do quality evaluation results differ between Laboratory and Crowdsourcing environments? What level of difference is still acceptable for Crowdsourcing-based evaluation to remain valid? Do the two approaches complement each other, or is there redundancy?
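The Crowdsourcing-vs.-Laboratory comparison is commonly quantified by relating per-condition MOS values obtained in both settings, e.g. via the Pearson correlation coefficient and the root-mean-square error (RMSE). A minimal sketch with hypothetical numbers:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long MOS vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rmse(x, y):
    """Root-mean-square error between two equally long MOS vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical per-condition MOS from a Lab test and a CS test
lab = [1.2, 2.1, 3.0, 3.8, 4.5]
cs = [1.4, 2.3, 2.9, 3.9, 4.3]
print(round(pearson(lab, cs), 3), round(rmse(lab, cs), 3))
```

A high correlation with a low RMSE would indicate that the two settings rank and scale the test conditions similarly; a high correlation with a large RMSE would instead suggest a systematic rating offset between Lab and CS.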
To address each of these questions, application-oriented scientific work will be performed. Our results will be published in scientific venues and will contribute to the definition of new ITU-T Recommendations for the subjective evaluation of speech in CS (P.CROWD and others). The recommendations produced as a result of this project will be validated by experts of ITU-T Study Group 12. We expect the results to be particularly useful for reducing testing costs, increasing testing speed, and producing ground-truth data for quality assessment and monitoring tools.
Author: Rafael Zequeira Jiménez