Quality of Synthetic Speech - Perceptual Dimensions, Influencing Factors and Instrumental Assessment
Nonetheless, modern TTS systems still suffer from diverse quality constraints: frequent concatenations and temporal manipulations in diphone synthesis cause discontinuous speech, HMM synthesis can lead to natural-sounding but also very buzzy and muffled speech, and the quality of unit selection voices depends not only on the degree of the fit, but also on the appropriateness of the available speech units. The resulting impairments all yield different perceptual impressions. Thus, the quality of synthetic speech is multidimensional in nature.
Therefore, research on perceptual quality dimensions of synthetic speech is reviewed and two experiments on perceptual quality are conducted. Their findings are compared with the state of the art and a set of five perceptual quality dimensions is derived. They are: (i) naturalness of voice, (ii) prosodic quality, (iii) fluency and intelligibility, (iv) absence of disturbances, and (v) calmness. Moreover, a test protocol is designed that recommends an experimental setup to assess these five dimensions.
In addition, several factors that influence these dimensions are analyzed. First, the findings of two studies show that the relevance of these dimensions shifts depending on the use case (short message readers vs. synthesized audiobooks). Second, a significant effect of the speaker's voice in a speech corpus is verified for all dimensions. Third, it is shown that the size of the speech corpus for unit selection voices significantly affects all dimensions.
Furthermore, different approaches to instrumental quality assessment of synthetic speech are examined. Two linear regression models are developed and employed to estimate the quality of TTS signals. Even though they reach correlations between estimated score and auditory rating of up to .74, they are outperformed by two more complex, non-linear approaches. One of these non-linear measures is used with the aim of improving the quality of MaryTTS unit selection voices. Even though this goal could not be achieved, the study highlights different approaches to further improve the prediction accuracy and therefore also the quality of the generated voice.
Multi-episodic perceived quality of telecommunication services
Telecommunication services have to cope with degradations resulting from the necessary transmission of data. A telecommunication service might thus not always be able to provide the same performance to a user. The resulting variation in perceived quality might affect the user’s satisfaction, attitude, behavior, and also future-use intention towards a telecommunication service. This thesis investigates the formation process of perceived quality across multiple, distinct interactions with one telecommunication service. The formation process of the so-called multi-episodic perceived quality is examined for two different time spans. First, repeated use in one session consisting of multiple usage episodes is investigated with an overall duration of up to 45 min. This is complemented by studying the formation process spanning several days. The investigation was conducted by performing empirical experiments under controlled laboratory settings as well as field experiments. These experiments are based upon the Mean Opinion Score (MOS), i.e., the assessment of the perceived quality of an (almost) identical stimulus/condition by multiple observers to derive the judgment of an average observer. The impact of individual user behavior was limited here by defining the task, content, and also time for each usage episode as well as the provided performance (defined-use method). The empirical data show that applying the defined-use method is feasible and yields consistent results. The results of the experiments show that more recent episodes have a higher impact on the multi-episodic perceived quality (recency effect). Saturation is observed for consecutive degraded episodes, i.e., the multi-episodic judgments remain on the same level above the episodic judgments of degraded episodes. In addition, duration neglect is observed, i.e., a longer degraded episode does not have a higher negative impact on judgments of multi-episodic perceived quality.
With the empirical data, models for the prediction of multi-episodic judgments are evaluated. These models are based on the weighted average of the episodic judgments. The evaluation showed that a linear function outperforms a window function in regard to prediction accuracy and robustness.
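The idea of predicting a multi-episodic judgment as a weighted average of episodic judgments can be illustrated with a minimal sketch. The abstract does not give the exact form of the linear or window function, so the two weighting schemes below (`linear_weights`, `window_weights`) and the example ratings are assumptions chosen only for illustration, not the models evaluated in the thesis.

```python
from typing import List, Sequence


def weighted_average(judgments: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of episodic judgments (e.g. ratings on a 1-5 scale)."""
    if not judgments or len(judgments) != len(weights):
        raise ValueError("need one weight per episodic judgment")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(j * w for j, w in zip(judgments, weights)) / total


def linear_weights(n: int) -> List[float]:
    """Assumed linear weighting: episode i (1 = oldest) gets weight i."""
    return [float(i) for i in range(1, n + 1)]


def window_weights(n: int, window: int) -> List[float]:
    """Assumed window weighting: only the last `window` episodes count, equally."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [1.0 if i >= n - window else 0.0 for i in range(n)]


# Hypothetical episodic ratings: two good episodes followed by a degraded one.
episodes = [4.0, 4.0, 2.0]
linear_prediction = weighted_average(episodes, linear_weights(len(episodes)))
window_prediction = weighted_average(episodes, window_weights(len(episodes), window=1))
print(linear_prediction, window_prediction)
```

Both schemes give recent episodes more influence than a plain mean would, which is one simple way to express the recency effect described above. The window variant discards older episodes entirely, while the linear variant only down-weights them.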
Download this book @ https://depositonce.tu-berlin.de/handle/11303/5906
Modellierung der Nutzerwahrnehmung und des Nutzerverhaltens bei der Interaktion mit einem mobilen Bezahlsystem [Modeling User Perception and User Behavior in the Interaction with a Mobile Payment System]
Security-relevant electronic systems such as computers, smartphones, electronic access systems, and e-banking affect every part of our lives today. These systems are therefore used not only by professional users but also by ordinary ones, and both groups must use them correctly so as not to endanger their own security. It follows that these systems must be not only secure but also usable, and that system design faces the challenge of balancing security and usability. Research on the usability of security systems (usable security) agrees that the two properties usually work against each other: more security often leads to less usability, and vice versa. It is therefore important to understand which factors influence user perception of security and usability, and which factors influence the decision to use such a system. With this knowledge, systems can be developed that are both more secure and more usable. Formalizing this knowledge as models that are used during the development process (ideally following the usability engineering lifecycle) can help reduce costs and produce better (that is, secure and usable) products. With this work, the author provides a foundation for understanding the users of security-relevant electronic systems. First, a theoretical analysis was carried out of the user and of possible factors influencing the user's perception of, and usage behavior with, these systems. The knowledge gathered was organized into a taxonomy of these factors, from which a theoretical model was derived. To populate the theoretical model with data, an evaluation method consisting of a questionnaire and an experimental setup was developed.
Using this method, five experiments with a total of 88 participants were conducted. After each experiment, the questionnaire and the experimental setup were reviewed and adjusted where necessary. Statistical models were computed from the experimental data to identify factors with a significant influence. In addition, the theoretical model was adapted during the modeling process on the basis of the results. Finally, the evaluation method was revised to reflect the modeling results and the influencing factors found. In summary, this work contributes to the field of usable security a taxonomy of potential factors influencing user perception and behavior, an empirical evaluation method consisting of a questionnaire and an experimental setup, two models of user perception, and one model of user behavior. The significant influencing factors identified in this work are, on the one hand, personality traits (openness, conscientiousness, positive attitude) and, on the other, service-related properties (perceived usability, perceived security, categorical price, service provider, and perceived size of a financial loss).
This dissertation is written in German.
Download this book @ https://depositonce.tu-berlin.de/handle/11303/5874
Modeling modality selection in multimodal human-computer interaction (Stefan Schaffer)
Initially, foundations of decision-making in multimodal HCI are discussed, and the state of the art in automatic usability evaluation (AUE) is described. It is shown that there are currently no uniform empirical results on factors influencing modality choice that allow for the creation of a computational model. As part of this work, two AUE tools, the MeMo workbench and CogTool, are extended by a newly created computational model for the simulation of multimodal HCI.
To answer the first research question, the empirical part of the thesis describes three experiments with a mobile application integrating touch screen and speech input. In summary, the results indicate that modality efficiency and input performance are important moderators of modality choice.
The second research question is answered by the derivation of a utility-driven model for input modality choice in multimodal HCI based on the empirical data. The model provides probability estimations of modality usage, based on different levels of the parameters modality efficiency and input performance. Four variants of the model that differ in training data are tested. The analysis reveals a considerable fit for models based on averaged modality usage data.
To answer the third research question, it is illustrated how the modality choice model can be deployed within AUE tools for simulating multimodal interaction. The multimodal extension as well as the practical utilization of MeMo is depicted, and it is described how unimodal CogTool models of touch screen and speech-based interaction can be rendered into multimodal models. A comparison of data generated by simulations with the AUE tools with predictions of the derived modality selection algorithm verifies the correct integration of the model into the tools. The practical application demonstrates the usefulness of the modality choice model for predicting the number of steps and the total time spent to solve specific tasks with multimodal systems. The practical part is concluded by a comparison of MeMo and CogTool. Both tools are classified, and an assessment on a subjective basis as well as on the basis of the quality of predictions is conducted.
Human and Automatic Speaker Recognition over Telecommunication Channels (Laura Fernández Gallardo)
This work addresses the evaluation of the human and the automatic speaker recognition performances under different channel distortions caused by bandwidth limitation, codecs, and electro-acoustic user interfaces, among other impairments. Its main contribution is the demonstration of the benefits of communication channels of extended bandwidth, together with an insight into how speaker-specific characteristics of speech are preserved through different transmissions. It provides sufficient motivation for considering speaker recognition as a criterion for the migration from narrowband to enhanced bandwidths, such as wideband and super-wideband.
Buy this book @ http://www.springer.com/de/book/9789812877260
Simulation des Interaktionsverhaltens von Senioren bei der Benutzung von mobilen Endgeräten [Simulating the Interaction Behavior of Older Adults Using Mobile Devices] (Matthias Schulz)
The aim of this work is to extend existing software for automatic usability evaluation (AUE) so that demographic distributions of impairments in the older population feed into the simulation process and so that the effects of missing or inappropriate mental models of the input device can be simulated. The work documents the current state of research, the empirical experiments conducted for model building and model validation, and the further development of the software. As the work shows, older people have various impairments (perception, cognition, and motor skills) that can negatively affect interaction with a system. Various methods and tools exist for finding usability errors through simulation, but they have open problems that limit their applicability. For example, they do not consider whether people can cope with the input device. Furthermore, they use little or no demographic data that could allow the severity of a usability problem to be estimated. To integrate the demographic distributions of impairments into AUE, a technical solution is described that generates user models with realistic impairments. By reproducing the distributions of impairments in society, it becomes possible to estimate the severity of a usability problem. To integrate inappropriate and missing mental models of input devices into the simulation, two experiments were conducted. The first served to build the models and the second to validate them in an independent test. The results of both experiments are presented in this work.
By correctly reproducing impairments in society and by integrating, as an example, mental models of input devices, the quality of simulations can be improved substantially. The results presented in this work extend the current state of research and enable versatile further use.
This dissertation is written in German.
Download this book @ https://depositonce.tu-berlin.de/handle/11303/5303
Perceived security and usage of a mobile payment application (Hanul Sieger)
This work presents a comprehensive experimental study of users’ perceived security and frequency of use of a smartphone-based mobile payment app. Previous research relied primarily on surveys to collect data on potential users’ expectations of and attitudes towards mobile payment. It remained unclear how users would react when using a real app, and how mobile payment would be used in comparison with existing payment methods such as cash and payment cards. The design of the experiments was based on a generalizable taxonomy built for use within the field of usable security. Five sets of experiments were conducted in which participants took part in a shopping experience. The security method used with the mobile payment app was varied (no security, PIN, fingerprint recognition) and "attacks" were simulated on the payment methods. Furthermore, important personality traits were determined by established questionnaires. The results of the experiments revealed relevant factors that have an impact on perceived security and usage. It could be shown to what extent personality traits and other external factors are related to perceived security and usage. Additionally, perceived security and usage were modeled and two classifiers were developed to distinguish between convenient and inconvenient security methods (for usage), and apps with and without security methods (for perceived security). Finally, the models were incorporated as a proof of concept into the tool MeMo for the simulation of user behavior. Overall, it was shown that personality traits have a moderate effect on perceived security and usage. Price, shopping environment, and attacks are also influential. Different security methods showed significant differences in evaluation and use of the app.
Download this book @ https://opus4.kobv.de/opus4-tuberlin/frontdoor/index/index/docId/7395
Neural Correlates of Quality During Perception of Audiovisual Stimuli (Sebastian Arndt)
This book presents a new approach to examining the perceived quality of audiovisual sequences. It uses electroencephalography (EEG) to explain in detail how user quality judgments are formed within a test participant, and what the physiological implications might be when subjects are exposed to lower quality media. The book redefines the experimental paradigms of using EEG in the area of quality assessment so that they better suit the requirements of standard subjective quality testing, and presents experimental protocols and stimuli that have been adjusted accordingly.
Buy this book @ www.springer.com/us/book/9789811002472
Emotional Feedback for Mobile Devices (Julia Seebode)
This book investigates the functional adequacy as well as the affective impression made by feedback messages on mobile devices. It presents an easily adoptable experimental setup to examine context effects on various feedback messages and applies it to auditory, tactile and auditory-tactile feedback messages. This approach provides insights into the relationship between the affective impression and functional applicability of these messages as well as an understanding of the influence of unimodal components on the perception of multimodal feedback messages. The developed paradigm can also be extended to investigate other aspects of context and used to investigate feedback messages in modalities other than those presented. The book uses questionnaires implemented on a smartphone, which can easily be adopted for field studies to broaden the scope even further. Finally, the book offers guidelines for the design of system feedback.
Buy this book @ http://www.springer.com/us/book/9783319171920
Neural Correlates of Quality Perception for Complex Speech Signals (Jan-Niklas Antons)
This book interconnects two essential disciplines to study the perception of speech: Neuroscience and Quality of Experience, which to date have rarely been used together for the purposes of research on speech quality perception. In five key experiments, the book demonstrates the application of standard clinical methods in neurophysiology on the one hand and of methods used in fields of research concerned with speech quality perception on the other.
Using this combination, the book shows that speech stimuli with different lengths and different quality impairments are accompanied by physiological reactions related to quality variations, e.g., a positive peak in an event-related potential. Furthermore, it demonstrates that – in most cases – quality impairment intensity has an impact on the intensity of physiological reactions.
Buy this book @ http://www.springer.com/us/book/9783319155203
Grasp Interaction With Tablets (Katrin Wolf)
This book presents guidelines for a future device type: a tablet that allows ergonomic front- and back-of-device interaction. These guidelines help designers and developers of user interfaces to build ergonomic applications for tablet devices, in particular for devices that enable back-of-device interaction. In addition, manufacturers of tablet devices obtain arguments that back-of-device interaction is a promising extension of the interaction design space and results in increased input capabilities, enriched design possibilities, and proven usability. The guidelines are derived from empirical studies and developed to fit the users’ skills to the way the novel device type is held. Three particular research areas that are relevant to develop design guidelines for tablet interaction are investigated: ergonomic gestures, interaction areas, and pointing techniques.
Buy this book @ http://www.springer.com/us/book/9783319139807
Assessment and prediction of audiovisual quality for video telephony (Benjamin Belmudez)
The work presented in this book focuses on modeling audiovisual quality as perceived by the users of IP-based solutions for video communication like video-telephony. It also extends the current framework for the parametric prediction of audiovisual call quality. The book addresses several aspects related to the quality perception of entire video calls, namely, the quality estimation of the single audio and video modalities in an interactive context, the audiovisual quality integration of these modalities and the temporal pooling of short sample-based quality scores to account for the perceptual quality impact of time-varying degradations.
Buy this book @ http://www.springer.com/us/book/9783319141657
Adaptive Identification of Acoustic Multichannel Systems Using Sparse Representations (Karim Helwani)
This book extends adaptive filtering theory to massive multichannel systems by taking into account a priori knowledge of the underlying system or signal. The starting point is exploiting the sparseness in acoustic multichannel systems in order to solve the non-uniqueness problem with an efficient algorithm for adaptive filtering that does not require any modification of the loudspeaker signals.
Buy this book @ http://www.springer.com/de/book/9783319089539
Personality in Speech - Assessment and Automatic Classification (Tim Polzehl)
This work combines interdisciplinary knowledge and experience from research fields of psychology, linguistics, audio-processing, machine learning, and computer science. The work systematically explores a novel research topic devoted to automated modeling of personality expression from speech. For this aim, it introduces a novel personality assessment questionnaire and presents the results of extensive labeling sessions to annotate the speech data with personality assessments. It provides estimates of the Big 5 personality traits, i.e. openness, conscientiousness, extroversion, agreeableness, and neuroticism. Based on a database built on the questionnaire, the book presents models to tell apart different personality types or classes from speech automatically.
Buy this book @ http://www.springer.com/de/book/9783319095158
An Evaluation Framework for Multimodal Interaction. Determining Quality Aspects and Modality Choice (Ina Wechsung)
This book presents (1) an exhaustive and empirically validated taxonomy of quality aspects of multimodal interaction as well as respective measurement methods, (2) a validated questionnaire specifically tailored to the evaluation of multimodal systems and covering most of the taxonomy's quality aspects, (3) insights on how the quality perceptions of multimodal systems relate to the quality perceptions of their individual components, (4) a set of empirically tested factors which influence modality choice, and (5) models regarding the relationship of the perceived quality of a modality and the actual usage of a modality.
Buy this book @ http://www.springer.com/de/book/9783319038094
Management of Speech and Video Telephony Quality in Heterogeneous Wireless Networks (Blazej Lewcio)
This book shows how networking research and quality engineering can be combined to successfully manage the transmission quality when speech and video telephony is delivered in heterogeneous wireless networks. Nomadic use of services requires intelligent management of ongoing transmission, and to make the best of available resources many fundamental trade-offs must be considered. Network coverage versus throughput and reliability of a connection is one key aspect, efficiency versus robustness of signal compression is another. However, to successfully manage services, user-perceived Quality of Experience (QoE) in heterogeneous networks must be known, and the perception of quality changes must be understood. These issues are addressed in this book, in particular focusing on the perception of quality changes due to switching between diverse networks, speech and video codecs, and encoding bit rates during active calls.
Buy this book @ http://www.springer.com/us/book/9783319021010
Dimension-based Quality Modeling of Transmitted Speech (Marcel Wältermann)
In this book, speech transmission quality is modeled on the basis of perceptual dimensions. The author identifies those dimensions that are relevant for today's public-switched and packet-based telecommunication systems, regarding the complete transmission path from the mouth of the speaker to the ear of the listener. Both narrowband (300-3400 Hz) and wideband (50-7000 Hz) speech transmission are taken into account. A new analytical assessment method is presented that allows the dimensions to be rated by non-expert listeners in a direct way. Due to the efficiency of the test method, a relatively large number of stimuli can be assessed in auditory tests. The test method is applied in two auditory experiments. The book provides evidence that this test method yields meaningful and reliable results. The resulting dimension scores together with respective overall quality ratings form the basis for a new parametric model for the quality estimation of transmitted speech based on the perceptual dimensions. In a two-step model approach, instrumental dimension models estimate dimension impairment factors in a first step. The resulting dimension estimates are combined by a Euclidean integration function in a second step in order to provide an estimate of the total impairment.
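The second step of the two-step approach, combining dimension estimates with a Euclidean integration function, can be sketched as follows. The abstract does not state the exact function, so the unweighted Euclidean norm, the function name `total_impairment`, and the example values are assumptions made purely for illustration. The book's model may apply weights or scaling that are not reproduced here.

```python
import math
from typing import Sequence


def total_impairment(dimension_impairments: Sequence[float]) -> float:
    """Unweighted Euclidean norm of per-dimension impairment estimates (assumed form)."""
    if any(x < 0 for x in dimension_impairments):
        raise ValueError("impairment factors are assumed to be non-negative")
    return math.sqrt(sum(x * x for x in dimension_impairments))


# Hypothetical impairment estimates for two perceptual dimensions.
estimates = [3.0, 4.0]
print(total_impairment(estimates))
```

With this form, the total is dominated by the largest impairment and never exceeds the plain sum of the individual impairments.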
Buy this book @ http://www.springer.com/us/book/9783642350184
Designing Speech Output for In-car Infotainment Applications Based on a Cognitive Model of Attention Allocation (Julia Niemann)
Drivers tend to glance at the display of in-car infotainment systems despite the presence of speech output. The SEEV Model by Wickens et al. (2003) defines parameters influencing attention allocation towards events in dynamic environments. Analysing the SEEV Model provides insights into the parameters on which speech output has disadvantages compared to visual output. In two driving simulator experiments, it was tested whether increasing or decreasing the derived parameters of the SEEV Model for speech output, by improving the speech output in various respects, actually decreases attention allocation to the display. It was shown that increasing the relevant information content (corresponding to the parameters expectancy and value) for speech as well as decreasing the time effort (which corresponds to the parameter effort) of speech compared to a baseline condition leads to a lower percentage dwell time on the display. Next, it was shown that a conscious motor action performed to request speech output (corresponding to the parameter effort) decreases attention allocation towards the display in situations in which the secondary task is interrupted by a highly demanding driving task. Based on these insights, design recommendations for speech output were derived and implemented in a prototype with several infotainment applications. In another driving simulator experiment, it was observed that the design recommendations actually reduced attention allocation towards the display compared to a common speech output design of in-car infotainment systems. The design recommendations to reduce the time effort of speech output were again evaluated regarding their influence on the development of users’ mental models. Finally, it was tested whether increasing the hedonic quality of speech output also leads to less time glancing at the display.
On the one hand, the conducted experiments showed which parameters of the SEEV Model could be influenced for speech output to decrease attention allocation to the display of a multimodal in-car infotainment system. On the other hand, the results provided insights regarding the applicability of specific SEEV Model parameters to the auditory modality, since so far the model had only been evaluated with respect to the visual modality. Finally, it was shown that hedonic aspects of speech output also influence attention allocation.
Download this book @ https://opus4.kobv.de/opus4-tuberlin/frontdoor/index/index/docId/3796
Quantifying Quality Aspects of Multimodal Interactive Systems (Christine Kühnel)
This book systematically addresses the quantification of quality aspects of multimodal interactive systems. The conceptual structure is based on a schematic view on human-computer interaction where the user interacts with the system and perceives it via input and output interfaces. Thus, aspects of multimodal interaction are analyzed first, followed by a discussion of the evaluation of output and input and concluding with a view on the evaluation of a complete system.
Buy this book @ http://www.springer.com/de/book/9783642296017
Estimating Spoken Dialog System Quality with User Models (Klaus-Peter Engelbrecht)
Spoken dialog systems have the potential to offer highly intuitive user interfaces, as they allow systems to be controlled using natural language. However, the complexity inherent in natural language dialogs means that careful testing of the system must be carried out from the very beginning of the design process. This book examines how user models can be used to support such early evaluations in two ways: by running simulations of dialogs, and by estimating the quality judgments of users. First, a design environment supporting the creation of dialog flows, the simulation of dialogs, and the analysis of the simulated data is proposed. How the quality of user simulations may be quantified with respect to their suitability for both formative and summative evaluation is then discussed. The remainder of the book is dedicated to the problem of predicting quality judgments of users based on interaction data. New modeling approaches are presented, which process the dialogs as sequences, and which allow knowledge about the judgment behavior of users to be incorporated into predictions. All proposed methods are validated with example evaluation studies.
Buy this book @ http://www.springer.com/gp/book/9783642315909
The Single-layer Potential Approach Applied on Sound Field Synthesis and its Extension to Non-enclosing Distributions of Secondary Sources (Jens Ahrens)
This chapter presents a number of analytic solutions to the problem of sound field synthesis in three and 2.5 dimensions, whereby continuous distributions of secondary sources are assumed. A focus lies on the explicit solution of the synthesis equation, which provides a perfect solution for enclosing secondary source distributions. The explicit solution is derived for spherical, circular, planar, and linear geometries. It is then shown that the well-known Near-field Compensated Higher Order Ambisonics approach is equivalent to the explicit solution for spherical secondary source distributions. The recently proposed Spectral Division Method is identified as the extension of Near-field Compensated Higher Order Ambisonics to planar and linear secondary source distributions. Apart from the explicit solution, an implicit solution exists, which has become known as Wave Field Synthesis. The latter is derived from the Rayleigh Integral and its modern formulation for arbitrary complex secondary source distributions is outlined.
Buy this book @ http://link.springer.com/chapter/10.1007/978-3-642-25743-8_3
Integral and Diagnostic Intrusive Prediction of Speech Quality (Nicolas Côté)
This work deals with instrumental measurement methods for the perceived quality of transmitted speech. These measures simulate the speech perception process employed by human subjects during auditory experiments. The measure standardized by the International Telecommunication Union (ITU), called “Wideband-Perceptual Evaluation of Speech Quality (WB-PESQ)”, is not able to quantify all these perceived characteristics on a unidimensional quality scale, the Mean Opinion Score (MOS) scale. Recent experimental studies showed that subjects make use of several perceptual dimensions to judge the quality of speech signals. In order to represent the signal at a higher stage of perception, a new model, called “Diagnostic Instrumental Assessment of Listening quality (DIAL)”, has been developed. It includes a perceptual and a cognitive model which simulate the whole quality judgment process. Except for strong discontinuities, DIAL predicts the speech quality of different speech processing and transmission systems very well, and it outperforms WB-PESQ.
Buy this book @ http://www.springer.com/us/book/9783642184628