TU Berlin

Quality and Usability Lab: Reviewed Conference Papers

Reliability of Human Evaluation for Text Summarization: Lessons Learned and Challenges Ahead
Citation key iskender2021b
Author Iskender, Neslihan and Polzehl, Tim and Möller, Sebastian
Book title Proceedings of the Workshop on Human Evaluation of NLP Systems
Pages 86–96
Year 2021
ISBN 978-1-954085-10-7
Location online
Address online
Month April
Note online
Publisher Association for Computational Linguistics
Series HumEval
How published Full paper
Abstract Only a small portion of research papers with human evaluation for text summarization provide information about the participant demographics, task design, and experiment protocol. Additionally, many researchers use human evaluation as a gold standard without questioning its reliability or investigating the factors that might affect it. As a result, there is a lack of best practices for reliable human summarization evaluation grounded in empirical evidence. To investigate the reliability of human evaluation, we conduct a series of human evaluation experiments, provide an overview of participant demographics, task design, and experimental set-up, and compare the results across experiments. Based on our empirical analysis, we provide guidelines to ensure the reliability of expert and non-expert evaluations, and we identify the factors that might affect the reliability of human evaluation.
