Reviewed Conference Papers

Best Practices for Crowd-based Evaluation of German Summarization: Comparing Crowd, Expert and Automatic Evaluation
Citation key iskender2020best
Authors Iskender, Neslihan and Polzehl, Tim and Möller, Sebastian
Book title Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems
Pages 164–175
Year 2020
Location online
Address online
Month November
Publisher Association for Computational Linguistics (ACL)
Series EMNLP | Eval4NLP
How published Full paper
Abstract One of the main challenges in the development of summarization tools is summarization quality evaluation. On the one hand, the human assessment of summarization quality conducted by linguistic experts is slow, expensive, and still not a standardized procedure. On the other hand, the automatic assessment metrics are reported not to correlate high enough with human quality ratings. As a solution, we propose crowdsourcing as a fast, scalable, and cost-effective alternative to expert evaluations to assess the intrinsic and extrinsic quality of summarization by comparing crowd ratings with expert ratings and automatic metrics such as ROUGE, BLEU, or BertScore on a German summarization data set. Our results provide a basis for best practices for crowd-based summarization evaluation regarding major influential factors such as the best annotation aggregation method, the influence of readability and reading effort on summarization evaluation, and the optimal number of crowd workers to achieve comparable results to experts, especially when determining factors such as overall quality, grammaticality, referential clarity, focus, structure & coherence, summary usefulness, and summary informativeness.