Comparison and adaptation of automatic evaluation metrics for quality assessment of re-speaking

Re-speaking is a method for producing high-quality subtitles for live broadcasts and other public events. Because human re-speakers produce the text, estimating the quality of the result is non-trivial. Most organizations assess quality manually, but fully automatic methods have been developed for similar problems such as machine translation (MT). This paper compares several of these methods: BLEU, EBLEU, NIST, METEOR, METEOR-PL, TER, and RIBES. Their scores are then compared against the human-derived NER metric commonly used in re-speaking. The aim is to assess whether these automatic metrics, normally used to evaluate MT systems, can replace the manual NER metric when evaluating re-speaking transcripts.
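As a rough illustration of the kind of comparison described above, the sketch below computes an NER-style accuracy score and a Pearson correlation between two lists of scores. It assumes the commonly cited form of the NER model, (N - E - R) / N × 100, where N is the number of words and E and R are the summed penalties for edition and recognition errors. The function names and the choice of Pearson correlation are illustrative assumptions, not the procedure used in the paper.

```python
from math import sqrt


def ner_score(n_words, edition_penalty, recognition_penalty):
    """NER-style accuracy in percent, assuming (N - E - R) / N * 100.

    edition_penalty and recognition_penalty are assumed to be the summed
    error weights assigned by a human evaluator.
    """
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return 100.0 * (n_words - edition_penalty - recognition_penalty) / n_words


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

Given one automatic metric score and one NER score per transcript, `pearson(metric_scores, ner_scores)` would indicate how closely that metric tracks the human assessment. A rank correlation could be substituted if the relationship is not expected to be linear.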

Access rights

Access: open access

Rights: CC BY 4.0

Attribution 4.0 International (CC BY 4.0)

URI

https://repo.agh.edu.pl/handle/AGH/113178

Collections

Articles (CN-csci)
