QE4PE: Word-level Quality Estimation for Human Post-Editing
Abstract
Word-level quality estimation (QE) detects erroneous spans in machine translations, which can direct and facilitate human post-editing. While the accuracy of word-level QE systems has been assessed extensively, their usability and downstream influence on the speed, quality and editing choices of human post-editing remain understudied. Our QE4PE study investigates the impact of word-level QE on machine translation (MT) post-editing in a realistic setting involving 42 professional post-editors across two translation directions. We compare four error-span highlight modalities, including supervised and uncertainty-based word-level QE methods, for identifying potential errors in the outputs of a state-of-the-art neural MT model. Post-editing effort and productivity are estimated from behavioral logs, while quality improvements are assessed through word- and segment-level human annotation. We find that domain, language and editors' speed are critical factors in determining the effectiveness of highlights, with modest differences between human-made and automated QE highlights underlining a gap between accuracy and usability in professional workflows.
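For readers unfamiliar with the uncertainty-based QE methods mentioned in the abstract, the sketch below illustrates the general idea: tokens generated with low model confidence are flagged as potential error spans. This is a minimal illustration, not the paper's exact method; the MT model name and the confidence threshold are assumptions chosen for the example.

```python
# Illustrative sketch of uncertainty-based word-level QE: flag low-confidence
# tokens in an MT output as potential error spans. NOT the paper's exact
# method; the model name and threshold below are assumptions.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "Helsinki-NLP/opus-mt-en-it"  # assumed example MT model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

src = "The patient was administered 10 mg of the drug."
inputs = tokenizer(src, return_tensors="pt")

# Greedy decoding so that one score tensor aligns with each generated token.
out = model.generate(
    **inputs,
    num_beams=1,
    do_sample=False,
    output_scores=True,
    return_dict_in_generate=True,
)

# out.scores is a tuple with one (batch, vocab) logit tensor per step.
probs = torch.stack(out.scores, dim=1).softmax(-1)
token_ids = out.sequences[:, 1:]  # drop the decoder start token
token_probs = probs.gather(-1, token_ids.unsqueeze(-1)).squeeze(-1)

threshold = 0.5  # assumed cutoff: tokens below this confidence get flagged
for tok_id, p in zip(token_ids[0], token_probs[0]):
    flag = "  <-- potential error span" if p < threshold else ""
    print(f"{tokenizer.decode(tok_id)!r}: {p.item():.2f}{flag}")
```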
Community
🐮 GroTE Interface: https://github.com/gsarti/grote
🤗 Dataset: https://huggingface.co/datasets/gsarti/qe4pe
🖥️ Code: https://github.com/gsarti/qe4pe
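To get started with the released dataset linked above, the snippet below sketches loading it with the 🤗 `datasets` library. The available configurations are not assumed here; they are queried at runtime.

```python
# Minimal sketch for exploring the QE4PE dataset from the Hub link above.
# The specific configs/splits are not assumed; they are listed at runtime.
from datasets import get_dataset_config_names, load_dataset

configs = get_dataset_config_names("gsarti/qe4pe")
print(configs)  # inspect which subsets exist before loading

ds = load_dataset("gsarti/qe4pe", configs[0])  # load the first subset
print(ds)  # DatasetDict showing its splits and features
```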
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, were found to be similar to this paper:
- Efficient Machine Translation Corpus Generation: Integrating Human-in-the-Loop Post-Editing with Large Language Models (2025)
- A comparison of translation performance between DeepL and Supertext (2025)
- Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation (2025)
- Enhancing Human Evaluation in Machine Translation with Comparative Judgment (2025)
- Quality-Aware Decoding: Unifying Quality Estimation and Decoding (2025)
- Automatic Input Rewriting Improves Translation with Large Language Models (2025)
- When LLMs Struggle: Reference-less Translation Evaluation for Low-resource Languages (2025)