arxiv:2010.03662

Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank

Published on Oct 7, 2020

Abstract

Detecting fine-grained differences in content conveyed in different languages matters for cross-lingual NLP and multilingual corpora analysis, but it is a challenging machine learning problem since annotation is expensive and hard to scale. This work improves the prediction and annotation of fine-grained semantic divergences. We introduce a training strategy for multilingual BERT models by learning to rank synthetic divergent examples of varying granularity. We evaluate our models on the Rationalized English-French Semantic Divergences, a new dataset released with this work, consisting of English-French sentence pairs annotated with semantic divergence classes and token-level rationales. Learning to rank helps detect fine-grained sentence-level divergences more accurately than a strong sentence-level similarity model, while token-level predictions have the potential to further distinguish between coarse and fine-grained divergences.
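To make the learning-to-rank idea concrete, here is a minimal sketch of one way to train a divergence scorer with a margin ranking objective: a multilingual BERT encoder scores a sentence pair, and the loss pushes an equivalent pair to score above a synthetically corrupted, more divergent one. All names, hyperparameters, and the corruption shown are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch: rank an equivalent EN-FR pair above a synthetic divergent one.
# The model, loss, and example data below are assumptions for illustration;
# the paper's architecture and synthetic-data pipeline may differ.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
scorer = nn.Linear(encoder.config.hidden_size, 1)  # equivalence score head
loss_fn = nn.MarginRankingLoss(margin=1.0)
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(scorer.parameters()), lr=2e-5
)

def score(src: str, tgt: str) -> torch.Tensor:
    """Encode the pair jointly and map the [CLS] state to a scalar score."""
    inputs = tokenizer(src, tgt, return_tensors="pt", truncation=True)
    cls = encoder(**inputs).last_hidden_state[:, 0]
    return scorer(cls).squeeze(-1)

# One toy ranking step: a true translation vs. a synthetic fine-grained
# divergence (here, a hypothetical corruption changing one detail).
en = "The meeting was postponed until Friday."
fr_equivalent = "La réunion a été reportée à vendredi."
fr_divergent = "La réunion a été reportée à lundi."

s_pos, s_neg = score(en, fr_equivalent), score(en, fr_divergent)
# target = 1 asks the loss to rank the equivalent pair above the divergent one
loss = loss_fn(s_pos, s_neg, torch.ones_like(s_pos))
loss.backward()
optimizer.step()
```

Generating such corruptions at several granularities (from whole-sentence mismatches down to single-token substitutions, as the abstract's "varying granularity" suggests) would give the ranker supervision without manual divergence annotation.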
