TransQuest committed on
Commit
799e174
·
1 Parent(s): 49fef6e

Create Readme

Files changed (1)
  1. README.md +98 -0
README.md ADDED
@@ -0,0 +1,98 @@
---
language: en-zh
tags:
- Quality Estimation
- monotransquest
- hter
license: apache-2.0
---

# TransQuest: Translation Quality Estimation with Cross-lingual Transformers
The goal of quality estimation (QE) is to evaluate the quality of a translation without access to a reference translation. High-accuracy QE that can be easily deployed for many language pairs is the missing piece in many commercial translation workflows, where it has numerous potential uses. A QE system can select the best translation when several translation engines are available, or inform the end user about the reliability of automatically translated content. It can also be used to decide whether a translation can be published as is in a given context, or whether it requires human post-editing or translation from scratch by a human. Quality estimation can be done at different levels: document level, sentence level and word level.

With TransQuest, we have open-sourced our research in translation quality estimation, which also won the sentence-level direct assessment quality estimation shared task at [WMT 2020](http://www.statmt.org/wmt20/quality-estimation-task.html). TransQuest outperforms current open-source quality estimation frameworks such as [OpenKiwi](https://github.com/Unbabel/OpenKiwi) and [DeepQuest](https://github.com/sheffieldnlp/deepQuest).

## Features
- Sentence-level translation quality estimation on both aspects: predicting post-editing effort and direct assessment.
- Word-level translation quality estimation capable of predicting the quality of source words, target words and target gaps.
- Outperforms current state-of-the-art quality estimation methods such as DeepQuest and OpenKiwi in all the languages tested.
- Pre-trained quality estimation models for fifteen language pairs are available on [HuggingFace](https://huggingface.co/TransQuest).

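The `hter` tag on this model corresponds to the post-editing-effort aspect in the first feature above. As a rough illustration of what such a score measures, the sketch below computes a simplified HTER-style value: the word-level edit distance between a machine translation and its human post-edit, divided by the length of the post-edit. This is an approximation under stated assumptions, not the scoring code used by TransQuest or WMT: real TER also counts block shifts, and the function names and the post-edited sentence are made up for illustration.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over word tokens (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # delete hyp_word
                            curr[j - 1] + 1,      # insert ref_word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approx_hter(mt_output, post_edit):
    """Edits needed to turn the MT output into its post-edit, per post-edit word.

    Simplification: whitespace tokenization and no shift operations.
    """
    hyp, ref = mt_output.split(), post_edit.split()
    if not ref:
        raise ValueError("post_edit must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


# Hypothetical MT output and human post-edit (illustrative only).
mt = "Reducing these conflicts is not important for preservation."
pe = "Reducing these conflicts is important for conservation."
print(round(approx_hter(mt, pe), 3))  # 2 edits / 7 words -> 0.286
```

Lower values mean less post-editing was needed; a sentence-level QE model of this kind tries to predict such a score from the source and the translation alone, without seeing the post-edit.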
## Installation
### From pip

```bash
pip install transquest
```

### From Source

```bash
git clone https://github.com/TharinduDR/TransQuest.git
cd TransQuest
pip install -r requirements.txt
```

## Using Pre-trained Models

```python
import torch
from transquest.algo.sentence_level.monotransquest.run_model import MonoTransQuestModel

# Load the English-Chinese HTER model; use the GPU only if one is available.
model = MonoTransQuestModel("xlmroberta", "TransQuest/monotransquest-hter-en_zh-wiki", num_labels=1, use_cuda=torch.cuda.is_available())

# Each input is a [source, translation] pair. This illustrative pair has an English source
# and a Chinese translation that wrongly negates it ("not important").
predictions, raw_outputs = model.predict([["Reducing these conflicts is important for conservation.", "减少这些冲突对保护并不重要。"]])
print(predictions)
```

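When scoring more than one sentence, the input pairs have to be assembled first and the scores interpreted afterwards. The sketch below shows one way to do both. It assumes that `predict` accepts several pairs and returns one score per pair, which the single-pair call above suggests but does not confirm. `build_pairs`, `needs_post_editing` and the `0.3` threshold are hypothetical helpers and values, not part of TransQuest, and the scores shown are placeholders rather than model output.

```python
def build_pairs(sources, translations):
    """Zip parallel lists into the [[source, translation], ...] input shape."""
    if len(sources) != len(translations):
        raise ValueError("sources and translations must have the same length")
    return [[src, tgt] for src, tgt in zip(sources, translations)]


def needs_post_editing(scores, threshold=0.3):
    """Flag each predicted HTER score above a (hypothetical) threshold."""
    return [float(score) > threshold for score in scores]


# Illustrative English sources and Chinese translations.
sources = ["Good morning.", "Thank you very much."]
translations = ["早上好。", "非常感谢。"]
pairs = build_pairs(sources, translations)
print(pairs)

# With the model loaded as in the snippet above, scoring might look like:
#   predictions, raw_outputs = model.predict(pairs)
#   flags = needs_post_editing(predictions)
# Placeholder scores (NOT model output) stand in for predictions here.
placeholder_scores = [0.05, 0.42]
print(needs_post_editing(placeholder_scores))  # [False, True]
```

A suitable threshold depends on the language pair and on how much post-editing a workflow tolerates, so it should be tuned on held-out data rather than copied from this sketch.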
## Documentation
For more details, follow the documentation linked in the table of contents.

## Table of Contents
1. **[Installation](https://tharindudr.github.io/TransQuest/install/)** - Install TransQuest locally using pip.
2. **Architectures** - Check out the architectures implemented in TransQuest.
    1. [Sentence-level Architectures](https://tharindudr.github.io/TransQuest/architectures/sentence_level_architectures/) - We have released two architectures, MonoTransQuest and SiameseTransQuest, to perform sentence-level quality estimation.
    2. [Word-level Architecture](https://tharindudr.github.io/TransQuest/architectures/word_level_architecture/) - We have released MicroTransQuest to perform word-level quality estimation.
3. **Examples** - We have provided several examples of how to use TransQuest in recent WMT quality estimation shared tasks.
    1. [Sentence-level Examples](https://tharindudr.github.io/TransQuest/examples/sentence_level_examples/)
    2. [Word-level Examples](https://tharindudr.github.io/TransQuest/examples/word_level_examples/)
4. **Pre-trained Models** - We have provided pre-trained quality estimation models for fifteen language pairs, covering both sentence level and word level.
    1. [Sentence-level Models](https://tharindudr.github.io/TransQuest/models/sentence_level_pretrained/)
    2. [Word-level Models](https://tharindudr.github.io/TransQuest/models/word_level_pretrained/)
5. **[Contact](https://tharindudr.github.io/TransQuest/contact/)** - Contact us about any issues with TransQuest.

## Citations
If you are using the word-level architecture, please consider citing this paper, which was accepted at [ACL 2021](https://2021.aclweb.org/).

```bibtex
@InProceedings{ranasinghe2021,
  author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
  title = {An Exploratory Analysis of Multilingual Word Level Quality Estimation with Cross-Lingual Transformers},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics},
  year = {2021}
}
```

If you are using the sentence-level architectures, please consider citing these papers, which were presented at [COLING 2020](https://coling2020.org/) and at [WMT 2020](http://www.statmt.org/wmt20/), held with EMNLP 2020.

```bibtex
@InProceedings{transquest:2020a,
  author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
  title = {TransQuest: Translation Quality Estimation with Cross-lingual Transformers},
  booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
  year = {2020}
}
```

```bibtex
@InProceedings{transquest:2020b,
  author = {Ranasinghe, Tharindu and Orasan, Constantin and Mitkov, Ruslan},
  title = {TransQuest at WMT2020: Sentence-Level Direct Assessment},
  booktitle = {Proceedings of the Fifth Conference on Machine Translation},
  year = {2020}
}
```