Files changed (1)
  1. README.md +73 -10
README.md CHANGED
@@ -6,18 +6,60 @@ license: apache-2.0

  ### opus-mt-ru-en

- * source languages: ru
- * target languages: en
- * OPUS readme: [ru-en](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/ru-en/README.md)
-
- * dataset: opus
- * model: transformer-align
- * pre-processing: normalization + SentencePiece
- * download original weights: [opus-2020-02-26.zip](https://object.pouta.csc.fi/OPUS-MT-models/ru-en/opus-2020-02-26.zip)
- * test set translations: [opus-2020-02-26.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/ru-en/opus-2020-02-26.test.txt)
  * test set scores: [opus-2020-02-26.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/ru-en/opus-2020-02-26.eval.txt)

- ## Benchmarks

  | testset | BLEU | chr-F |
  |-----------------------|-------|-------|
@@ -31,3 +73,24 @@ license: apache-2.0
  | newstest2019-ruen.ru.en | 31.4 | 0.576 |
  | Tatoeba.ru.en | 61.1 | 0.736 |


  ### opus-mt-ru-en

+ ## Table of Contents
+ - [Model Details](#model-details)
+ - [Uses](#uses)
+ - [Risks, Limitations and Biases](#risks-limitations-and-biases)
+ - [Training](#training)
+ - [Evaluation](#evaluation)
+ - [Citation Information](#citation-information)
+ - [How to Get Started With the Model](#how-to-get-started-with-the-model)
+
+ ## Model Details
+ **Model Description:**
+ - **Developed by:** Language Technology Research Group at the University of Helsinki
+ - **Model Type:** Transformer-align
+ - **Language(s):**
+   - Source Language: Russian
+   - Target Language: English
+ - **License:** Apache-2.0
+ - **Resources for more information:**
+   - [GitHub Repo](https://github.com/Helsinki-NLP/OPUS-MT-train)
+
+
+
+ ## Uses
+
+ #### Direct Use
+
+ This model can be used for translation and text-to-text generation.
+
+
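As a minimal illustration (not part of the original card), the model can be called through the Transformers translation pipeline; the sample sentence is arbitrary:

```python
from transformers import pipeline

# Load the Russian-to-English model through the high-level translation pipeline.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-ru-en")

# Translate an arbitrary Russian sentence and print the English output.
print(translator("Москва - столица России.")[0]["translation_text"])
```
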
+ ## Risks, Limitations and Biases
+
+ **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**
+
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
+
+ Further details about the dataset for this model can be found in the OPUS readme: [ru-en](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/ru-en/README.md)
+
+ ## Training
+ #### Training Data
+ ##### Preprocessing
+ * Pre-processing: Normalization + SentencePiece (see the tokenization sketch below)
+ * Dataset: [opus](https://github.com/Helsinki-NLP/Opus-MT)
+ * Download original weights: [opus-2020-02-26.zip](https://object.pouta.csc.fi/OPUS-MT-models/ru-en/opus-2020-02-26.zip)
+
+ * Test set translations: [opus-2020-02-26.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/ru-en/opus-2020-02-26.test.txt)
+
+
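As a rough illustration of the SentencePiece segmentation (a sketch using the released tokenizer, not the original training-time pipeline; the sample sentence is arbitrary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-ru-en")

# Show how a source sentence is split into SentencePiece subword units.
print(tokenizer.tokenize("Это тестовое предложение."))
```
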
+ ## Evaluation
+
+ #### Results
+
  * test set scores: [opus-2020-02-26.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/ru-en/opus-2020-02-26.eval.txt)

+ #### Benchmarks

  | testset | BLEU | chr-F |
  |-----------------------|-------|-------|
  | newstest2019-ruen.ru.en | 31.4 | 0.576 |
  | Tatoeba.ru.en | 61.1 | 0.736 |
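
The published BLEU and chr-F numbers come from the linked eval file; as a sketch only, comparable scores can be computed with the sacrebleu package, assuming system outputs and references are available as plain-text files (the file names below are placeholders):

```python
import sacrebleu

# Placeholder file names: one hypothesis / reference sentence per line.
with open("hyps.txt", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("refs.txt", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

print(sacrebleu.corpus_bleu(hyps, [refs]).score)   # BLEU
print(sacrebleu.corpus_chrf(hyps, [refs]).score)   # chr-F
```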

+ ## Citation Information
+
+ ```bibtex
+ @InProceedings{TiedemannThottingal:EAMT2020,
+   author = {J{\"o}rg Tiedemann and Santhosh Thottingal},
+   title = {{OPUS-MT} — {B}uilding open translation services for the {W}orld},
+   booktitle = {Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)},
+   year = {2020},
+   address = {Lisbon, Portugal}
+ }
+ ```
+
+ ## How to Get Started With the Model
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-ru-en")
+
+ model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-ru-en")
+ ```
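
The snippet above only loads the tokenizer and model; as a short usage sketch (the input sentence is arbitrary, not from the original card), a translation call then looks like this:

```python
# Continue from the snippet above: tokenize, generate, and decode.
inputs = tokenizer("Привет, мир!", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```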