Committed by Ezi
Commit ffbc53c • 1 Parent(s): b7d1e2a

Model Card

Hi! 👋
This PR adds some additional information to the model card, based on the format we are using as part of our effort to standardise model cards at Hugging Face. Feel free to merge if you are ok with the changes! (cc @Marissa @Meg)

Files changed (1)
  1. README.md +64 -54
README.md CHANGED
@@ -11,87 +11,97 @@ license: apache-2.0
  ### zho-eng

- * source group: Chinese
- * target group: English
- * OPUS readme: [zho-eng](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/zho-eng/README.md)
-
- * model: transformer
- * source language(s): cjy_Hans cjy_Hant cmn cmn_Hans cmn_Hant gan lzh lzh_Hans nan wuu yue yue_Hans yue_Hant
- * target language(s): eng
- * model: transformer
- * pre-processing: normalization + SentencePiece (spm32k,spm32k)
- * download original weights: [opus-2020-07-17.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/zho-eng/opus-2020-07-17.zip)
- * test set translations: [opus-2020-07-17.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/zho-eng/opus-2020-07-17.test.txt)
- * test set scores: [opus-2020-07-17.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/zho-eng/opus-2020-07-17.eval.txt)
-
- ## Benchmarks
-
- | testset | BLEU | chr-F |
- |-----------------------|-------|-------|
- | Tatoeba-test.zho.eng | 36.1 | 0.548 |
-
- ### System Info:
- - hf_name: zho-eng
- - source_languages: zho
- - target_languages: eng
- - opus_readme_url: https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/zho-eng/README.md
- - original_repo: Tatoeba-Challenge
- - tags: ['translation']
- - languages: ['zh', 'en']
- - src_constituents: {'cmn_Hans', 'nan', 'nan_Hani', 'gan', 'yue', 'cmn_Kana', 'yue_Hani', 'wuu_Bopo', 'cmn_Latn', 'yue_Hira', 'cmn_Hani', 'cjy_Hans', 'cmn', 'lzh_Hang', 'lzh_Hira', 'cmn_Hant', 'lzh_Bopo', 'zho', 'zho_Hans', 'zho_Hant', 'lzh_Hani', 'yue_Hang', 'wuu', 'yue_Kana', 'wuu_Latn', 'yue_Bopo', 'cjy_Hant', 'yue_Hans', 'lzh', 'cmn_Hira', 'lzh_Yiii', 'lzh_Hans', 'cmn_Bopo', 'cmn_Hang', 'hak_Hani', 'cmn_Yiii', 'yue_Hant', 'lzh_Kana', 'wuu_Hani'}
- - tgt_constituents: {'eng'}
- - src_multilingual: False
- - tgt_multilingual: False
- - prepro: normalization + SentencePiece (spm32k,spm32k)
- - url_model: https://object.pouta.csc.fi/Tatoeba-MT-models/zho-eng/opus-2020-07-17.zip
-
- - url_test_set: https://object.pouta.csc.fi/Tatoeba-MT-models/zho-eng/opus-2020-07-17.test.txt
-
- - src_alpha3: zho
- - tgt_alpha3: eng
- - short_pair: zh-en
- - chrF2_score: 0.5479999999999999
- - bleu: 36.1
- - brevity_penalty: 0.948
- - ref_len: 82826.0
- - src_name: Chinese
- - tgt_name: English
- - train_date: 2020-07-17
- - src_alpha2: zh
- - tgt_alpha2: en
- - prefer_old: False
- - long_pair: zho-eng
- - helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535
- - transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b
- - port_machine: brutasse
- - port_time: 2020-08-21-14:41
+ ## Table of Contents
+ - [Model Details](#model-details)
+ - [Uses](#uses)
+ - [Risks, Limitations and Biases](#risks-limitations-and-biases)
+ - [Training](#training)
+ - [Evaluation](#evaluation)
+ - [Citation Information](#citation-information)
+ - [How to Get Started With the Model](#how-to-get-started-with-the-model)
 
+ ## Model Details
+ - **Model Description:**
+ - **Developed by:** Language Technology Research Group at the University of Helsinki
+ - **Model Type:** Translation
+ - **Language(s):**
+ - Source Language: Chinese
+ - Target Language: English
+ - **License:** Apache-2.0
+ - **Resources for more information:**
+ - [GitHub Repo](https://github.com/Helsinki-NLP/OPUS-MT-train)
 
+ ## Uses

+ #### Direct Use

+ This model can be used for translation and text-to-text generation.
 
+ ## Risks, Limitations and Biases

+ **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**

+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).

+ Further details about the dataset for this model can be found in the OPUS readme: [zho-eng](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/zho-eng/README.md)
 
+ ## Training

+ #### System Information
+ * helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535
+ * transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b
+ * port_machine: brutasse
+ * port_time: 2020-08-21-14:41
+ * src_multilingual: False
+ * tgt_multilingual: False

+ #### Training Data
+ ##### Preprocessing
+ * pre-processing: normalization + SentencePiece (spm32k,spm32k)
+ * ref_len: 82826.0
+ * dataset: [opus](https://github.com/Helsinki-NLP/Opus-MT)
+ * download original weights: [opus-2020-07-17.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/zho-eng/opus-2020-07-17.zip)

+ * test set translations: [opus-2020-07-17.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/zho-eng/opus-2020-07-17.test.txt)
 
+ ## Evaluation

+ #### Results

+ * test set scores: [opus-2020-07-17.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/zho-eng/opus-2020-07-17.eval.txt)

+ * brevity_penalty: 0.948
 
+ ## Benchmarks

+ | testset | BLEU | chr-F |
+ |-----------------------|-------|-------|
+ | Tatoeba-test.zho.eng | 36.1 | 0.548 |
 
+ ## Citation Information

+ ```bibtex
+ @InProceedings{TiedemannThottingal:EAMT2020,
+ author = {J{\"o}rg Tiedemann and Santhosh Thottingal},
+ title = {{OPUS-MT} --- {B}uilding open translation services for the {W}orld},
+ booktitle = {Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (EAMT)},
+ year = {2020},
+ address = {Lisbon, Portugal}
+ }
+ ```
 
+ ## How to Get Started With the Model

+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

+ tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
+ model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
+ ```
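
The snippet in the model card only loads the tokenizer and the model. A minimal sketch of how they might then be used to translate a few sentences is given here. The `chunks` and `translate` helpers, the batch size, and the idea of batching at all are illustrative assumptions, not part of the model card, and running `translate` assumes `transformers`, `torch`, and `sentencepiece` are installed.

```python
from typing import Iterator, List

# Checkpoint name taken from the model card snippet.
MODEL_NAME = "Helsinki-NLP/opus-mt-zh-en"


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` items (hypothetical helper)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(texts: List[str], batch_size: int = 8) -> List[str]:
    """Translate Chinese sentences to English (hypothetical helper).

    The model is reloaded on every call to keep the sketch short;
    real code would load it once and reuse it.
    """
    # Imported lazily so `chunks` stays usable without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    outputs: List[str] = []
    for batch in chunks(texts, batch_size):
        # Pad to the longest sentence in the batch and return PyTorch tensors.
        encoded = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**encoded)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Calling `translate(["我喜欢学习语言。"])` would return a list with one English string. The first call downloads the checkpoint, so it needs network access.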