alirezamsh committed on
Commit
e98aa74
1 Parent(s): 1a62b9a

add instruction

Files changed (1)
  1. README.md +33 -0
README.md CHANGED
@@ -112,3 +112,36 @@ datasets:
  - gmnlp/tico19
  - tatoeba
  ---
+
+ # SMaLL-100 Model
+
+ SMaLL-100 is a compact, fast, massively multilingual machine translation model that covers more than 10K language pairs and achieves results competitive with M2M-100 while being much smaller and faster. It was introduced in [this paper](https://arxiv.org/abs/2210.11621) and initially released in [this repository](https://github.com/alirezamshi/small100).
+
+ The model architecture and config are the same as in the [M2M-100](https://huggingface.co/facebook/m2m100_418M/tree/main) implementation, but the tokenizer is modified to adjust language codes. For the moment, you should therefore load the tokenizer locally from the `tokenization_small100.py` file.
+
+ ```python
+ from transformers import M2M100ForConditionalGeneration
+ # SMALL100Tokenizer comes from the local tokenization_small100.py file
+ from tokenization_small100 import SMALL100Tokenizer
+
+ hi_text = "जीवन एक चॉकलेट बॉक्स की तरह है।"
+ chinese_text = "生活就像一盒巧克力。"
+
+ # load the model weights and the modified tokenizer
+ model = M2M100ForConditionalGeneration.from_pretrained("alirezamsh/small100")
+ tokenizer = SMALL100Tokenizer.from_pretrained("alirezamsh/small100")
+
+ # translate Hindi to French
+ tokenizer.tgt_lang = "fr"  # set the target language before encoding
+ encoded_hi = tokenizer(hi_text, return_tensors="pt")
+ generated_tokens = model.generate(**encoded_hi)
+ tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
+ # => "La vie est comme une boîte de chocolat."
+
+ # translate Chinese to English
+ tokenizer.tgt_lang = "en"  # switch the target language to English
+ encoded_zh = tokenizer(chinese_text, return_tensors="pt")
+ generated_tokens = model.generate(**encoded_zh)
+ tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
+ # => "Life is like a box of chocolate."
+ ```
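
The two translations above repeat the same four steps: set the target language, encode, generate, decode. A minimal sketch of how they could be wrapped in one function follows. The name `translate` is hypothetical and is not part of the SMaLL-100 repository or of `transformers`; the function assumes only the calls already shown in the snippet above, and that `model` and `tokenizer` have been loaded as shown there.

```python
def translate(model, tokenizer, text, tgt_lang):
    """Hypothetical helper: translate `text` into `tgt_lang`.

    Assumes `tokenizer` exposes a `tgt_lang` attribute, is callable on a
    string, and provides `batch_decode`, and that `model` provides
    `generate`, as in the README snippet.
    """
    # The target language is set on the tokenizer before encoding.
    tokenizer.tgt_lang = tgt_lang
    encoded = tokenizer(text, return_tensors="pt")
    generated_tokens = model.generate(**encoded)
    # batch_decode returns one string per sequence; a single input gives one.
    decoded = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
    return decoded[0]
```

With the objects from the snippet above, a call might look like `translate(model, tokenizer, hi_text, "fr")`. Passing the model and tokenizer in as arguments keeps the helper free of any loading logic, so both can be loaded once and reused across language pairs.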
+
+ Please refer to the [original repository](https://github.com/alirezamshi/small100) for further details.