cstorm125 committed on
Commit 9c1539b
1 Parent(s): d9491a8

Create README.md

Files changed (1)
  1. README.md +47 -0
README.md ADDED
@@ -0,0 +1,47 @@
+ ---
+ tags:
+ - translation
+ - torch==1.8.0
+ ---
+ ### marianmt-zh_cn-th
+ * source languages: zh_cn
+ * target languages: th
+ * dataset:
+ * model: transformer-align
+ * pre-processing: normalization + SentencePiece
+ * test set translations:
+ * test set scores:
+
+ ## Training
+
+ ```
+ export WANDB_PROJECT=marianmt-zh_cn-th
+ python train_model.py --input_fname ../data/v1/Train.csv \
+     --output_dir ../models/marianmt-zh_cn-th \
+     --source_lang zh --target_lang th \
+     --metric_tokenize th_syllable --fp16 --batch_size 64
+ ```
+
+ ## Usage
+
+ ```
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ # Load the tokenizer and model from the Hub; .cpu() keeps inference on the CPU.
+ tokenizer = AutoTokenizer.from_pretrained("cstorm125/marianmt-zh_cn-th")
+ model = AutoModelForSeq2SeqLM.from_pretrained("cstorm125/marianmt-zh_cn-th").cpu()
+
+ src_text = [
+     '我爱你',
+     '我想吃米饭',
+ ]
+ # Pad the batch to a common length, generate, then decode each output sequence.
+ translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
+ print([tokenizer.decode(t, skip_special_tokens=True) for t in translated])
+
+ # Output:
+ # ['ผมรักคุณนะ', 'ฉันอยากกินข้าว']
+ ```
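Review note: the Usage snippet sends every sentence through `generate` in a single padded batch, which can get heavy on CPU for long input lists. A minimal sketch of batching the same calls is below. `chunked` and `translate` are hypothetical helper names, not part of this commit or of `transformers`, and the default `batch_size` of 8 is an arbitrary assumption.

```python
from typing import Iterable, List


def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(texts: List[str], tokenizer, model, batch_size: int = 8) -> List[str]:
    """Translate `texts` batch by batch, returning decoded strings in input order.

    `tokenizer` and `model` are expected to be the objects loaded in the
    Usage snippet; this helper is an illustrative assumption, not the
    author's code.
    """
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        # Pad within each batch so the input tensors are rectangular.
        inputs = tokenizer(batch, return_tensors="pt", padding=True)
        outputs = model.generate(**inputs)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

With `tokenizer`, `model`, and `src_text` defined as in the Usage snippet, `translate(src_text, tokenizer, model, batch_size=2)` should go through the same tokenize, generate, and decode steps, only split into smaller batches.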
+
+ ## Requirements
+ ```
+ transformers==4.6.0
+ torch==1.8.0
+ ```
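Review note: since the README pins exact versions, a quick check of the installed packages may help before running the Usage snippet. The sketch below assumes Python 3.8+ (for `importlib.metadata`); `PINNED` and `find_mismatches` are hypothetical names introduced here, and treating local build tags such as `+cu111` as matching is an assumption rather than something the README states.

```python
from importlib import metadata

# Pins copied from the Requirements block in the README.
PINNED = {"transformers": "4.6.0", "torch": "1.8.0"}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not met exactly."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        # Assumption: a local build tag like "1.8.0+cu111" is the same release.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


for name, (expected, found) in find_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

The loop prints one line per package that differs from the pin and prints nothing when the environment matches.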