dorinalakatos committed
Commit
c76cf4b
1 Parent(s): 59272f0

Update README.md

Files changed (1)
  1. README.md +53 -0
README.md CHANGED
@@ -1,3 +1,56 @@
  ---
  license: cc-by-nc-sa-4.0
+ language:
+ - en
+ - hu
+ tags:
+ - translation
+ - opennmt
+
+ inference: false
  ---
+
+ ### Introduction
+
+ English - Hungarian translation model that was trained on the [Hunglish2](http://mokk.bme.hu/resources/hunglishcorpus/) dataset using OpenNMT.
+
+ ### Usage
+
+ Install the necessary dependencies:
+
+ ```bash
+ pip3 install ctranslate2 pyonmttok
+ ```
+
+ Simple tokenization & translation using Python:
+
+
+ ```python
+ import ctranslate2
+ import pyonmttok
+ from huggingface_hub import snapshot_download
+ model_dir = snapshot_download(repo_id="SZTAKI-HLT/opennmt-en-hu", revision="main")
+
+ tokenizer=pyonmttok.Tokenizer(mode="none", sp_model_path = model_dir + "/sp_m.model")
+ tokenized=tokenizer.tokenize("Hello világ")
+
+ translator = ctranslate2.Translator(model_dir)
+ translated = translator.translate_batch([tokenized[0]])
+ print(tokenizer.detokenize(translated[0].hypotheses[0]))
+ ```
+
+
+ ## Citation
+
+ If you use our model, please cite the following paper:
+ ```
+
+ @inproceedings{nagy2022syntax,
+ title={Syntax-based data augmentation for Hungarian-English machine translation},
+ author={Nagy, Attila and Nanys, Patrick and Konr{\'a}d, Bal{\'a}zs Frey and Bial, Bence and {\'A}cs, Judit},
+ booktitle = {XVIII. Magyar Számítógépes Nyelvészeti Konferencia (MSZNY 2022)},
+ year={2022},
+ publisher = {Szegedi Tudományegyetem, Informatikai Intézet},
+ }
+
+ ```
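
The usage snippet added in this commit translates a single sentence. As a minimal sketch of how the same tokenize, translate, detokenize flow might be applied to several sentences in one batch, the helper below takes the three steps as callables. The name `translate_sentences` and the `fake_*` stand-ins are hypothetical and not part of the repository; the stand-ins only mimic the return shapes used in the README (`tokenize` returning a `(tokens, features)` pair, results exposing `.hypotheses`), so the sketch runs without downloading the model.

```python
from types import SimpleNamespace


def translate_sentences(sentences, tokenize, translate_batch, detokenize):
    """Tokenize every sentence, translate them in one batch, and
    detokenize the first hypothesis of each result."""
    # As in the README, tokenize() is assumed to return (tokens, features);
    # only the tokens are passed on to the translator.
    batch = [tokenize(sentence)[0] for sentence in sentences]
    results = translate_batch(batch)
    return [detokenize(result.hypotheses[0]) for result in results]


# Hypothetical stand-ins that imitate the shapes of the real calls.
def fake_tokenize(text):
    return text.split(), None


def fake_translate_batch(batch):
    # "Translates" by upper-casing each token; one hypothesis per input.
    return [SimpleNamespace(hypotheses=[[tok.upper() for tok in tokens]])
            for tokens in batch]


def fake_detokenize(tokens):
    return " ".join(tokens)


demo = translate_sentences(
    ["hello world", "good morning"],
    fake_tokenize, fake_translate_batch, fake_detokenize,
)
print(demo)
```

With the objects from the README, the call would presumably be `translate_sentences(texts, tokenizer.tokenize, translator.translate_batch, tokenizer.detokenize)`.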