File size: 1,266 Bytes
75d3ae1
 
5309065
 
 
 
 
 
 
 
75d3ae1
5309065
 
 
 
f1f3e8f
5309065
 
 
 
 
 
50af04f
 
 
 
5309065
50af04f
5309065
 
dd61774
 
bf625f6
 
2d6da1c
 
 
 
 
 
 
 
 
 
 
 
 
 
bf625f6
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
---
license: cc-by-sa-4.0

language: 
- en
- ja

tags:
- translation

---

# FuguMT

This is a translation model using Marian-NMT.
For more details, please see [my repository](https://github.com/s-taka/fugumt).

* source language: en
* target language: ja 

### How to use

This model uses transformers and sentencepiece.
```python
!pip install transformers sentencepiece
```

You can use this model directly with a pipeline:
```python
from transformers import pipeline
fugu_translator = pipeline('translation', model='staka/fugumt-en-ja')
fugu_translator('This is a cat.')
```

If you want to translate multiple sentences, we recommend using [pySBD](https://github.com/nipunsadvilkar/pySBD).
```python
!pip install transformers sentencepiece pysbd

import pysbd
seg_en = pysbd.Segmenter(language="en", clean=False)

from transformers import pipeline
fugu_translator = pipeline('translation', model='staka/fugumt-en-ja')
txt = 'This is a cat. It is very cute.'
print(fugu_translator(seg_en.segment(txt)))
```


### Eval results

The results of the evaluation using [tatoeba](https://tatoeba.org/ja)(randomly selected 500 sentences) are as follows:

|source |target |BLEU(*1)| 
|-------|-------|--------|
|en     |ja     |32.7    |

(*1) sacrebleu --tokenize ja-mecab