File size: 1,266 Bytes
75d3ae1 5309065 75d3ae1 5309065 f1f3e8f 5309065 50af04f 5309065 50af04f 5309065 dd61774 bf625f6 2d6da1c bf625f6 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 |
---
license: cc-by-sa-4.0
language:
- en
- ja
tags:
- translation
---
# FuguMT
This is a translation model using Marian-NMT.
For more details, please see [my repository](https://github.com/s-taka/fugumt).
* source language: en
* target language: ja
### How to use
This model uses transformers and sentencepiece.
```python
!pip install transformers sentencepiece
```
You can use this model directly with a pipeline:
```python
from transformers import pipeline
fugu_translator = pipeline('translation', model='staka/fugumt-en-ja')
fugu_translator('This is a cat.')
```
If you want to translate multiple sentences, we recommend using [pySBD](https://github.com/nipunsadvilkar/pySBD).
```python
!pip install transformers sentencepiece pysbd
import pysbd
seg_en = pysbd.Segmenter(language="en", clean=False)
from transformers import pipeline
fugu_translator = pipeline('translation', model='staka/fugumt-en-ja')
txt = 'This is a cat. It is very cute.'
print(fugu_translator(seg_en.segment(txt)))
```
### Eval results
The results of the evaluation using [tatoeba](https://tatoeba.org/ja)(randomly selected 500 sentences) are as follows:
|source |target |BLEU(*1)|
|-------|-------|--------|
|en |ja |32.7 |
(*1) sacrebleu --tokenize ja-mecab |