This is a translation model using Marian-NMT. For more details, please see my repository.

In addition to the data listed in the repository, I also used ParaCrawl.

  • source languages: de, en, es, fr, it, ru, uk
  • target language: ja

How to use

This model uses transformers and sentencepiece.

!pip install transformers sentencepiece

You can use this model directly with a pipeline:

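from transformers import pipeline
# Load the model from the Hugging Face Hub (downloaded on first use)
tako_translator = pipeline('translation', model='staka/takomt')
# Returns a list of dicts, each with a 'translation_text' key
tako_translator('This is a cat.')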
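
If you want plain strings rather than the pipeline's list of dicts, a small wrapper can help. The following is a minimal sketch, not part of the model or the repository: `translate_all` and `demo` are hypothetical names chosen for illustration, and the sketch assumes only the standard behavior of the transformers translation pipeline (a list of strings in, a list of dicts with a 'translation_text' key out).

```python
def translate_all(translator, sentences):
    """Translate a batch of sentences and return plain strings.

    `translator` is any callable that behaves like a transformers
    translation pipeline. `translate_all` is a hypothetical helper
    name, not something provided by the model or by transformers.
    """
    sentences = list(sentences)
    if not sentences:
        # Avoid calling the pipeline with an empty batch.
        return []
    results = translator(sentences)
    return [r['translation_text'] for r in results]


def demo():
    """Hypothetical usage; downloads the model on first call."""
    from transformers import pipeline
    tako_translator = pipeline('translation', model='staka/takomt')
    # Source sentences may be in any of the supported source languages.
    for line in translate_all(tako_translator,
                              ['This is a cat.', 'Das ist eine Katze.']):
        print(line)
```

Calling `demo()` prints one Japanese translation per input sentence. Passing the sentences as a single list lets the pipeline handle them in one call instead of one call per sentence.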

Eval results

Evaluation results on Tatoeba (500 randomly selected sentences) are as follows:

source  target  BLEU (*1)
de      ja      27.8
en      ja      28.4
es      ja      32.0
fr      ja      27.9
it      ja      24.3
ru      ja      27.3
uk      ja      29.8

(*1) Scored with sacrebleu --tokenize ja-mecab
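
For readers who want to run a similar evaluation, here is a rough sketch of the two steps described above: sampling 500 sentence pairs and scoring with sacrebleu using the ja-mecab tokenizer. It is not the script used to produce the numbers in the table. The helper names `sample_pairs` and `score_bleu`, the fixed seed, and the in-memory list of (source, reference) pairs are all assumptions for illustration. The ja-mecab tokenizer needs extra dependencies, which can be installed with pip install "sacrebleu[ja]".

```python
import random


def sample_pairs(pairs, k=500, seed=0):
    """Pick up to k (source, reference) pairs at random.

    The seed is an assumption added for reproducibility; the model
    card does not say how the 500 sentences were drawn.
    """
    pairs = list(pairs)
    rng = random.Random(seed)
    return rng.sample(pairs, min(k, len(pairs)))


def score_bleu(hypotheses, references):
    """Corpus-level BLEU with MeCab tokenization for Japanese.

    Mirrors `sacrebleu --tokenize ja-mecab` through the Python API.
    """
    import sacrebleu  # imported lazily; requires sacrebleu[ja]
    # sacrebleu expects a list of reference streams, hence [references].
    bleu = sacrebleu.corpus_bleu(hypotheses, [references],
                                 tokenize='ja-mecab')
    return bleu.score
```

A typical flow would be to call `sample_pairs` on the Tatoeba pairs for one source language, translate the source side with the pipeline, and pass the translations and the Japanese references to `score_bleu`. Scores will vary with the sample drawn, so they should not be expected to match the table exactly.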

