---
language:
- ca
- en
tags:
- translation
library_name: opennmt
license: mit
metrics:
- bleu
---

### Introduction

English-Catalan translation models based on OpenNMT.

### Usage

Install the dependencies:

```pip3 install ctranslate2 pyonmttok```

Simple translation using Python:

```python
import ctranslate2

# Load the converted CTranslate2 model from the "ctranslate2/" directory.
translator = ctranslate2.Translator("ctranslate2/")

# The input must already be split into SentencePiece pieces.
translator.translate_batch([["▁Hello", "▁world", "!"]])
# [[{'tokens': ['▁Hola', '▁món', '!']}]]
```

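The translator works on SentencePiece pieces, in which the `▁` character (U+2581) marks the start of a word. As a rough illustration of that convention only, the sketch below rebuilds plain text from such pieces by hand. `pieces_to_text` is a hypothetical helper and not part of CTranslate2 or pyonmttok; for real use, the tokenizer's own `detokenize` method is the safer choice.

```python
WORD_BOUNDARY = "\u2581"  # SentencePiece word-start marker, rendered as "▁"


def pieces_to_text(pieces):
    """Join SentencePiece pieces and turn word-start markers into spaces.

    Hypothetical helper for illustration; it ignores any tokenizer options.
    """
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


print(pieces_to_text(["▁Hola", "▁món", "!"]))
```

This shows why punctuation such as `!` carries no marker: it attaches directly to the preceding word when the pieces are joined.
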
Simple tokenization & translation using Python:

```python
import ctranslate2
import pyonmttok

# Load the SentencePiece model from the "tokenizer/" directory.
tokenizer = pyonmttok.Tokenizer(mode="none", sp_model_path="tokenizer/sp_m.model")
# tokenize() returns a (tokens, features) tuple; index 0 holds the tokens.
tokenized = tokenizer.tokenize("Hello world!")

translator = ctranslate2.Translator("ctranslate2/")
translated = translator.translate_batch([tokenized[0]])
print(tokenizer.detokenize(translated[0][0]['tokens']))
# Hola món!
```

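To translate several sentences at once, the same three steps (tokenize, translate, detokenize) can be wrapped in one function. The following is a minimal sketch, not part of this repository: `translate_sentences` is a hypothetical name, and it assumes the tokenizer and translator behave as in the example above, including the `result[0]['tokens']` indexing used there.

```python
def translate_sentences(sentences, tokenizer, translator):
    """Tokenize, translate, and detokenize a list of sentences in one batch.

    Hypothetical helper: `tokenizer` is expected to offer tokenize() and
    detokenize(), and `translator` to offer translate_batch().
    """
    # tokenize() returns (tokens, features); keep only the tokens.
    batch = [tokenizer.tokenize(sentence)[0] for sentence in sentences]
    results = translator.translate_batch(batch)
    # Keep the first hypothesis of each result.
    return [tokenizer.detokenize(result[0]["tokens"]) for result in results]
```

With a `pyonmttok.Tokenizer` and a `ctranslate2.Translator` loaded as shown earlier, a call such as `translate_sentences(["Hello world!"], tokenizer, translator)` should return a list with one translated string per input sentence. Passing both objects in as arguments means they are loaded once and reused across calls.
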
## Benchmarks

| testset              | BLEU |
|----------------------|------|
| Tatoeba-test.zho.eng | 45.2 |