---
language:
- fr
- en
metrics:
- bleu
pipeline_tag: translation
model-index:
- name: NMT-EN-FR
  results:
  - task:
      type: translation
    dataset:
      name: UN Corpus
      type: bilingual
    metrics:
    - name: BLEU
      type: BLEU
      value: 49
library_name: ctranslate2
license: cc-by-sa-4.0
---
# Model Details
English-to-French machine translation model trained by Yasmin Moslem.
The model uses the Transformer (base) architecture.
It was originally trained with OpenNMT-py and then converted to the CTranslate2 format for efficient inference.
## Tools
- OpenNMT-py
- CTranslate2
## Data
This model was trained on the English-French portion of the [UN Corpus](https://conferences.unite.un.org/UNCorpus/),
which consists of approximately 20 million segments.
## Tokenizer
The tokenizer was trained using [SentencePiece](https://github.com/google/sentencepiece) with a shared vocabulary.
Hence, a single SentencePiece model is used to tokenize both the source and target texts.
## Demo
A demo of this model is available at: https://www.machinetranslation.io/
The demo also illustrates word-level auto-suggestions with teacher forcing.
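
CTranslate2's `translate_batch` accepts a `target_prefix`, which forces decoding to start from text the translator has already typed. This is one way to approximate word-level suggestions locally. The following is a minimal sketch, not the demo's actual code: `suggest_completions` is a hypothetical helper, and `translator` and `sp` are assumed to be an already loaded `ctranslate2.Translator` and `sentencepiece.SentencePieceProcessor`.

```python
def suggest_completions(source, prefix, translator, sp, num_suggestions=3):
    """Return candidate translations of `source` that all start with `prefix`.

    Hypothetical helper: `translator` is assumed to be a ctranslate2.Translator
    and `sp` a sentencepiece.SentencePieceProcessor loaded with this model's files.
    """
    # Tokenize both sides with the same (shared-vocabulary) SentencePiece model.
    source_tokens = sp.encode(source, out_type=str)
    prefix_tokens = sp.encode(prefix, out_type=str)

    results = translator.translate_batch(
        [source_tokens],
        target_prefix=[prefix_tokens],   # force decoding to start with the typed text
        num_hypotheses=num_suggestions,  # number of candidates to return
        return_alternatives=True,        # vary the first token after the prefix
    )

    # Each hypothesis is a list of subword tokens that includes the prefix.
    return [sp.decode(tokens) for tokens in results[0].hypotheses]
```

Each returned string repeats the typed prefix and continues it differently, so the word that follows the prefix in each candidate can be offered as a suggestion.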
## Inference
If you want to run this model locally, you can use the [CTranslate2](https://github.com/OpenNMT/CTranslate2) library.
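
A local run might look like the sketch below, assuming `ctranslate2` and `sentencepiece` are installed (`pip install ctranslate2 sentencepiece`). The helper names `load_model` and `translate` are hypothetical, and the model directory and SentencePiece file paths are placeholders for wherever the files from this repository were downloaded.

```python
def load_model(model_dir, sp_model_path, device="cpu"):
    """Load the CTranslate2 model and its SentencePiece tokenizer.

    Both paths are assumptions: point them at the downloaded model files.
    """
    import ctranslate2
    import sentencepiece as spm

    translator = ctranslate2.Translator(model_dir, device=device)
    sp = spm.SentencePieceProcessor(model_file=sp_model_path)
    return translator, sp


def translate(sentences, translator, sp, beam_size=5):
    """Translate a list of sentences and return a list of translated strings."""
    # Split each source sentence into subword tokens.
    source_tokens = [sp.encode(sentence, out_type=str) for sentence in sentences]

    # Run beam search over the whole batch.
    results = translator.translate_batch(source_tokens, beam_size=beam_size)

    # Keep the best hypothesis per sentence and merge subwords back into text.
    return [sp.decode(result.hypotheses[0]) for result in results]
```

For example, call `translator, sp = load_model("path/to/ct2_model_dir", "path/to/sentencepiece.model")` and then `translate(["Your source sentence."], translator, sp)`. The same SentencePiece model handles both tokenizing the input and detokenizing the output because the vocabulary is shared.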
## Citation
```
@inproceedings{moslem-etal-2022-translation,
title = "Translation Word-Level Auto-Completion: What Can We Achieve Out of the Box?",
author = "Moslem, Yasmin and
Haque, Rejwanul and
Way, Andy",
booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.wmt-1.119",
pages = "1176--1181",
}
```