---
language:
  - fr
  - en
metrics:
  - bleu
pipeline_tag: translation
model-index:
  - name: NMT-EN-FR
    results:
      - task:
          type: translation
        dataset:
          name: UN Corpus
          type: bilingual
        metrics:
          - name: BLEU
            type: BLEU
            value: 49
library_name: ctranslate2
license: cc-by-sa-4.0
---

Model Details

French-to-English machine translation model trained by Yasmin Moslem. The model is based on the Transformer (base) architecture. It was originally trained with OpenNMT-py and then converted to the CTranslate2 format for efficient inference.

Tools

  • OpenNMT-py
  • CTranslate2

Data

The model was trained on the French-English portion of the UN Corpus, which contains approximately 20 million segments.

Tokenizer

The tokenizer was trained with SentencePiece using a vocabulary shared by both languages. Hence, a single SentencePiece model is used to tokenize both the source and the target texts.

Demo

A demo of this model is available at: https://www.machinetranslation.io/

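The demo also illustrates word-level auto-suggestions using teacher forcing: the decoder is forced to follow the words the user has already typed, and the model then proposes how to continue the translation.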
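One way to reproduce this kind of suggestion locally is prefix-constrained decoding: the words the user has already typed are passed to CTranslate2 as a `target_prefix`, the decoder is forced through them, and the word that follows the prefix in each hypothesis becomes a candidate. The sketch below assumes that a `ctranslate2.Translator` and a `sentencepiece.SentencePieceProcessor` have already been loaded for this model. The function name `suggest_next_word`, its parameters (including `typed_chars` for a partially typed word), and the beam settings are illustrative assumptions, not the implementation behind the demo.

```python
from typing import List


def suggest_next_word(
    source: str,
    typed_prefix: str,
    translator,
    sp,
    typed_chars: str = "",
    num_hypotheses: int = 5,
) -> List[str]:
    """Return candidate next words for a partially typed translation (a sketch).

    source:       sentence to translate.
    typed_prefix: complete words the user has already typed in the target.
    typed_chars:  characters typed so far of the current, unfinished word
                  (hypothetical parameter used to filter the candidates).
    translator:   a loaded ctranslate2.Translator.
    sp:           the shared SentencePieceProcessor (same model for both sides).
    """
    source_tokens = sp.encode([source], out_type=str)
    # Teacher forcing: make the decoder start with the user's own words.
    prefix_tokens = sp.encode([typed_prefix], out_type=str) if typed_prefix else None
    results = translator.translate_batch(
        source_tokens,
        target_prefix=prefix_tokens,
        beam_size=num_hypotheses,
        num_hypotheses=num_hypotheses,
    )
    # The returned hypotheses include the forced prefix, so the suggestion is
    # the first word after the prefix words.
    n_prefix_words = len(typed_prefix.split())
    suggestions: List[str] = []
    for hypothesis in results[0].hypotheses:
        words = sp.decode(hypothesis).split()
        if len(words) <= n_prefix_words:
            continue  # this hypothesis ended right after the prefix
        word = words[n_prefix_words]
        if word.startswith(typed_chars) and word not in suggestions:
            suggestions.append(word)  # keep beam order, drop duplicates
    return suggestions
```

Splitting the detokenized text on whitespace keeps the sketch simple; punctuation attached to a word (for example `world,`) is returned as part of the suggestion.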

Inference

To run this model locally, use the CTranslate2 library for inference, together with the shared SentencePiece model for tokenization and detokenization.
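A minimal inference sketch, assuming the `ctranslate2` and `sentencepiece` Python packages are installed and that the converted model directory and the SentencePiece model file have been downloaded locally. The paths, the helper names `load_models` and `translate`, and `beam_size=5` are illustrative assumptions; check the repository files for the actual name of the SentencePiece model.

```python
from typing import List


def load_models(model_dir: str, sp_model_path: str, device: str = "cpu"):
    """Load the CTranslate2 translator and the shared SentencePiece model.

    The imports are local so this module can be imported even when the
    optional dependencies (pip install ctranslate2 sentencepiece) are missing.
    """
    import ctranslate2
    import sentencepiece as spm

    translator = ctranslate2.Translator(model_dir, device=device)
    sp = spm.SentencePieceProcessor(model_file=sp_model_path)
    return translator, sp


def translate(sentences: List[str], translator, sp, beam_size: int = 5) -> List[str]:
    """Tokenize, translate with CTranslate2, and detokenize a batch of sentences."""
    # The vocabulary is shared, so the same SentencePiece model encodes the
    # source sentences and decodes the target tokens.
    source_tokens = sp.encode(sentences, out_type=str)
    results = translator.translate_batch(source_tokens, beam_size=beam_size)
    # Each result holds an n-best list; take the best hypothesis (index 0).
    return [sp.decode(result.hypotheses[0]) for result in results]
```

For example, with a hypothetical local directory `NMT-EN-FR-CT2` containing a SentencePiece file named `sentencepiece.model`, `translator, sp = load_models("NMT-EN-FR-CT2", "NMT-EN-FR-CT2/sentencepiece.model")` followed by `translate(["Le Conseil de sécurité s'est réuni hier."], translator, sp)` returns a list with one translated string. Passing `device="cuda"` runs inference on a GPU, provided CTranslate2 was installed with CUDA support.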

Citation

@inproceedings{moslem-etal-2022-translation,
    title = "Translation Word-Level Auto-Completion: What Can We Achieve Out of the Box?",
    author = "Moslem, Yasmin  and
      Haque, Rejwanul  and
      Way, Andy",
    booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.wmt-1.119",
    pages = "1176--1181",
}