---
license: apache-2.0
language:
  - Spanish
  - Nahuatl
tags:
  - translation
---

# t5-small-spanish-nahuatl

## Model description

This model is a T5 Transformer (t5-small) fine-tuned on 29,007 Spanish–Nahuatl sentence pairs: 12,890 samples collected from the web and 16,117 samples from the Axolotl dataset.
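The training script is not included in this card, but a minimal sketch of how such sentence pairs are typically prepared for T5 fine-tuning may help. The task prefix matches the one used in the Usage section below; the pair list, variable names, and tokenizer settings are illustrative assumptions, not the card's actual training code:

```python
# Illustrative sketch of T5 fine-tuning data preparation (not the card's
# actual training code). The task prefix matches the Usage section below.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('t5-small')

# (Spanish, Nahuatl) pairs; this single pair is taken from the Usage example.
pairs = [('muchas flores son blancas', 'miak xochitl istak')]

prefix = 'translate Spanish to Nahuatl: '
inputs = tokenizer([prefix + es for es, nah in pairs],
                   return_tensors='pt', padding=True)
labels = tokenizer([nah for es, nah in pairs],
                   return_tensors='pt', padding=True).input_ids
# In practice, pad token ids in `labels` are replaced with -100 so the
# seq2seq cross-entropy loss ignores them during fine-tuning.
```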

## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained('hackathon-pln-es/t5-small-spanish-nahuatl')
tokenizer = AutoTokenizer.from_pretrained('hackathon-pln-es/t5-small-spanish-nahuatl')

model.eval()
sentence = 'muchas flores son blancas'
# Prepend the task prefix the model was fine-tuned with.
input_ids = tokenizer('translate Spanish to Nahuatl: ' + sentence, return_tensors='pt').input_ids
outputs = model.generate(input_ids)
# Decode the generated token ids back to text.
outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
# outputs == 'miak xochitl istak'
```
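To translate several sentences at once, the snippet above extends naturally to batched generation. The second sentence and the `max_length`/`num_beams` settings below are illustrative, not values documented for this model:

```python
# Batched translation, reusing `model` and `tokenizer` from the snippet above.
# The second sentence and the generation settings are illustrative.
sentences = ['muchas flores son blancas', 'las flores son rojas']
batch = tokenizer(['translate Spanish to Nahuatl: ' + s for s in sentences],
                  return_tensors='pt', padding=True)
generated = model.generate(**batch, max_length=64, num_beams=4)
translations = tokenizer.batch_decode(generated, skip_special_tokens=True)
```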

## Evaluation results

The model is evaluated on 400 validation sentences.

- Validation loss: 1.56
- BLEU: 0.13

Note: since the Axolotl corpus contains multiple misalignments, the reported BLEU and validation loss likely understate the model's true performance slightly.
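The card does not specify how BLEU was computed; a common recipe applies sacrebleu to the decoded validation translations. The sentences below are placeholders, not the actual 400-sentence validation set, and note that sacrebleu reports scores on a 0-100 scale:

```python
# Illustrative BLEU computation with sacrebleu; hypotheses/references are
# placeholders, not the actual 400-sentence validation set.
import sacrebleu

hypotheses = ['miak xochitl istak']     # model outputs, one string per sentence
references = [['miak xochitl istak']]   # one inner list per reference set
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # sacrebleu uses a 0-100 scale
```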

## References

- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2019). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.

- Gutierrez-Vasques, X., Sierra, G., & Pompa, I. H. (2016). Axolotl: a Web Accessible Parallel Corpus for Spanish-Nahuatl. In LREC.

Created by Emilio Morales.