Subtitle Translation Model

This model translates text between English and Spanish. It was obtained by fine-tuning the Helsinki-NLP/opus-mt-en-mul model on English and Spanish TED Talks transcriptions from the ted_talks_iwslt dataset.

Intended Use

This model was trained as the translation component of a subtitle translation tool.

Data

The dataset has been split into the following structure:

DatasetDict({
    train: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 2454
    })
    validation: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 307
    })
    test: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 307
    })
})
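As a quick check on the structure above, the row counts can be turned into proportions. The snippet below is only a sketch that reuses the numbers printed in the DatasetDict; the variable names are illustrative and it does not reproduce how the split was actually created.

```python
# Row counts copied from the DatasetDict shown above.
split_sizes = {"train": 2454, "validation": 307, "test": 307}

total_rows = sum(split_sizes.values())

# Share of the data in each split, as a percentage rounded to one decimal.
split_share = {
    name: round(100 * rows / total_rows, 1) for name, rows in split_sizes.items()
}

print(total_rows)   # 3068
print(split_share)  # {'train': 80.0, 'validation': 10.0, 'test': 10.0}
```

The counts are consistent with an approximately 80/10/10 train/validation/test split.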

Note: The test set evaluation numbers were obtained using only 50 samples from the test set.

Relevant Training Arguments

    evaluation_strategy="epoch"
    learning_rate=2e-5
    per_device_train_batch_size=4
    per_device_eval_batch_size=4
    weight_decay=0.01
    save_total_limit=3
    num_train_epochs=1
    predict_with_generate=True
    fp16=False
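To give a sense of how short this training run is, the arguments above can be combined with the training set size. This is a rough sketch that assumes a single device and no gradient accumulation; neither is stated in this card, so both appear as explicitly labeled assumptions.

```python
import math

# Values taken from this model card.
train_rows = 2454
per_device_train_batch_size = 4
num_train_epochs = 1

# Assumptions (not stated in the card): one device, no gradient accumulation.
num_devices = 1
gradient_accumulation_steps = 1

effective_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)

# The last, partially filled batch still counts as one step.
steps_per_epoch = math.ceil(train_rows / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs

print(steps_per_epoch, total_steps)  # 614 614
```

Under these assumptions the model sees 614 optimizer steps in total, and with evaluation_strategy="epoch" it is evaluated once, at the end of that single epoch.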

Evaluation Results

The following results show the ROUGE metrics obtained on the validation set during training (used to assess the hyperparameters) and on the test set for the final model.

  • Eval metrics
{'rouge1': 64.95, 'rouge2': 42.24, 'rougeL': 61.97, 'rougeLsum': 62.93}
  • Test set evaluation (50 transcriptions)
{'rouge1': 65.54, 'rouge2': 41.45, 'rougeL': 62.72, 'rougeLsum': 62.83}
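For readers unfamiliar with the metric, rouge1 measures unigram overlap between the model output and the reference translation. The function below is a simplified sketch, not the implementation used to produce the numbers above: it assumes plain whitespace tokenization and lowercasing, and the name rouge1_f is illustrative. The reported scores appear to be on a 0-100 scale, which is also an assumption.

```python
from collections import Counter


def rouge1_f(prediction: str, reference: str) -> float:
    """Unigram-overlap F-measure between a prediction and a reference (0 to 1)."""
    # Simplified tokenization: lowercase and split on whitespace.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# 3 shared tokens: precision 3/3, recall 3/4, F-measure 6/7.
score = rouge1_f("hola a todos", "hola a todos ustedes")
print(round(100 * score, 2))  # 85.71
```

rouge2 applies the same idea to bigrams, while rougeL and rougeLsum are based on the longest common subsequence.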

Using the model

The model can be used through the transformers pipeline API:

from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
pipe = pipeline(model="razwand/opus-mt-en-mul-finetuned_en_sp_translator")
pipe("Hi everyone!")

Output:
[{'translation_text': 'Hola a todos!'}]
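Since the intended use is subtitle translation, the sketch below shows one way a translation function could be applied to an SRT subtitle file while leaving cue numbers and timestamps untouched. The helper translate_srt is hypothetical and not part of this model or of transformers; it assumes well-formed SRT blocks separated by blank lines, and the translation function is passed in so that the example runs without downloading the model.

```python
from typing import Callable


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate the text of each SRT cue, keeping its index and timing lines."""
    translated_blocks = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        # In SRT, line 1 is the cue number and line 2 the time range.
        header, text_lines = lines[:2], lines[2:]
        if text_lines:
            # Join wrapped subtitle lines so the model sees a full sentence.
            header = header + [translate(" ".join(text_lines))]
        translated_blocks.append("\n".join(header))
    return "\n\n".join(translated_blocks) + "\n"


sample = (
    "1\n00:00:01,000 --> 00:00:02,500\nHi everyone!\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nThank you\nvery much.\n"
)

# Stand-in translator for demonstration; with the pipeline it would be
# something like: lambda text: pipe(text)[0]["translation_text"]
result = translate_srt(sample, str.upper)
print(result)
```

Passing the translator as a parameter keeps the SRT handling independent of the model, so the same helper could be reused with a different checkpoint.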