---
language:
- en
- es
- multilingual
tags:
- translation
datasets:
- ted_talks_iwslt
metrics:
- rouge
---

Subtitle Translation Model
===============

This model translates English text into Spanish. It was obtained by fine-tuning the [Helsinki-NLP/opus-mt-en-mul](https://huggingface.co/Helsinki-NLP/opus-mt-en-mul) model on Spanish and English TED Talks transcriptions from [ted_talks_iwslt](https://huggingface.co/datasets/ted_talks_iwslt).


#### Intended Use

This model was trained as a building block for a subtitle translation tool.

#### Data

The dataset has been split into the following structure:

```python
DatasetDict({
    train: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 2454
    })
    validation: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 307
    })
    test: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 307
    })
})
```
Note: The evaluation numbers were obtained using 50 samples from the test set.
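
The row counts above correspond to roughly an 80/10/10 split of 3068 sentence pairs. The card does not say how the split was produced, so the following is only a minimal sketch of one way to obtain the same sizes; the function name `split_indices`, the fractions, and the seed are assumptions made for illustration.

```python
import random

def split_indices(n_rows, train_frac=0.8, seed=42):
    """Shuffle row indices and cut them into train/validation/test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * train_frac)   # assumed 80% for training
    n_val = (n_rows - n_train) // 2      # half of the remainder for validation
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]     # whatever is left goes to test
    return train, validation, test

train, validation, test = split_indices(2454 + 307 + 307)
print(len(train), len(validation), len(test))  # 2454 307 307
```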

#### Relevant Training Arguments
```python
evaluation_strategy="epoch"
learning_rate=2e-5
per_device_train_batch_size=4
per_device_eval_batch_size=4
weight_decay=0.01
save_total_limit=3
num_train_epochs=1
predict_with_generate=True
fp16=False
```
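
To give a sense of how short this run is, the batch size and epoch count can be combined with the size of the train split. This is a rough estimate only: it assumes a single device, no gradient accumulation, and that the final partial batch is kept, none of which is stated in the card.

```python
import math

train_rows = 2454   # size of the train split listed under Data
batch_size = 4      # per_device_train_batch_size
epochs = 1          # num_train_epochs

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs
print(total_steps)  # 614
```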

#### Evaluation Results

The following results show the ROUGE metrics obtained during the training process (hyperparameter evaluation) and when evaluating the final model on the test set.

- Evaluation metrics (during training)
```python
{'rouge1': 64.95, 'rouge2': 42.24, 'rougeL': 61.97, 'rougeLsum': 62.93}
```
- Test set evaluation (50 transcriptions)
```python
{'rouge1': 65.54, 'rouge2': 41.45, 'rougeL': 62.72, 'rougeLsum': 62.83}
```
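
For readers unfamiliar with the metric, the sketch below shows what a `rouge1` score measures: the unigram overlap between a predicted translation and its reference, combined into an F-measure on a 0-100 scale. It is a simplified illustration, not the implementation used for the numbers above; the card does not name the ROUGE library, and the whitespace tokenization and the function name `rouge1_f` are assumptions.

```python
from collections import Counter

def rouge1_f(prediction, reference):
    """ROUGE-1 F-measure on lowercased, whitespace-split tokens, scaled to 0-100."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 100 * 2 * precision * recall / (precision + recall)

print(round(rouge1_f("hola a todos", "hola a todos"), 2))          # 100.0
print(round(rouge1_f("hola a todos", "hola a todo el mundo"), 2))  # 50.0
```
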
#### Using the model

The model can be loaded and used through the `transformers` pipeline:

```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub.
pipe = pipeline(model="razwand/opus-mt-en-mul-finetuned_en_sp_translator")
print(pipe("Hi everyone!"))
# [{'translation_text': 'Hola a todos!'}]
```
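
Since the model is meant for subtitle translation, the following sketch shows how a translation function could be applied to a subtitle file while leaving cue numbers and timestamps untouched. It assumes the subtitles are in SRT format, which the card does not specify, and `translate_srt` is a hypothetical helper. The demo uses `str.upper` as a stand-in translator so that it runs without downloading the model.

```python
import re

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def translate_srt(srt_text, translate):
    """Translate only the text lines of an SRT document.

    `translate` is any callable mapping one string to its translation.
    """
    out = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        # Keep blank separators, cue numbers and timing lines as they are.
        if not stripped or stripped.isdigit() or TIMING.match(stripped):
            out.append(line)
        else:
            out.append(translate(stripped))
    return "\n".join(out)

# With the pipeline above, `translate` could be:
#   lambda text: pipe(text)[0]["translation_text"]
sample = "1\n00:00:01,000 --> 00:00:02,500\nHi everyone!\n"
print(translate_srt(sample, str.upper))
```

One limitation of this sketch is that a subtitle line consisting only of digits (for example a year) is treated as a cue number and left untranslated.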