---
language:
- en
- es
- multilingual
tags:
- translation
datasets:
- ted_talks_iwslt
metrics:
- rouge
---

Subtitle Translation Model
===============

This is a model for translating text between Spanish and English. It was trained on Spanish and English TED Talks transcriptions from [ted_talks_iwslt](https://huggingface.co/datasets/ted_talks_iwslt), fine-tuning the [Helsinki-NLP/opus-mt-en-mul](https://huggingface.co/Helsinki-NLP/opus-mt-en-mul) model.

#### Intended Use

This model was trained with the aim of building a subtitle translation tool.

#### Data

The dataset was split as follows:

```python
DatasetDict({
    train: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 2454
    })
    validation: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 307
    })
    test: Dataset({
        features: ['Original_Sentence', 'Translate_SP', '__index_level_0__'],
        num_rows: 307
    })
})
```

Note: The evaluation numbers were obtained using 50 samples from the test set.

#### Relevant Training Arguments

```python
evaluation_strategy="epoch"
learning_rate=2e-5
per_device_train_batch_size=4
per_device_eval_batch_size=4
weight_decay=0.01
save_total_limit=3
num_train_epochs=1
predict_with_generate=True
fp16=False
```

#### Evaluation Results

The results below show the ROUGE metrics obtained during training (used to evaluate the hyperparameters) and those obtained when evaluating the model itself on the test set.

- Eval metrics

```python
{'rouge1': 64.95, 'rouge2': 42.24, 'rougeL': 61.97, 'rougeLsum': 62.93}
```

- Test set evaluation (50 transcriptions)

```python
{'rouge1': 65.54, 'rouge2': 41.45, 'rougeL': 62.72, 'rougeLsum': 62.83}
```

#### Using the model

The model can be used with the following lines of code:

```python
from transformers import pipeline

# Load the fine-tuned translation model from the Hugging Face Hub
pipe = pipeline(model="razwand/opus-mt-en-mul-finetuned_en_sp_translator")

pipe("Hi everyone!")
# >> [{'translation_text': 'Hola a todos!'}]
```
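
#### Interpreting the ROUGE scores

The ROUGE values reported under Evaluation Results appear to be on a 0-100 scale. As a rough illustration of what `rouge1` measures, the sketch below computes a simplified unigram-overlap F1 score. It is an assumption-laden approximation, not the implementation used to produce the numbers above: the function name `rouge1_f1` is hypothetical, tokenization is plain lowercased whitespace splitting, and the example sentence pair is made up rather than taken from the test set. Standard ROUGE packages tokenize differently and may apply stemming, so this sketch is not expected to reproduce the reported scores.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 on a 0-100 scale (hypothetical helper)."""
    # Assumption: lowercased whitespace tokens stand in for real tokenization.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Clipped unigram overlap: a token counts at most as many times
    # as it appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)  # share of predicted tokens that match
    recall = overlap / len(ref_tokens)      # share of reference tokens recovered
    return 100 * 2 * precision * recall / (precision + recall)


# Made-up example pair, not drawn from the test set.
score = rouge1_f1("hola a todos", "hola a todos ustedes")
print(round(score, 2))  # 85.71
```

In this example the prediction matches three of the four reference tokens, giving a precision of 1.0 and a recall of 0.75. `rouge2` applies the same idea to bigrams, while `rougeL` is based on the longest common subsequence.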