---
license: apache-2.0
base_model: Helsinki-NLP/opus-mt-tc-big-en-pt
tags:
- generated_from_trainer
datasets:
- kde4
model-index:
- name: opus-en-to-pt-translate
  results: []
language:
- pt
- en
pipeline_tag: translation
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

## Model description
This model is a fine-tuned version of Helsinki-NLP/opus-mt-tc-big-en-pt for translating English text into Portuguese.


## How to Use

```python

# English source text to translate (a plain string; no f-string formatting is needed).
prompt = """
Trump received a Bachelor of Science in economics from the University of Pennsylvania in 1968, and his father named him president of his real estate business in 1971.
Trump renamed it the Trump Organization and reoriented the company toward building and renovating skyscrapers, hotels, casinos, and golf courses.
After a series of business failures in the late twentieth century, he successfully launched side ventures that required little capital, mostly by licensing the Trump name.
From 2004 to 2015, he co-produced and hosted the reality television series The Apprentice.
"""

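from transformers import pipeline

pipe = pipeline("translation", model="rhaymison/opus-en-to-pt-translator")
# The pipeline returns a list of dicts; the translated text is under "translation_text".
print(pipe(prompt)[0]["translation_text"])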

#Trump recebeu um título de bacharel em economia pela Universidade da Pensilvânia em 1968, e o seu pai deu- lhe o nome de presidente do seu negócio imobiliário em 1971.
#Trump mudou o nome para Organização Trump e voltou a orientar a companhia para a construção e reforma de arranha- céus, hotéis, casinos e campos de golfe.
#Depois de uma série de falhas de negócio no fim do século XX, ele lançou com sucesso projectos paralelos que necessitaram de pouca capital, principalmente através do
#licenciamento do nome Trump. De 2004 a 2015, ele produziu em conjunto e alojou a série de reality série The Apprentice.
```

```python

from transformers import MarianMTModel, MarianTokenizer

# Each sentence starts with the target-language token ">>por<<" (Portuguese).
texts = [
    ">>por<< Trump received a Bachelor of Science in economics from the University of Pennsylvania in 1968, and his father named him president of his real estate business in 1971.",
    ">>por<< Trump renamed it the Trump Organization and reoriented the company toward building and renovating skyscrapers, hotels, casinos, and golf courses."
]

model_name = "rhaymison/opus-en-to-pt-translator"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the batch (padded to a common length) and generate the translations.
translated = model.generate(**tokenizer(texts, return_tensors="pt", padding=True))

for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))

# output:
# Trump recebeu um título de bacharel em economia pela Universidade da Pensilvânia em 1968, e o seu pai deu- lhe o nome de presidente do seu negócio imobiliário em 1971..
# Trump mudou o nome para Organização Trump e voltou a orientar a companhia para a construção e reforma de arranha- céus, hotéis, casinos e campos de golfe.


```
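
The `MarianMTModel` example above prefixes every input with `>>por<<`. A minimal sketch of a helper that adds this token to a batch before tokenization is shown here; the function name `with_target_token` and the sample sentences are hypothetical and are not part of the model or the `transformers` API.

```python
def with_target_token(sentences, token=">>por<<"):
    """Prepend the target-language token to each sentence that lacks one.

    Hypothetical helper: sentences that already start with a ">>...<<"
    token are left unchanged apart from whitespace stripping.
    """
    prefixed = []
    for sentence in sentences:
        stripped = sentence.strip()
        if stripped.startswith(">>"):
            prefixed.append(stripped)
        else:
            prefixed.append(f"{token} {stripped}")
    return prefixed


# Illustrative inputs only; these are not taken from the training data.
batch = with_target_token([
    "The file could not be saved.",
    ">>por<< Open the settings dialog.",
])
print(batch)
```

The resulting list can be passed to the tokenizer in place of `texts` in the example above.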

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
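
As a rough sketch, the values listed above could be expressed as keyword arguments for `Seq2SeqTrainingArguments` from `transformers`. This is an assumption, not the author's training script: the card does not say which arguments class was used, `output_dir` is a hypothetical path, and `train_batch_size: 32` is assumed to be a per-device value on a single device.

```python
# Hyperparameters from the list above, mapped to TrainingArguments keyword names.
training_kwargs = {
    "output_dir": "opus-en-to-pt-translate",  # hypothetical output path
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 32,  # assumes a single device
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
    "fp16": True,  # reported as "Native AMP"; requires a supported GPU
}


def build_training_args(kwargs=training_kwargs):
    """Build the arguments object; imported lazily so the dict can be inspected without transformers."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**kwargs)


print(training_kwargs["learning_rate"], training_kwargs["num_train_epochs"])
```

Calling `build_training_args()` on a machine without a supported GPU may fail because of `fp16=True`; set it to `False` in that case.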

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.4982        | 0.08  | 500   | 0.6398          |
| 0.5475        | 0.15  | 1000  | 0.6370          |
| 0.5397        | 0.23  | 1500  | 0.6333          |
| 0.5267        | 0.31  | 2000  | 0.6272          |
| 0.5212        | 0.39  | 2500  | 0.6240          |
| 0.522         | 0.46  | 3000  | 0.6179          |
| 0.5213        | 0.54  | 3500  | 0.6124          |
| 0.5155        | 0.62  | 4000  | 0.6114          |
| 0.5143        | 0.7   | 4500  | 0.6053          |
| 0.5037        | 0.77  | 5000  | 0.6058          |
| 0.5093        | 0.85  | 5500  | 0.6002          |
| 0.5253        | 0.93  | 6000  | 0.5945          |
| 0.5138        | 1.01  | 6500  | 0.5892          |
| 0.4864        | 1.08  | 7000  | 0.5906          |
| 0.491         | 1.16  | 7500  | 0.5889          |
| 0.4993        | 1.24  | 8000  | 0.5849          |
| 0.4749        | 1.32  | 8500  | 0.5849          |
| 0.4911        | 1.39  | 9000  | 0.5812          |
| 0.487         | 1.47  | 9500  | 0.5796          |
| 0.4846        | 1.55  | 10000 | 0.5758          |
| 0.4863        | 1.63  | 10500 | 0.5739          |
| 0.4792        | 1.7   | 11000 | 0.5725          |
| 0.4816        | 1.78  | 11500 | 0.5704          |
| 0.4811        | 1.86  | 12000 | 0.5684          |
| 0.4773        | 1.94  | 12500 | 0.5676          |
| 0.4657        | 2.01  | 13000 | 0.5691          |
| 0.4246        | 2.09  | 13500 | 0.5683          |
| 0.4285        | 2.17  | 14000 | 0.5693          |
| 0.4241        | 2.25  | 14500 | 0.5676          |
| 0.422         | 2.32  | 15000 | 0.5669          |
| 0.4199        | 2.4   | 15500 | 0.5656          |
| 0.4273        | 2.48  | 16000 | 0.5650          |
| 0.4161        | 2.56  | 16500 | 0.5651          |
| 0.4243        | 2.63  | 17000 | 0.5635          |
| 0.4202        | 2.71  | 17500 | 0.5628          |
| 0.4152        | 2.79  | 18000 | 0.5627          |
| 0.4179        | 2.87  | 18500 | 0.5619          |
| 0.4241        | 2.94  | 19000 | 0.5618          |

# opus-en-to-pt-translate

This model is a fine-tuned version of [Helsinki-NLP/opus-mt-tc-big-en-pt](https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-pt) on the kde4 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5618
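
For a more interpretable figure, the loss can be converted to perplexity. The sketch below assumes the reported value is the mean per-token cross-entropy in nats, which is the usual convention for this kind of training setup but is not stated in this card. Under that assumption the perplexity is roughly 1.75.

```python
import math

eval_loss = 0.5618  # final validation loss reported above

# Perplexity is the exponential of the mean per-token cross-entropy (in nats).
perplexity = math.exp(eval_loss)
print(f"perplexity: {perplexity:.3f}")
```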

### Framework versions

- Transformers 4.38.1
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2