jarodrigues
committed on
Update README.md
README.md
CHANGED
@@ -22,6 +22,7 @@ tags:
 - foundation model
 datasets:
 - PORTULAN/glue-ptpt
+- PORTULAN/extraglue
 ---
 </br>
 </br>
@@ -82,7 +83,7 @@ Gervásio-7B-PTPT-Decoder is distributed under an [MIT license](https://huggingf
 
 # Training Data
 
-**Gervásio 7B PT-PT** over standard supervised fine-tuning, and to keep some alignment with mainstream benchmarks for English, we resorted to tasks and respective datasets in the GLUE and the SuperGLUE collections.
+**Gervásio 7B PT-PT** was trained over standard supervised fine-tuning, and to keep some alignment with mainstream benchmarks for English, we resorted to tasks and respective datasets in the GLUE and the SuperGLUE collections.
 
 
 We selected those datasets where the outcome of their machine translation into Portuguese could preserve, in the target language, the linguistic properties at stake.
@@ -102,11 +103,11 @@ And from SuperGLUE, we included these other four tasks:
 
 Instruction templates have been manually crafted for each task.
 These take the various fields in the dataset and arrange them into a prompt.
-These templates are listed in full detail in
+These templates are listed in full detail in the [Extraglue dataset](https://huggingface.co/datasets/PORTULAN/extraglue).
 
 # Training Details
 
-We applied supervised fine-tuning with causal language modeling (CLM) training objective
+We applied supervised fine-tuning with a causal language modeling (CLM) training objective following a zero-out technique during the fine-tuning process.
 Specifically, while the entire prompt received attention during fine-tuning, only the response tokens were subjected to back-propagation.
 
 In terms of hyper-parameters, both models were trained with a learning rate of 2 * 10^-5, a weight decay of 0.1, a two-epoch training regime without warm-up, and to ensure the same number of tokens back-propagated per step, we employed an input sequence of 512 tokens with a batch size of 16 and 16 accumulation steps.
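The two `+` lines in this hunk describe the training recipe: a CLM objective where the full prompt is attended to but only response tokens are back-propagated (the "zero-out" technique), with the hyper-parameters quoted in the last context line. The sketch below is a hedged illustration of how that setup is commonly expressed with the Hugging Face `Trainer`; it is not the authors' training script, and the prompt/response strings, output path, and omitted data collator are assumptions.

```python3
# Illustrative sketch only: response-only loss ("zero-out" of prompt labels)
# plus the hyper-parameters quoted above. Not the authors' training script;
# the prompt/response strings and output path are assumptions.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

model_name = "PORTULAN/gervasio-7b-portuguese-ptpt-decoder"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def build_example(prompt, response, max_len=512):
    # Tokenize prompt and response separately so the boundary is known.
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]
    input_ids = (prompt_ids + response_ids)[:max_len]
    # The whole prompt receives attention, but only response tokens get a
    # loss: label -100 is ignored by the causal-LM cross-entropy.
    labels = ([-100] * len(prompt_ids) + response_ids)[:max_len]
    return {"input_ids": input_ids,
            "attention_mask": [1] * len(input_ids),
            "labels": labels}

args = TrainingArguments(
    output_dir="gervasio-sft",        # illustrative path
    learning_rate=2e-5,               # 2 * 10^-5
    weight_decay=0.1,
    num_train_epochs=2,
    warmup_steps=0,                   # no warm-up
    per_device_train_batch_size=16,
    gradient_accumulation_steps=16,   # 16 accumulation steps
)
# trainer = Trainer(model=model, args=args, train_dataset=..., data_collator=...)
```

Under these assumed settings, 512 input tokens with a batch of 16 and 16 accumulation steps keeps the number of tokens back-propagated per optimization step constant, which matches the stated goal in the hyper-parameter paragraph above.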
@@ -139,7 +140,7 @@ You can use this model directly with a pipeline for causal language modeling (CL
 
 ```python3
 >>> from transformers import pipeline
->>> generator = pipeline(model='PORTULAN/gervasio-ptpt-decoder')
+>>> generator = pipeline(model='PORTULAN/gervasio-7b-portuguese-ptpt-decoder')
 >>> generator("A música portuguesa é", max_new_tokens=10)
 [{'generated_text': 'A música portuguesa é uma das mais ricas do mundo'}]
 
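For completeness, the renamed checkpoint in the `+` line above can also be loaded without the `pipeline` helper. This is a generic `transformers` usage sketch rather than an example taken from the README; the prompt and generation length simply mirror the pipeline call above.

```python3
# Generic alternative to the pipeline call above; settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "PORTULAN/gervasio-7b-portuguese-ptpt-decoder"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("A música portuguesa é", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```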
@@ -156,4 +157,4 @@ grant PINFRA/22117/2016; research project GPT-PT - Transformer-based Decoder for
 grant CPCA-IAC/AV/478395/2022; innovation project
 ACCELERAT.AI - Multilingual Intelligent Contact Centers, funded by IAPMEI, I.P. - Agência para a Competitividade e Inovação
 under the grant C625734525-00462629, of Plano de Recuperação e Resiliência,
-call RE-C05-i01.01 – Agendas/Alianças Mobilizadoras para a Reindustrialização.
+call RE-C05-i01.01 – Agendas/Alianças Mobilizadoras para a Reindustrialização.