eduagarcia committed on
Commit
338b0a8
1 Parent(s): d789b2b

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -129,7 +129,7 @@ With sufficient pre-training data, it can surpass larger models. The results hig
 
 ## Training Details
 
- RoBERTaLexPT-base is pretrained from both data:
+ RoBERTaLexPT-base is pretrained on:
 - [LegalPT](https://huggingface.co/datasets/eduagarcia/LegalPT_dedup) is a Portuguese legal corpus aggregated from diverse sources, totaling up to 125 GiB of data.
 - [CrawlPT](https://huggingface.co/datasets/eduagarcia/CrawlPT_dedup) is a composition of three general-domain Portuguese corpora: [brWaC](https://huggingface.co/datasets/brwac), the [CC100 PT subset](https://huggingface.co/datasets/eduagarcia/cc100-pt), and the [OSCAR-2301 PT subset](https://huggingface.co/datasets/eduagarcia/OSCAR-2301-pt_dedup).
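
For illustration, the two corpora linked in the hunk above can be pulled straight from the Hugging Face Hub with the `datasets` library. This is only a sketch: the split name, the streaming flag, and the absence of a config name are assumptions, not details stated in this commit.

```python
# Minimal sketch (not from the model authors): stream a few records from the
# two deduplicated pretraining corpora linked in the README.
from datasets import load_dataset

# Portuguese legal corpus aggregated from diverse legal sources (~125 GiB).
# Split/config names are assumptions and may need adjusting.
legalpt = load_dataset("eduagarcia/LegalPT_dedup", split="train", streaming=True)

# General-domain Portuguese corpus (brWaC + CC100-pt + OSCAR-2301-pt).
crawlpt = load_dataset("eduagarcia/CrawlPT_dedup", split="train", streaming=True)

print(next(iter(legalpt)))  # peek at one legal-domain document
print(next(iter(crawlpt)))  # peek at one general-domain document
```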

@@ -149,10 +149,10 @@ To ensure that domain models are not constrained by a generic vocabulary, we uti
 
 #### Training Hyperparameters
 
- The pretraining process involved training the model for 62,500 steps, with a batch size of 2048 and a learning rate of 4e-4, each sequence containing a maximum of 512 tokens.
- The weight initialization is random.
- We employed the masked language modeling objective, where 15\% of the input tokens were randomly masked.
- The optimization was performed using the AdamW optimizer with a linear warmup and a linear decay learning rate schedule.
+ The pretraining process involved training the model for 62,500 steps, with a batch size of 2048 and a learning rate of 4e-4, each sequence containing a maximum of 512 tokens.
+ The weight initialization is random.
+ We employed the masked language modeling objective, where 15\% of the input tokens were randomly masked.
+ The optimization was performed using the AdamW optimizer with a linear warmup and a linear decay learning rate schedule.
 
 For other parameters we adopted the standard [RoBERTa-base hyperparameters](https://huggingface.co/FacebookAI/roberta-base):
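
The hyperparameters listed in this hunk map onto a standard masked-language-modeling setup. Below is a minimal sketch using the Hugging Face `transformers` Trainer; it is not the authors' training script, and the tokenizer checkpoint ID, warmup ratio, per-device batch split, and toy dataset are assumptions added purely for illustration.

```python
# Sketch of an MLM pretraining configuration matching the stated hyperparameters:
# 62,500 steps, global batch size 2048, peak LR 4e-4, 512-token sequences,
# 15% token masking, AdamW with linear warmup followed by linear decay.
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    RobertaConfig,
    RobertaForMaskedLM,
    Trainer,
    TrainingArguments,
)

# Assumed checkpoint ID for the domain-specific tokenizer mentioned in the README.
tokenizer = AutoTokenizer.from_pretrained("eduagarcia/RoBERTaLexPT-base")

# Randomly initialized RoBERTa-base architecture (the README states random init).
config = RobertaConfig.from_pretrained("FacebookAI/roberta-base", vocab_size=len(tokenizer))
model = RobertaForMaskedLM(config)

# Masked language modeling objective: 15% of input tokens are randomly masked.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

# Toy corpus so the snippet is self-contained; the real data is LegalPT + CrawlPT.
texts = ["Exemplo de texto jurídico em português.", "Outro documento de exemplo."]
train_dataset = [tokenizer(t, truncation=True, max_length=512) for t in texts]

args = TrainingArguments(
    output_dir="robertalexpt-pretraining",
    max_steps=62_500,
    per_device_train_batch_size=64,   # 64 x 32 accumulation steps = 2048 global batch (assumed split)
    gradient_accumulation_steps=32,
    learning_rate=4e-4,               # Trainer's default optimizer is AdamW
    lr_scheduler_type="linear",       # linear decay after warmup
    warmup_ratio=0.06,                # assumption: RoBERTa-style warmup; the commit does not state it
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=collator,
)
# trainer.train()  # uncomment to actually launch pretraining
```

The per-device batch size and accumulation steps are only one way to reach the stated global batch of 2048 (64 × 32 = 2048); the actual split depends on the hardware used.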