---
license: mit
datasets: thegoodfellas/brwac_tiny
widget:
- text: Demanda por fundos de <mask> para crianças cresce em 2022
  example_title: Exemplo 1
- text: Havia uma <mask> no meio do caminho
  example_title: Exemplo 2
- text: Na verdade, começar a <mask> cedo é ideal para ter um bom dinheiro no futuro
  example_title: Exemplo 3
- text: Mitos e verdades sobre o <mask>. Doença que mais mata mulheres no Brasil.
  example_title: Exemplo 4
model-index:
- name: tgf-xlm-roberta-base-pt-br
  results: []
---

# tgf-xlm-roberta-base-pt-br

This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the [BrWac](https://huggingface.co/datasets/thegoodfellas/brwac_tiny) dataset.
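
As a minimal usage sketch, the model can be queried through the `fill-mask` pipeline from `transformers`, using one of the widget sentences above. The Hub id in `MODEL_ID` is an assumption (it combines the dataset's organization name with the model name from this card), and `top_predictions` is a hypothetical helper, not part of the released code.

```python
# Assumption: the model is published on the Hugging Face Hub under this id.
MODEL_ID = "thegoodfellas/tgf-xlm-roberta-base-pt-br"


def top_predictions(text: str, model_id: str = MODEL_ID, top_k: int = 3):
    """Return the top_k (token, score) candidates for the <mask> slot in text."""
    # Imported lazily so the module loads without transformers installed;
    # running the function requires `pip install transformers torch`.
    from transformers import pipeline

    # Building the pipeline downloads the weights on first use.
    fill_mask = pipeline("fill-mask", model=model_id)
    predictions = fill_mask(text, top_k=top_k)
    return [(p["token_str"], p["score"]) for p in predictions]
```

Calling `top_predictions("Havia uma <mask> no meio do caminho")` would return the three highest-scoring fillers for the masked word. For repeated calls it is cheaper to build the pipeline once and reuse it.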

## Model description

This is a version of XLM-RoBERTa base fine-tuned for Brazilian Portuguese. It was trained on the [BrWac](https://huggingface.co/datasets/thegoodfellas/brwac_tiny) dataset and follows the principles of the [RoBERTa paper](https://arxiv.org/abs/1907.11692). The key strategies are:

1. *Full-Sentences*: Quoted from the paper: "Each input is packed with full sentences sampled contiguously from one or more documents, such that the total length is at most 512 tokens. Inputs may cross document boundaries. When we reach the end of one document, we begin sampling sentences from the next document and add an extra separator token between documents".

2. *Tuned hyperparameters*: `adam_beta1=0.9`, `adam_beta2=0.98`, `adam_epsilon=1e-6` (as the paper suggests).
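
The Full-Sentences strategy can be sketched in plain Python as follows. This is an illustration of the idea quoted above, not the code used to train this model: `pack_full_sentences` is a hypothetical helper, sentences are assumed to be already tokenized into id lists, the separator id is a placeholder argument, and begin/end-of-sequence tokens are left out for brevity.

```python
from typing import Iterable, List


def pack_full_sentences(
    documents: Iterable[List[List[int]]],
    max_len: int = 512,
    sep_id: int = 2,  # placeholder separator id (assumption)
) -> List[List[int]]:
    """Pack contiguous sentences into inputs of at most max_len tokens.

    Each document is a list of sentences; each sentence is a list of token ids.
    Inputs may cross document boundaries, with sep_id added between documents.
    """
    packed: List[List[int]] = []
    current: List[int] = []
    for doc_idx, doc in enumerate(documents):
        # Crossing into a new document: add the extra separator if it fits.
        if doc_idx > 0 and current and len(current) < max_len:
            current.append(sep_id)
        for sentence in doc:
            sentence = sentence[:max_len]  # truncate a sentence longer than one input
            if len(current) + len(sentence) > max_len:
                # The sentence does not fit: close this input and start a new one.
                packed.append(current)
                current = []
            current.extend(sentence)
    if current:
        packed.append(current)
    return packed


docs = [[[1, 1, 1], [2, 2]], [[3, 3, 3, 3]]]
inputs = pack_full_sentences(docs, max_len=6, sep_id=0)
```

With `max_len=6`, the two sentences of the first document and the separator fill the first input, and the sentence from the second document starts a new one.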


## Availability

The source code is available [here](https://github.com/the-good-fellas/xlm-roberta-pt-br).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-4
- train_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-06
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 2
- mixed_precision_training: Native AMP
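
The reported `total_train_batch_size` can be checked from the other values. The sketch below collects the hyperparameters in a plain dictionary whose keys mirror the Hugging Face `TrainingArguments` parameter names; that mapping is an assumption, since the actual training script may set them differently. The device count of 4 is inferred from the Environment section of this card.

```python
# Hyperparameters from the list above, keyed by TrainingArguments-style names
# (the key names are an assumption about how the values were passed).
hyperparameters = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 16,
    "seed": 42,
    "gradient_accumulation_steps": 8,
    "adam_beta1": 0.9,
    "adam_beta2": 0.98,
    "adam_epsilon": 1e-6,
    "lr_scheduler_type": "linear",
    "warmup_steps": 1000,
    "num_train_epochs": 2,
    "fp16": True,  # stands in for "Native AMP" mixed precision
}

NUM_DEVICES = 4  # assumption: the 4 A100 GPUs listed under Environment


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Number of sequences contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


total = effective_batch_size(
    hyperparameters["per_device_train_batch_size"],
    hyperparameters["gradient_accumulation_steps"],
    NUM_DEVICES,
)
print(total)  # 16 * 8 * 4 = 512
```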

### Framework versions

- Transformers 4.23.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.6.1
- Tokenizers 0.13.1

### Environment

4xA100.88V NVIDIA

Special thanks to [DataCrunch.io](https://datacrunch.io) for their amazing and affordable GPUs.
<img src="https://datacrunch.io/_next/static/media/Logo.6b773500.svg"  width="20%"/>