
RoBERTa-Large

This repo trains roberta-large from scratch on the Norwegian training subset of OSCAR, which contains roughly 4.7 GB of text.
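The card does not say how the data was loaded; one common route is the datasets library, where the config name below is the usual identifier of the Norwegian OSCAR subset and is an assumption, not quoted from this repo.

```python
# Minimal sketch of loading the training data; "unshuffled_deduplicated_no"
# is the standard Norwegian OSCAR config and is assumed here.
from datasets import load_dataset

dataset = load_dataset("oscar", "unshuffled_deduplicated_no", split="train")
print(dataset[0]["text"][:200])  # peek at the first document
```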

A ByteLevelBPETokenizer, as shown in this blog post, was trained on the whole Norwegian training subset of OSCAR.
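A minimal sketch of such a tokenizer run, following the approach from the blog post; the input file path and vocabulary size are illustrative assumptions, not the exact values used for this repo.

```python
# Sketch of training a byte-level BPE tokenizer on the raw text.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["oscar_no_train.txt"],  # hypothetical dump of the Norwegian OSCAR split
    vocab_size=50_265,             # RoBERTa's default vocab size, assumed here
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)
tokenizer.save_model("./norwegian-roberta-large")  # writes vocab.json and merges.txt
```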

Training is done on a TPU v3-8 in Flax. The training script and the script used to create the tokenizer are attached below.
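Once trained, the checkpoint can be used for masked-LM inference directly in Flax; the repo id in the sketch below is a placeholder, not the actual name of this repo.

```python
# Sketch of masked-LM inference with a Flax RoBERTa checkpoint; the repo id
# is hypothetical and stands in for wherever this model is published.
import jax.numpy as jnp
from transformers import AutoTokenizer, FlaxRobertaForMaskedLM

repo_id = "flax-community/norwegian-roberta-large"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = FlaxRobertaForMaskedLM.from_pretrained(repo_id)

inputs = tokenizer("Oslo er hovedstaden i <mask>.", return_tensors="np")
logits = model(**inputs).logits

# Find the masked position and decode the highest-scoring token for it.
mask_pos = int(jnp.argmax(inputs["input_ids"][0] == tokenizer.mask_token_id))
top_token = int(jnp.argmax(logits[0, mask_pos]))
print(tokenizer.decode([top_token]))
```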

Run 1

--weight_decay="0.01"
--max_seq_length="128"
--train_batch_size="1048"
--eval_batch_size="1048"
--learning_rate="1e-3"
--warmup_steps="2000"
--pad_to_max_length
--num_train_epochs="12"
--adam_beta1="0.9"
--adam_beta2="0.98"

Trained for 12 epochs of 8005 steps each, i.e. 12 × 8005 ≈ 96K steps in total. One epoch plus evaluation takes roughly 2 hours 40 minutes, so the full run took about 1 day and 8 hours. Final loss was 3.695.
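In Flax, these hyperparameters typically translate into an optax AdamW optimizer with a warmup schedule; the sketch below assumes a linear decay to zero after warmup, which the card does not state explicitly.

```python
# Sketch of the optimizer the Run 1 flags describe: AdamW with beta1=0.9,
# beta2=0.98, weight decay 0.01, warming up to 1e-3 over 2000 steps.
# The linear decay to zero over the remaining steps is an assumption.
import optax

total_steps = 96_000  # 12 epochs x 8005 steps, rounded
warmup_steps = 2_000

schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(0.0, 1e-3, transition_steps=warmup_steps),
        optax.linear_schedule(1e-3, 0.0, transition_steps=total_steps - warmup_steps),
    ],
    boundaries=[warmup_steps],
)
optimizer = optax.adamw(learning_rate=schedule, b1=0.9, b2=0.98, weight_decay=0.01)
```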

Acc: [accuracy curve for Run 1]

Loss: [loss curve for Run 1]

Run 2

--weight_decay="0.01"
--max_seq_length="128"
--train_batch_size="1048"
--eval_batch_size="1048"
--learning_rate="5e-3"
--warmup_steps="2000"
--pad_to_max_length
--num_train_epochs="7"
--adam_beta1="0.9"
--adam_beta2="0.98"

Trained for 7 epochs of 8005 steps each, i.e. 7 × 8005 ≈ 56K steps in total. One epoch plus evaluation takes roughly 2 hours 40 minutes, so the full run took about 18 hours. Final loss was 2.216 and accuracy 0.58.

Acc: [accuracy curve for Run 2]

Loss: [loss curve for Run 2]
