Lao RoBERTa Base
Lao RoBERTa Base is a masked language model based on the RoBERTa model. It was trained on the OSCAR-2109 dataset, specifically the deduplicated_lo
subset. The model was trained from scratch and achieved an evaluation loss of 1.4556 and an evaluation perplexity of 4.287.
This model was trained using HuggingFace's PyTorch framework and the training script found here. All training was done on a TPUv3-8, provided by the TPU Research Cloud program. You can view the detailed training results in the Training metrics tab, logged via Tensorboard.
Model
Model | #params | Arch. | Training/Validation data (text) |
---|---|---|---|
lao-roberta-base |
124M | RoBERTa | OSCAR-2109 deduplicated_lo Dataset |
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- distributed_type: tpu
- num_devices: 8
- total_train_batch_size: 1024
- total_eval_batch_size: 1024
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 30.0
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
No log | 1.0 | 216 | 5.8586 |
No log | 2.0 | 432 | 5.5095 |
6.688 | 3.0 | 648 | 5.3976 |
6.688 | 4.0 | 864 | 5.3562 |
5.3629 | 5.0 | 1080 | 5.2912 |
5.3629 | 6.0 | 1296 | 5.2385 |
5.22 | 7.0 | 1512 | 5.1955 |
5.22 | 8.0 | 1728 | 5.1785 |
5.22 | 9.0 | 1944 | 5.1327 |
5.1248 | 10.0 | 2160 | 5.1243 |
5.1248 | 11.0 | 2376 | 5.0889 |
5.0591 | 12.0 | 2592 | 5.0732 |
5.0591 | 13.0 | 2808 | 5.0417 |
5.0094 | 14.0 | 3024 | 5.0388 |
5.0094 | 15.0 | 3240 | 4.9299 |
5.0094 | 16.0 | 3456 | 4.2991 |
4.7527 | 17.0 | 3672 | 3.6541 |
4.7527 | 18.0 | 3888 | 2.7826 |
3.4431 | 19.0 | 4104 | 2.2796 |
3.4431 | 20.0 | 4320 | 2.0213 |
2.2803 | 21.0 | 4536 | 1.8809 |
2.2803 | 22.0 | 4752 | 1.7615 |
2.2803 | 23.0 | 4968 | 1.6925 |
1.8601 | 24.0 | 5184 | 1.6205 |
1.8601 | 25.0 | 5400 | 1.5751 |
1.6697 | 26.0 | 5616 | 1.5391 |
1.6697 | 27.0 | 5832 | 1.5200 |
1.5655 | 28.0 | 6048 | 1.4866 |
1.5655 | 29.0 | 6264 | 1.4656 |
1.5655 | 30.0 | 6480 | 1.4627 |
How to Use
As Masked Language Model
from transformers import pipeline
pretrained_name = "w11wo/lao-roberta-base"
prompt = "REPLACE WITH MASKED PROMPT"
fill_mask = pipeline(
"fill-mask",
model=pretrained_name,
tokenizer=pretrained_name
)
fill_mask(prompt)
Feature Extraction in PyTorch
from transformers import RobertaModel, RobertaTokenizerFast
pretrained_name = "w11wo/lao-roberta-base"
model = RobertaModel.from_pretrained(pretrained_name)
tokenizer = RobertaTokenizerFast.from_pretrained(pretrained_name)
prompt = "ສະບາຍດີຊາວໂລກ."
encoded_input = tokenizer(prompt, return_tensors='pt')
output = model(**encoded_input)
Disclaimer
Do consider the biases which came from pre-training datasets that may be carried over into the results of this model.
Author
Lao RoBERTa Base was trained and evaluated by Wilson Wongso. All computation and development are done on Google's TPU-RC.
Framework versions
- Transformers 4.13.0.dev0
- Pytorch 1.9.0+cu102
- Datasets 1.16.1
- Tokenizers 0.10.3
- Downloads last month
- 8