Fine-tuned RoBERTa on Malay Language
This model is a fine-tuned version of mesolitica/roberta-base-bahasa-cased, trained for Masked Language Modeling (MLM) on a custom dataset of normalized Malay sentences.
Model Description
This model is based on RoBERTa, a robustly optimized variant of BERT. The base model was pre-trained on a large corpus of Malay text and then fine-tuned on a specialized dataset of normalized Malay sentences. The fine-tuning objective was the standard masked language modeling task: predicting randomly masked tokens in each sentence.
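The snippet below is an illustration only of how an MLM training example is built with random masking. The example sentence is invented (not from the actual training data), and the 15% masking probability is the common Hugging Face default, not a value stated in this card.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("mesolitica/roberta-base-bahasa-cased")
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

# "I went to the market this morning" -- a made-up normalized Malay sentence.
encoding = tokenizer("saya pergi ke pasar pagi tadi", return_tensors="pt")
batch = collator([{"input_ids": encoding["input_ids"][0]}])

# Roughly 15% of tokens are replaced by <mask>; the model learns to recover them.
print(tokenizer.decode(batch["input_ids"][0]))
```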
Training Details
- Pre-trained Model:
mesolitica/roberta-base-bahasa-cased
- Task: Masked Language Modeling (MLM)
- Training Dataset: Custom dataset of Malay sentences
- Training Duration: 3 epochs
- Batch Size: 16 per device
- Learning Rate: 1e-6
- Optimizer: AdamW
- Evaluation: Every 200 steps (a sketch of this setup follows the list)
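The following is a minimal sketch of the fine-tuning setup listed above, assuming a standard Hugging Face Trainer workflow with one sentence per line. The file names "malay_train.txt" and "malay_valid.txt", the output directory, and the maximum sequence length are placeholders, not details from the actual run.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "mesolitica/roberta-base-bahasa-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Placeholder text files: one normalized Malay sentence per line.
dataset = load_dataset("text", data_files={"train": "malay_train.txt",
                                           "validation": "malay_valid.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="roberta-malay-mlm",
    num_train_epochs=3,              # training duration from the card
    per_device_train_batch_size=16,  # batch size from the card
    learning_rate=1e-6,              # learning rate from the card
    eval_strategy="steps",           # named evaluation_strategy in older transformers versions
    eval_steps=200,                  # evaluated every 200 steps
    logging_steps=200,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    # Random masking of input tokens; AdamW is the Trainer's default optimizer.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=True),
)
trainer.train()
```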
Training and Validation Loss
The following table shows the training and validation loss at each evaluation step during the fine-tuning process:
Step | Training Loss | Validation Loss |
---|---|---|
200 | 0.069000 | 0.069317 |
400 | 0.070900 | 0.068213 |
600 | 0.071900 | 0.067799 |
800 | 0.070100 | 0.067430 |
1000 | 0.068300 | 0.066448 |
1200 | 0.069700 | 0.066594 |
1400 | 0.069000 | 0.066185 |
1600 | 0.067100 | 0.066022 |
1800 | 0.063800 | 0.065695 |
2000 | 0.037900 | 0.066657 |
2200 | 0.041200 | 0.066739 |
2400 | 0.042000 | 0.066777 |
2600 | 0.040200 | 0.066858 |
2800 | 0.044700 | 0.066712 |
3000 | 0.041000 | 0.066415 |
3200 | 0.041800 | 0.066634 |
3400 | 0.041200 | 0.066341 |
3600 | 0.039200 | 0.066837 |
3800 | 0.023700 | 0.067717 |
4000 | 0.024100 | 0.068017 |
4200 | 0.024600 | 0.068155 |
4400 | 0.024500 | 0.068275 |
4600 | 0.024500 | 0.068106 |
4800 | 0.026100 | 0.067965 |
5000 | 0.024500 | 0.068108 |
5200 | 0.025100 | 0.068027 |
Observations:
- The training loss decreased in distinct stages, dropping sharply from roughly 0.069 to 0.024 around steps 2000 and 3800; with about 5,200 steps over 3 epochs, these drops likely coincide with epoch boundaries.
- The validation loss improved steadily up to step 1800 (0.0657) and then rose slightly, stabilizing around 0.066-0.068 for the remainder of training.
- The gap between the falling training loss and the flat-to-rising validation loss after step 1800 suggests the model had largely converged by the end of the first epoch, with mild overfitting in the later epochs.
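Assuming the reported validation loss is the mean cross-entropy over masked tokens (the usual Trainer behaviour), it can be converted to a pseudo-perplexity as a rough quality indicator; a short worked example:

```python
import math

# Lowest validation loss in the table above, reached at step 1800.
best_val_loss = 0.065695

# Perplexity = exp(cross-entropy loss).
print(math.exp(best_val_loss))  # ~1.068
```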
Intended Use
This model is intended for tasks such as:
- Masked Language Modeling (MLM): Fill in the blanks for masked tokens in a Malay sentence.
- Text Generation: Generate plausible text given a context.
- Text Understanding: Extract contextual meaning from Malay sentences.
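A minimal inference sketch for the MLM use case, using the fill-mask pipeline. The model identifier "your-username/roberta-malay-mlm" is a placeholder for wherever this fine-tuned checkpoint is stored, and the example sentence is invented.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint (placeholder identifier).
fill_mask = pipeline("fill-mask", model="your-username/roberta-malay-mlm")

# "I like to eat <mask> rice every morning" -- RoBERTa uses <mask> as its mask token.
for prediction in fill_mask("Saya suka makan nasi <mask> setiap pagi."):
    print(prediction["token_str"], round(prediction["score"], 4))
```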