Fine-tuned RoBERTa on Malay Language

This model is a fine-tuned version of mesolitica/roberta-base-bahasa-cased, trained for Masked Language Modeling (MLM) on a custom dataset of normalized Malay sentences.

Model Description

This model is based on the RoBERTa architecture, a robustly optimized variant of BERT. The base model was pre-trained on a large corpus of Malay text and then fine-tuned on a specialized dataset of normalized Malay sentences. The fine-tuning objective was to predict randomly masked tokens in each sentence, the standard masked language modeling setup.

Training Details

  • Pre-trained Model: mesolitica/roberta-base-bahasa-cased
  • Task: Masked Language Modeling (MLM)
  • Training Dataset: Custom dataset of Malay sentences
  • Training Duration: 3 epochs
  • Batch Size: 16 per device
  • Learning Rate: 1e-6
  • Optimizer: AdamW
  • Evaluation: Every 200 steps (see the configuration sketch below)
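
For reference, a minimal sketch of what this configuration looks like with the Hugging Face Trainer is shown below. The data files (train.txt / valid.txt), column names, sequence length, and masking probability are assumptions not stated in this card; only the hyperparameters listed above come from the training run.

```python
# Minimal fine-tuning sketch matching the settings above. The data files,
# max_length, and mlm_probability are assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("mesolitica/roberta-base-bahasa-cased")
model = AutoModelForMaskedLM.from_pretrained("mesolitica/roberta-base-bahasa-cased")

# Assumed layout: one normalized Malay sentence per line.
dataset = load_dataset(
    "text", data_files={"train": "train.txt", "validation": "valid.txt"}
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective: randomly mask 15% of tokens.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

training_args = TrainingArguments(
    output_dir="roberta-malay-mlm",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=1e-6,
    evaluation_strategy="steps",  # renamed to eval_strategy in newer transformers releases
    eval_steps=200,
    logging_steps=200,
    # AdamW is the Trainer's default optimizer, so no explicit flag is needed.
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=collator,
)

trainer.train()
```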

Training and Validation Loss

The following table shows the training and validation loss at each evaluation step during the fine-tuning process:

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 200  | 0.069000      | 0.069317        |
| 400  | 0.070900      | 0.068213        |
| 600  | 0.071900      | 0.067799        |
| 800  | 0.070100      | 0.067430        |
| 1000 | 0.068300      | 0.066448        |
| 1200 | 0.069700      | 0.066594        |
| 1400 | 0.069000      | 0.066185        |
| 1600 | 0.067100      | 0.066022        |
| 1800 | 0.063800      | 0.065695        |
| 2000 | 0.037900      | 0.066657        |
| 2200 | 0.041200      | 0.066739        |
| 2400 | 0.042000      | 0.066777        |
| 2600 | 0.040200      | 0.066858        |
| 2800 | 0.044700      | 0.066712        |
| 3000 | 0.041000      | 0.066415        |
| 3200 | 0.041800      | 0.066634        |
| 3400 | 0.041200      | 0.066341        |
| 3600 | 0.039200      | 0.066837        |
| 3800 | 0.023700      | 0.067717        |
| 4000 | 0.024100      | 0.068017        |
| 4200 | 0.024600      | 0.068155        |
| 4400 | 0.024500      | 0.068275        |
| 4600 | 0.024500      | 0.068106        |
| 4800 | 0.026100      | 0.067965        |
| 5000 | 0.024500      | 0.068108        |
| 5200 | 0.025100      | 0.068027        |

Observations:

  • The training loss decreased over the course of training, with sharp drops around steps 2000 and 3800 that likely correspond to epoch boundaries.
  • The validation loss improved steadily up to roughly step 1800 (reaching about 0.0657) and then fluctuated slightly, staying between roughly 0.066 and 0.068 for the remainder of training.
  • The model converged quickly; the continued decrease in training loss after step 2000 with little further improvement in validation loss suggests the later epochs added only limited generalization benefit.
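
For a visual check of these trends, the table values can be plotted directly. This is only a convenience sketch using matplotlib; the numbers are copied verbatim from the table above.

```python
# Plot the training and validation loss curves reported in this card.
import matplotlib.pyplot as plt

steps = list(range(200, 5400, 200))  # 200, 400, ..., 5200
train_loss = [0.069000, 0.070900, 0.071900, 0.070100, 0.068300, 0.069700, 0.069000,
              0.067100, 0.063800, 0.037900, 0.041200, 0.042000, 0.040200, 0.044700,
              0.041000, 0.041800, 0.041200, 0.039200, 0.023700, 0.024100, 0.024600,
              0.024500, 0.024500, 0.026100, 0.024500, 0.025100]
val_loss = [0.069317, 0.068213, 0.067799, 0.067430, 0.066448, 0.066594, 0.066185,
            0.066022, 0.065695, 0.066657, 0.066739, 0.066777, 0.066858, 0.066712,
            0.066415, 0.066634, 0.066341, 0.066837, 0.067717, 0.068017, 0.068155,
            0.068275, 0.068106, 0.067965, 0.068108, 0.068027]

plt.plot(steps, train_loss, label="training loss")
plt.plot(steps, val_loss, label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```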

Intended Use

This model is intended for tasks such as:

  • Masked Language Modeling (MLM): Fill in the blanks for masked tokens in a Malay sentence.
  • Text Generation: Generate plausible text given a context.
  • Text Understanding: Extract contextual meaning from Malay sentences.
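
For the fill-mask use case, the model can be loaded through the standard transformers pipeline. The repository id below is a placeholder (this card does not state the final model path), and the example sentence is only illustrative.

```python
from transformers import pipeline

# Placeholder repo id; replace with the actual path of this fine-tuned model.
fill_mask = pipeline("fill-mask", model="your-username/roberta-malay-mlm")

# RoBERTa-style tokenizers normally use "<mask>" as the mask token;
# reading it from the tokenizer avoids hard-coding it.
masked_sentence = f"Saya suka makan {fill_mask.tokenizer.mask_token} goreng."

for prediction in fill_mask(masked_sentence):
    print(prediction["token_str"], prediction["score"])
```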