matchaoneshot committed on
Commit 4ac08c0
1 Parent(s): 759be12

update README.md

Files changed (1)
  1. README.md +79 -3
README.md CHANGED

---
language: ms
tags:
- roberta
- fine-tuned
- transformers
- bert
- masked-language-model
license: apache-2.0
model_type: roberta
---

# Fine-tuned RoBERTa on Malay Language

This model is a fine-tuned version of the `mesolitica/roberta-base-bahasa-cased` model, trained on a custom Malay dataset. It is fine-tuned for **Masked Language Modeling (MLM)** on normalized Malay sentences.
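
A minimal fill-mask usage sketch is shown below. The Hub repository name of this fine-tuned checkpoint is not stated in this card, so `MODEL_ID` is a placeholder; the `<mask>` token follows the usual RoBERTa convention.

```python
from transformers import pipeline

# Placeholder: replace with the actual Hub id of this fine-tuned checkpoint.
MODEL_ID = "your-username/roberta-base-bahasa-cased-finetuned-mlm"

# RoBERTa-style tokenizers use "<mask>" as the mask token.
fill_mask = pipeline("fill-mask", model=MODEL_ID)

for prediction in fill_mask("Saya suka makan <mask> goreng."):
    print(prediction["token_str"], round(prediction["score"], 4))
```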

## Model Description

This model is based on the **RoBERTa** architecture, a robustly optimized version of BERT. It was pre-trained on a large corpus of Malay text and then fine-tuned on a specialized dataset of normalized Malay sentences. The fine-tuning objective was to predict masked tokens in each sentence, i.e. standard masked language modeling.

### Training Details

- **Pre-trained Model**: `mesolitica/roberta-base-bahasa-cased`
- **Task**: Masked Language Modeling (MLM)
- **Training Dataset**: Custom dataset of Malay sentences
- **Training Duration**: 3 epochs
- **Batch Size**: 16 per device
- **Learning Rate**: 1e-6
- **Optimizer**: AdamW
- **Evaluation**: Every 200 steps (a configuration sketch based on these settings follows the list)
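
The original training script is not part of this card; the snippet below is only a sketch of how the settings listed above could be expressed with the Hugging Face `Trainer` API. The dataset here is a toy stand-in for the custom Malay corpus, the 15% masking probability is an assumed default, and AdamW is simply the `Trainer` default optimizer.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "mesolitica/roberta-base-bahasa-cased"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForMaskedLM.from_pretrained(BASE_MODEL)

# Toy stand-in for the custom normalized Malay dataset, just to keep the sketch runnable.
texts = ["Saya suka makan nasi goreng.", "Cuaca hari ini sangat panas.", "Dia sedang membaca buku."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Dynamic masking with the standard (assumed) 15% probability.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Hyperparameters from the list above; everything else is an assumption or library default
# (the Trainer uses AdamW by default).
training_args = TrainingArguments(
    output_dir="./roberta-malay-mlm",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=1e-6,
    eval_strategy="steps",   # older transformers versions call this `evaluation_strategy`
    eval_steps=200,
    logging_steps=200,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    eval_dataset=dataset,  # a real held-out split would be used in practice
    data_collator=data_collator,
)

trainer.train()
```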

## Training and Validation Loss

The following table shows the training and validation loss at each evaluation step during the fine-tuning process:

| Step | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 200   | 0.069000      | 0.069317        |
| 400   | 0.070900      | 0.068213        |
| 600   | 0.071900      | 0.067799        |
| 800   | 0.070100      | 0.067430        |
| 1000  | 0.068300      | 0.066448        |
| 1200  | 0.069700      | 0.066594        |
| 1400  | 0.069000      | 0.066185        |
| 1600  | 0.067100      | 0.066022        |
| 1800  | 0.063800      | 0.065695        |
| 2000  | 0.037900      | 0.066657        |
| 2200  | 0.041200      | 0.066739        |
| 2400  | 0.042000      | 0.066777        |
| 2600  | 0.040200      | 0.066858        |
| 2800  | 0.044700      | 0.066712        |
| 3000  | 0.041000      | 0.066415        |
| 3200  | 0.041800      | 0.066634        |
| 3400  | 0.041200      | 0.066341        |
| 3600  | 0.039200      | 0.066837        |
| 3800  | 0.023700      | 0.067717        |
| 4000  | 0.024100      | 0.068017        |
| 4200  | 0.024600      | 0.068155        |
| 4400  | 0.024500      | 0.068275        |
| 4600  | 0.024500      | 0.068106        |
| 4800  | 0.026100      | 0.067965        |
| 5000  | 0.024500      | 0.068108        |
| 5200  | 0.025100      | 0.068027        |

### Observations

- The training loss decreased in distinct stages rather than smoothly, with sharp drops around steps 2000 and 3800 (likely coinciding with epoch boundaries).
- The validation loss improved steadily up to about step 1800 (0.0657) and then fluctuated slightly, remaining essentially flat for the rest of training.
- Overall the model converged well on the MLM objective, although the flat validation loss after step 1800 suggests that the later epochs mainly reduced training loss rather than improving generalization.
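
As a quick sanity check on these readings (a standalone sketch, not part of the original training code), the script below recovers the best checkpoint and the stage transitions directly from the logged values in the table:

```python
# Evaluation log copied from the table above: (step, training_loss, validation_loss).
eval_log = [
    (200, 0.069000, 0.069317), (400, 0.070900, 0.068213), (600, 0.071900, 0.067799),
    (800, 0.070100, 0.067430), (1000, 0.068300, 0.066448), (1200, 0.069700, 0.066594),
    (1400, 0.069000, 0.066185), (1600, 0.067100, 0.066022), (1800, 0.063800, 0.065695),
    (2000, 0.037900, 0.066657), (2200, 0.041200, 0.066739), (2400, 0.042000, 0.066777),
    (2600, 0.040200, 0.066858), (2800, 0.044700, 0.066712), (3000, 0.041000, 0.066415),
    (3200, 0.041800, 0.066634), (3400, 0.041200, 0.066341), (3600, 0.039200, 0.066837),
    (3800, 0.023700, 0.067717), (4000, 0.024100, 0.068017), (4200, 0.024600, 0.068155),
    (4400, 0.024500, 0.068275), (4600, 0.024500, 0.068106), (4800, 0.026100, 0.067965),
    (5000, 0.024500, 0.068108), (5200, 0.025100, 0.068027),
]

# Step with the lowest validation loss (the natural "best" checkpoint for MLM).
best_step, _, best_val = min(eval_log, key=lambda row: row[2])
print(f"Best validation loss {best_val:.6f} at step {best_step}")

# Largest step-to-step drops in training loss, locating the stage transitions.
drops = sorted(
    ((prev[1] - cur[1], cur[0]) for prev, cur in zip(eval_log, eval_log[1:])),
    reverse=True,
)
print("Largest training-loss drops at steps:", [step for _, step in drops[:2]])
```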

## Intended Use

This model is intended for tasks such as:
- **Masked Language Modeling (MLM)**: Fill in the blanks for masked tokens in a Malay sentence.
- **Text Infilling / Generation**: Suggest plausible completions for masked spans given the surrounding context.
- **Text Understanding**: Extract contextual meaning from Malay sentences, for example as sentence embeddings (see the sketch below).
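
One possible way to use the checkpoint for the last item is sketched below, with mean pooling over the last hidden state as the sentence representation; `MODEL_ID` is again a placeholder for the actual Hub repository name, and the pooling choice is an assumption rather than something prescribed by this card.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder: replace with the actual Hub id of this fine-tuned checkpoint.
MODEL_ID = "your-username/roberta-base-bahasa-cased-finetuned-mlm"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

sentences = ["Cuaca hari ini sangat panas.", "Saya suka membaca buku."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_size)

# Mean-pool over real tokens only, using the attention mask.
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # e.g. torch.Size([2, 768])
```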