---
language: ms
tags:
- roberta
- fine-tuned
- transformers
- bert
- masked-language-model
license: apache-2.0
model_type: roberta
---

# Fine-tuned RoBERTa on Malay Language

This model is a fine-tuned version of `mesolitica/roberta-base-bahasa-cased`, trained on a custom Malay dataset. It is fine-tuned for **Masked Language Modeling (MLM)** on normalized Malay sentences.

## Model Description

This model is based on the **RoBERTa** architecture, a robustly optimized variant of BERT. The base model was pre-trained on a large corpus of Malay text and then fine-tuned on a specialized dataset of normalized Malay sentences, with the standard masked language modeling objective of predicting masked tokens.
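
For quick experimentation, the model can be used with the `transformers` fill-mask pipeline. The snippet below is a minimal sketch: the repository id is a placeholder for this model's actual Hub id, and the example sentence is arbitrary.

```python
from transformers import pipeline

# Placeholder repository id; replace with this model's actual Hub id.
fill_mask = pipeline("fill-mask", model="matchaoneshot/roberta-malay-mlm")

# RoBERTa tokenizers use <mask> as the mask token.
for pred in fill_mask("Saya suka makan <mask> goreng."):
    print(f"{pred['token_str']!r}  (score: {pred['score']:.4f})")
```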

### Training Details

- **Pre-trained Model**: `mesolitica/roberta-base-bahasa-cased`
- **Task**: Masked Language Modeling (MLM)
- **Training Dataset**: Custom dataset of normalized Malay sentences
- **Training Duration**: 3 epochs
- **Batch Size**: 16 per device
- **Learning Rate**: 1e-6
- **Optimizer**: AdamW
- **Evaluation**: Every 200 steps
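
These hyperparameters map roughly onto the `transformers` `Trainer` API. The following is a minimal sketch of such a setup, not the original training script; the toy dataset, the 15% masking probability, and the output directory name are assumptions.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("mesolitica/roberta-base-bahasa-cased")
model = AutoModelForMaskedLM.from_pretrained("mesolitica/roberta-base-bahasa-cased")

# Toy stand-in for the custom corpus of normalized Malay sentences.
texts = ["Saya suka makan nasi goreng.", "Cuaca hari ini sangat panas."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Dynamic masking for the MLM objective; 15% is the conventional rate
# (assumed here, as the card does not state the masking probability).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="roberta-malay-mlm",   # assumed name
    num_train_epochs=3,               # Training Duration: 3 epochs
    per_device_train_batch_size=16,   # Batch Size: 16 per device
    learning_rate=1e-6,               # Learning Rate: 1e-6
    eval_strategy="steps",            # `evaluation_strategy` on older versions
    eval_steps=200,                   # Evaluation: every 200 steps
)

# Trainer uses AdamW as its default optimizer.
trainer = Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=dataset,
    eval_dataset=dataset,  # toy example reuses the same split
)
trainer.train()
```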

## Training and Validation Loss

The following table shows the training and validation loss at each evaluation step during fine-tuning:

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 200  | 0.069000      | 0.069317        |
| 400  | 0.070900      | 0.068213        |
| 600  | 0.071900      | 0.067799        |
| 800  | 0.070100      | 0.067430        |
| 1000 | 0.068300      | 0.066448        |
| 1200 | 0.069700      | 0.066594        |
| 1400 | 0.069000      | 0.066185        |
| 1600 | 0.067100      | 0.066022        |
| 1800 | 0.063800      | 0.065695        |
| 2000 | 0.037900      | 0.066657        |
| 2200 | 0.041200      | 0.066739        |
| 2400 | 0.042000      | 0.066777        |
| 2600 | 0.040200      | 0.066858        |
| 2800 | 0.044700      | 0.066712        |
| 3000 | 0.041000      | 0.066415        |
| 3200 | 0.041800      | 0.066634        |
| 3400 | 0.041200      | 0.066341        |
| 3600 | 0.039200      | 0.066837        |
| 3800 | 0.023700      | 0.067717        |
| 4000 | 0.024100      | 0.068017        |
| 4200 | 0.024600      | 0.068155        |
| 4400 | 0.024500      | 0.068275        |
| 4600 | 0.024500      | 0.068106        |
| 4800 | 0.026100      | 0.067965        |
| 5000 | 0.024500      | 0.068108        |
| 5200 | 0.025100      | 0.068027        |

### Observations

- The training loss drops sharply around steps 2000 and 3800, roughly coinciding with the start of the second and third epochs (3 epochs over ~5,200 steps), and plateaus within each epoch.
- The validation loss reaches its minimum (0.0657) at step 1800, then rises slightly and stays in a narrow 0.066-0.068 band for the remainder of training.
- The widening gap between training and validation loss in the later epochs suggests mild overfitting after the first epoch; by validation loss alone, the step-1800 checkpoint is the strongest.

## Intended Use

This model is intended for tasks such as:
- **Masked Language Modeling (MLM)**: filling in masked tokens in a Malay sentence.
- **Text Infilling**: suggesting plausible tokens for masked positions given surrounding context. As an encoder-only model, it is not suited to free-form, left-to-right text generation.
- **Text Understanding**: extracting contextual representations from Malay sentences, e.g. as a feature extractor or a starting point for downstream fine-tuning; see the sketch below.
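
For the text-understanding use case, sentence-level representations can be taken from the encoder's hidden states. A minimal sketch follows; the repository id is again a placeholder, and mean pooling is one common choice rather than anything prescribed by this card.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder repository id; replace with this model's actual Hub id.
model_id = "matchaoneshot/roberta-malay-mlm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("Saya suka makan nasi goreng.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings into a single sentence vector,
# weighting by the attention mask to ignore padding.
mask = inputs["attention_mask"].unsqueeze(-1)
sentence_embedding = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
print(sentence_embedding.shape)  # e.g. torch.Size([1, 768]) for a base-sized encoder
```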