matchaoneshot committed on
Commit 4ac08c0
1 Parent(s): 759be12

update README.md

Files changed (1)
  1. README.md +79 -3
README.md CHANGED

---
language: ms
tags:
- roberta
- fine-tuned
- transformers
- bert
- masked-language-model
license: apache-2.0
model_type: roberta
---

# Fine-tuned RoBERTa on Malay Language

This model is a fine-tuned version of the `mesolitica/roberta-base-bahasa-cased` model, trained on a custom Malay dataset. It is fine-tuned for **Masked Language Modeling (MLM)** on normalized Malay sentences.
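
A minimal fill-mask usage sketch is shown below. The Hub repository name of this fine-tuned checkpoint is not stated in this card, so `MODEL_ID` is a placeholder; the `<mask>` token follows the usual RoBERTa convention.

```python
from transformers import pipeline

# Placeholder: replace with the actual Hub id of this fine-tuned checkpoint.
MODEL_ID = "your-username/roberta-base-bahasa-cased-finetuned-mlm"

# RoBERTa-style tokenizers use "<mask>" as the mask token.
fill_mask = pipeline("fill-mask", model=MODEL_ID)

for prediction in fill_mask("Saya suka makan <mask> goreng."):
    print(prediction["token_str"], round(prediction["score"], 4))
```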

## Model Description

This model is based on the **RoBERTa** architecture, a robustly optimized version of BERT. It was pre-trained on a large corpus of Malay text and then fine-tuned on a specialized dataset of normalized Malay sentences. The fine-tuning objective was to predict masked tokens in each sentence, i.e. standard masked language modeling.

### Training Details

- **Pre-trained Model**: `mesolitica/roberta-base-bahasa-cased`
- **Task**: Masked Language Modeling (MLM)
- **Training Dataset**: Custom dataset of Malay sentences
- **Training Duration**: 3 epochs
- **Batch Size**: 16 per device
- **Learning Rate**: 1e-6
- **Optimizer**: AdamW
- **Evaluation**: Every 200 steps (a configuration sketch based on these settings follows the list)
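
The original training script is not part of this card; the snippet below is only a sketch of how the settings listed above could be expressed with the Hugging Face `Trainer` API. The dataset here is a toy stand-in for the custom Malay corpus, the 15% masking probability is an assumed default, and AdamW is simply the `Trainer` default optimizer.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "mesolitica/roberta-base-bahasa-cased"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForMaskedLM.from_pretrained(BASE_MODEL)

# Toy stand-in for the custom normalized Malay dataset, just to keep the sketch runnable.
texts = ["Saya suka makan nasi goreng.", "Cuaca hari ini sangat panas.", "Dia sedang membaca buku."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Dynamic masking with the standard (assumed) 15% probability.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Hyperparameters from the list above; everything else is an assumption or library default
# (the Trainer uses AdamW by default).
training_args = TrainingArguments(
    output_dir="./roberta-malay-mlm",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=1e-6,
    eval_strategy="steps",   # older transformers versions call this `evaluation_strategy`
    eval_steps=200,
    logging_steps=200,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    eval_dataset=dataset,  # a real held-out split would be used in practice
    data_collator=data_collator,
)

trainer.train()
```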

## Training and Validation Loss

The following table shows the training and validation loss at each evaluation step during the fine-tuning process:

| Step | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 200   | 0.069000      | 0.069317        |
| 400   | 0.070900      | 0.068213        |
| 600   | 0.071900      | 0.067799        |
| 800   | 0.070100      | 0.067430        |
| 1000  | 0.068300      | 0.066448        |
| 1200  | 0.069700      | 0.066594        |
| 1400  | 0.069000      | 0.066185        |
| 1600  | 0.067100      | 0.066022        |
| 1800  | 0.063800      | 0.065695        |
| 2000  | 0.037900      | 0.066657        |
| 2200  | 0.041200      | 0.066739        |
| 2400  | 0.042000      | 0.066777        |
| 2600  | 0.040200      | 0.066858        |
| 2800  | 0.044700      | 0.066712        |
| 3000  | 0.041000      | 0.066415        |
| 3200  | 0.041800      | 0.066634        |
| 3400  | 0.041200      | 0.066341        |
| 3600  | 0.039200      | 0.066837        |
| 3800  | 0.023700      | 0.067717        |
| 4000  | 0.024100      | 0.068017        |
| 4200  | 0.024600      | 0.068155        |
| 4400  | 0.024500      | 0.068275        |
| 4600  | 0.024500      | 0.068106        |
| 4800  | 0.026100      | 0.067965        |
| 5000  | 0.024500      | 0.068108        |
| 5200  | 0.025100      | 0.068027        |

### Observations

- The training loss decreased in distinct stages rather than smoothly, with sharp drops around steps 2000 and 3800 (likely coinciding with epoch boundaries).
- The validation loss improved steadily up to about step 1800 (0.0657) and then fluctuated slightly, remaining essentially flat for the rest of training.
- Overall the model converged well on the MLM objective, although the flat validation loss after step 1800 suggests that the later epochs mainly reduced training loss rather than improving generalization.
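
As a quick sanity check on these readings (a standalone sketch, not part of the original training code), the script below recovers the best checkpoint and the stage transitions directly from the logged values in the table:

```python
# Evaluation log copied from the table above: (step, training_loss, validation_loss).
eval_log = [
    (200, 0.069000, 0.069317), (400, 0.070900, 0.068213), (600, 0.071900, 0.067799),
    (800, 0.070100, 0.067430), (1000, 0.068300, 0.066448), (1200, 0.069700, 0.066594),
    (1400, 0.069000, 0.066185), (1600, 0.067100, 0.066022), (1800, 0.063800, 0.065695),
    (2000, 0.037900, 0.066657), (2200, 0.041200, 0.066739), (2400, 0.042000, 0.066777),
    (2600, 0.040200, 0.066858), (2800, 0.044700, 0.066712), (3000, 0.041000, 0.066415),
    (3200, 0.041800, 0.066634), (3400, 0.041200, 0.066341), (3600, 0.039200, 0.066837),
    (3800, 0.023700, 0.067717), (4000, 0.024100, 0.068017), (4200, 0.024600, 0.068155),
    (4400, 0.024500, 0.068275), (4600, 0.024500, 0.068106), (4800, 0.026100, 0.067965),
    (5000, 0.024500, 0.068108), (5200, 0.025100, 0.068027),
]

# Step with the lowest validation loss (the natural "best" checkpoint for MLM).
best_step, _, best_val = min(eval_log, key=lambda row: row[2])
print(f"Best validation loss {best_val:.6f} at step {best_step}")

# Largest step-to-step drops in training loss, locating the stage transitions.
drops = sorted(
    ((prev[1] - cur[1], cur[0]) for prev, cur in zip(eval_log, eval_log[1:])),
    reverse=True,
)
print("Largest training-loss drops at steps:", [step for _, step in drops[:2]])
```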

## Intended Use

This model is intended for tasks such as:
- **Masked Language Modeling (MLM)**: Fill in the blanks for masked tokens in a Malay sentence.
- **Text Infilling / Generation**: Suggest plausible completions for masked spans given the surrounding context.
- **Text Understanding**: Extract contextual meaning from Malay sentences, for example as sentence embeddings (see the sketch below).
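
One possible way to use the checkpoint for the last item is sketched below, with mean pooling over the last hidden state as the sentence representation; `MODEL_ID` is again a placeholder for the actual Hub repository name, and the pooling choice is an assumption rather than something prescribed by this card.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder: replace with the actual Hub id of this fine-tuned checkpoint.
MODEL_ID = "your-username/roberta-base-bahasa-cased-finetuned-mlm"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

sentences = ["Cuaca hari ini sangat panas.", "Saya suka membaca buku."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_size)

# Mean-pool over real tokens only, using the attention mask.
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # e.g. torch.Size([2, 768])
```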