meAtIonio committed on
Commit 95ef72e
1 Parent(s): 2351480

Update README.md

Files changed (1):
  1. README.md +41 -3
README.md CHANGED
@@ -1,3 +1,41 @@
- ---
- license: apache-2.0
- ---
+ # PL-BERT Fine-Tuned on Hindi Wikipedia Dataset
+
+ This model is a fine-tuned version of **PL-BERT**, trained on the Hindi subset of the Wiki40b dataset. It has been optimized to understand and generate high-quality Hindi text, making it suitable for a range of Hindi-language NLP tasks.
+ For more information about this model, see the [GitHub repository](https://github.com/Ionio-io/PL-BERT-Fine-Tuned-hi-).
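The README does not say how the checkpoint is serialized, so the following is only a rough sketch: it assumes a plain PyTorch state dict loaded into an ALBERT-style `transformers` backbone (the architecture used by the upstream PL-BERT codebase). The file name `step_15000.pt` and the configuration values are hypothetical placeholders.

```python
# Hypothetical loading sketch -- the checkpoint file name and config values are
# placeholders, not artifacts documented by this repository.
import torch
from transformers import AlbertConfig, AlbertModel

config = AlbertConfig(
    vocab_size=178,           # placeholder: size of the phoneme/token vocabulary
    hidden_size=768,          # placeholder encoder width
    intermediate_size=2048,
    num_hidden_layers=12,
    num_attention_heads=12,
)
model = AlbertModel(config)

state_dict = torch.load("step_15000.pt", map_location="cpu")  # hypothetical file name
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
model.eval()
```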
+
+ ## Model Overview
+
+ - **Model Name:** PL-BERT (Fine-tuned on Hindi)
+ - **Base Model:** PL-BERT (Multilingual BERT variant)
+ - **Dataset:** Hindi subset of Wiki40b (51,000 cleaned Wikipedia articles)
+ - **Precision:** Mixed precision (FP16)
+
+ The fine-tuning process focused on improving the model's handling of Hindi text by leveraging a large, cleaned corpus of Hindi Wikipedia articles.
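For reference, the raw Hindi split of Wiki40b can be pulled from the Hugging Face Hub roughly as shown below. This is a sketch that assumes the standard `wiki40b` dataset with the `hi` language configuration; the cleaning that produced the 51,000 articles used for fine-tuning is implemented in the linked GitHub repository and is not reproduced here.

```python
# Sketch: load the raw Hindi portion of Wiki40b from the Hugging Face Hub.
# The fine-tuning preprocessing (cleaning and filtering down to ~51k articles)
# lives in the linked GitHub repository and is not shown here.
from datasets import load_dataset

wiki_hi = load_dataset("wiki40b", "hi", split="train")

print(wiki_hi)                      # row count and column names
print(wiki_hi[0]["text"][:300])     # raw article text, including Wiki40b section markers
```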
+
+ ## Training Details
+
+ - **Model:** PL-BERT
+ - **Dataset:** Hindi subset of Wiki40b
+ - **Batch Size:** 64
+ - **Mixed Precision:** FP16
+ - **Optimizer:** AdamW
+ - **Training Steps:** 15,000 (an illustrative loop with these settings is sketched below)
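The hyperparameters above correspond to a fairly standard PyTorch mixed-precision loop. The sketch below is illustrative only: the stand-in model, random batches, and learning rate are placeholders, and the actual training script lives in the linked GitHub repository.

```python
# Illustrative FP16 training loop matching the listed settings: AdamW,
# batch size 64, 15,000 steps. The tiny stand-in model and random batches
# are placeholders for the real PL-BERT encoder and Wiki40b data.
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast
from torch.optim import AdamW

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(128, 128).to(device)            # stand-in for the PL-BERT encoder
optimizer = AdamW(model.parameters(), lr=1e-4)    # learning rate is a placeholder
scaler = GradScaler(enabled=(device == "cuda"))   # FP16 loss scaling (no-op on CPU)

batch_size, max_steps = 64, 15_000
for step in range(max_steps):
    x = torch.randn(batch_size, 128, device=device)   # stand-in for a tokenized batch
    optimizer.zero_grad(set_to_none=True)
    with autocast(enabled=(device == "cuda")):         # run the forward pass in FP16 on GPU
        loss = model(x).pow(2).mean()                  # stand-in for the combined training loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```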
+
+ ### Training Progress
+
+ - **Final Loss:** 1.879
+ - **Vocabulary Loss:** 0.49
+ - **Token Loss:** 1.465
+
+ ### Validation Results
+
+ During training, we monitored performance with the following validation metrics (a sketch of how such accuracies are computed follows the list):
+
+ - **Validation Loss:** 1.879
+ - **Vocabulary Accuracy:** 78.54%
+ - **Token Accuracy:** 82.30%
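For context, a token-level accuracy like the figures above is typically the fraction of non-padding positions where the argmax of the logits matches the label. The snippet below demonstrates that computation on random placeholder tensors; the exact masking and the split into vocabulary vs. token accuracy follow the PL-BERT training objective defined in the linked repository.

```python
# Sketch: token-level accuracy from logits and labels, ignoring padding.
# Tensors are random placeholders; shapes are (batch, sequence, vocab).
import torch

logits = torch.randn(64, 256, 178)          # placeholder model outputs
labels = torch.randint(0, 178, (64, 256))   # placeholder gold labels
labels[:, 200:] = -100                      # pretend the sequence tail is padding

preds = logits.argmax(dim=-1)
mask = labels != -100                       # keep only non-padding positions
accuracy = (preds[mask] == labels[mask]).float().mean().item()
print(f"accuracy: {accuracy:.2%}")
```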
+
+
+ ---
+ license: apache-2.0
+ ---