Shushant committed
Commit 9249dd6
1 Parent(s): 92bd2e6

maintain README.md

Files changed (1):
  1. README.md +10 -13
README.md CHANGED
@@ -6,26 +6,24 @@ metrics:
 - perplexity
 library_name: transformers
 pipeline_tag: text-generation
-datasets:
-- Sakonii/nepalitext-language-model-dataset
 ---
 
-# NepaliGPT:Nepali Language Generative Pretrained Transformer Model
+# NepaliGPT: Nepali Language Generative Pretrained Transformer Model
 This is an experiment for developing a language generation model for the Nepali language.
 Causal Language Model which can predict the next possible tokens given a context in Nepali language.
 
 # Dataset Used
-A large corpus of 9.3 GB size has been collected from different sources from internet. The sources include
-- Nepali Books found online .
+A large corpus of 9.3 GB size has been collected from different sources on the internet. The sources include
+- Nepali Books found online.
 - Nepali News Article from Nepali news portals.
-- Nepali text collected from different open souce Nepali NLP datasets.
+- Nepali text collected from different open source Nepali NLP datasets.
 
 # Hyperparameters Used
-Learning rate -> 2e-5
-Weight Decay -> 0.01
-Number of training epochs -> 5
-bf16 -> True
-Base Model Architecture -> gpt-2
+Learning rate -> 2e-5 \
+Weight Decay -> 0.01 \
+Number of training epochs -> 5 \
+bf16 -> True \
+Base Model Architecture -> GPT-2 \
 
 ## Training Results
 
@@ -33,5 +31,4 @@ It achieves the following results on the evaluation set:
 
 | Training Loss | Validation Loss | Perplexity
 |:-------------:|:---------------:|:----------:|
-| 3.3968 | 3.2705 | 26.3245
-
+| 3.3968 | 3.2705 | 26.3245
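Since the card tags the model as a transformers `text-generation` pipeline, a minimal usage sketch follows. The repo id `Shushant/NepaliGPT` is an assumption inferred from the committer name, not stated in the diff; substitute the model's actual Hub id.

```python
# Minimal sketch: generating Nepali text with the transformers pipeline.
# ASSUMPTION: the repo id "Shushant/NepaliGPT" is inferred from the committer
# name and is not confirmed by the diff; replace it with the real Hub id.
from transformers import pipeline

generator = pipeline("text-generation", model="Shushant/NepaliGPT")

# The causal LM continues a Nepali prompt with likely next tokens.
prompt = "नेपाल एक सुन्दर देश हो"
result = generator(prompt, max_new_tokens=50, do_sample=True, top_k=50)
print(result[0]["generated_text"])
```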
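The five hyperparameters on the card map directly onto transformers `TrainingArguments`; a hedged sketch of that mapping, with `output_dir` as a placeholder and batch size, scheduler, etc. left at defaults because the card does not state them:

```python
# Sketch: the card's hyperparameters expressed as transformers TrainingArguments.
# Only the five values from the card are real; output_dir is a placeholder, and
# anything the card omits (batch size, scheduler, ...) stays at library defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="nepaligpt-checkpoints",  # placeholder
    learning_rate=2e-5,                  # Learning rate -> 2e-5
    weight_decay=0.01,                   # Weight Decay -> 0.01
    num_train_epochs=5,                  # Number of training epochs -> 5
    bf16=True,                           # bf16 -> True (base model: GPT-2)
)
```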
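The perplexity column is just the exponentiated validation cross-entropy, which gives a quick consistency check on the table:

```python
import math

# Perplexity of a causal LM = exp(cross-entropy loss in nats).
validation_loss = 3.2705
print(math.exp(validation_loss))  # ~26.324, matching the reported 26.3245
```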