metadata
license: bsd-3-clause-clear
language:
  - ne
metrics:
  - perplexity
library_name: transformers
pipeline_tag: text-generation

NepaliGPT: Nepali Language Generative Pretrained Transformer Model

This is an experimental language generation model for Nepali. It is a causal language model: given a Nepali context, it predicts the most likely next tokens.
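The next-token prediction described above can be sketched as follows. This is a toy illustration, not NepaliGPT's code. The vocabulary and logit values are hypothetical, and real GPT-2 models operate on subword (BPE) tokens rather than the whole words used here for readability. The sketch turns the model's scores (logits) for each candidate token into probabilities with a softmax and then greedily picks the most likely one.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_next(vocab, logits):
    # Greedy decoding: return the highest-probability token and its probability
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs[best]

# Hypothetical candidates and scores a causal LM might produce after the context "नेपाल एक"
vocab = ["देश", "सुन्दर", "हो", "।"]
logits = [3.1, 2.4, 0.5, -1.0]

token, p = predict_next(vocab, logits)
print(token, round(p, 3))
```

Greedy selection is only one decoding strategy; sampling from the softmax probabilities instead yields more varied text.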

Dataset Used

A 9.3 GB corpus was collected from various sources on the internet, including:

  • Nepali books found online.
  • Nepali news articles from Nepali news portals.
  • Nepali text from various open-source Nepali NLP datasets.

Hyperparameters Used

Learning rate -> 2e-5
Weight decay -> 0.01
Number of training epochs -> 5
bf16 -> True
Base model architecture -> GPT-2
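The hyperparameters above can be captured as a small configuration object. This is a minimal sketch, assuming training used the Hugging Face Trainer (the card lists `transformers` as its library but does not say so explicitly); the field names mirror `TrainingArguments` parameters, and `base_architecture` is a hypothetical descriptive field, not a Trainer argument.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class TrainingConfig:
    # Field names mirror transformers.TrainingArguments (assumed, not stated in the card)
    learning_rate: float = 2e-5
    weight_decay: float = 0.01
    num_train_epochs: int = 5
    bf16: bool = True  # bfloat16 mixed-precision training
    # Hypothetical descriptive field: the card names the GPT-2 architecture only and
    # does not say whether weights were initialized from a pretrained checkpoint
    base_architecture: str = "GPT-2"

config = TrainingConfig()

# Keep only the fields that correspond to TrainingArguments parameters
trainer_kwargs = {k: v for k, v in asdict(config).items() if k != "base_architecture"}
print(trainer_kwargs)
```

With `transformers` installed, these keyword arguments could be passed as `TrainingArguments(output_dir=..., **trainer_kwargs)`; keeping them in one frozen object makes the reported settings easy to log alongside results.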

Training Results

It achieves the following results on the evaluation set:

| Training Loss | Validation Loss | Perplexity |
|---------------|-----------------|------------|
| 3.3968        | 3.2705          | 26.3245    |
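The reported perplexity is consistent with the exponential of the validation loss, since exp(3.2705) ≈ 26.32. The sketch below shows that relationship, assuming the losses are mean token-level cross-entropy in nats (the default for causal language modeling in `transformers`; the card does not state this). `mean_nll` is a hypothetical helper for illustration.

```python
import math

def mean_nll(token_probs):
    # Average negative log-likelihood (cross-entropy in nats) the model assigns
    # to the tokens that actually came next in the evaluation text
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def perplexity(loss):
    # Perplexity is e raised to the mean cross-entropy loss (natural log)
    return math.exp(loss)

# Sanity check: giving every correct token probability 1/4 yields perplexity 4,
# i.e. the model is as uncertain as a uniform choice among 4 tokens
print(perplexity(mean_nll([0.25, 0.25, 0.25, 0.25])))

# Validation loss reported in the table above
print(round(perplexity(3.2705), 2))  # → 26.32
```

Lower perplexity means the model spreads less probability mass over wrong continuations; a perplexity of about 26 is roughly as uncertain as choosing uniformly among 26 tokens at each step.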