
# NepaliGPT: Nepali Language Generative Pretrained Transformer Model

This is an experiment in developing a language generation model for Nepali: a causal language model (88.2M parameters) that predicts the next tokens given a context in the Nepali language.
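
As a minimal usage sketch: the repo id below is a placeholder (substitute the actual Hugging Face id this card is hosted under), and the generation parameters are illustrative, not taken from this card.

```python
from transformers import pipeline

# Placeholder repo id; replace with the actual id of this model.
generator = pipeline("text-generation", model="Shushant/NepaliGPT")

# Generate a continuation for a Nepali prompt.
prompt = "नेपाल एक सुन्दर देश हो"
outputs = generator(prompt, max_new_tokens=50, do_sample=True, top_k=50)
print(outputs[0]["generated_text"])
```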

## Dataset Used

A large corpus of 9.3 GB was collected from different sources on the internet. The sources include:

- Nepali books found online.
- Nepali news articles from Nepali news portals.
- Nepali text collected from different open-source Nepali NLP datasets.
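
The corpus itself is not linked here. As a sketch of the standard preprocessing for causal language modeling, assuming the collected text is available as plain-text files (the file path and tokenizer id below are hypothetical), the usual approach is to tokenize the corpus and pack it into fixed-length blocks:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Hypothetical path; the 9.3 GB corpus is not distributed with this card.
raw = load_dataset("text", data_files={"train": "nepali_corpus/*.txt"})
tokenizer = AutoTokenizer.from_pretrained("Shushant/NepaliGPT")  # placeholder id

block_size = 512

def tokenize_and_pack(batch):
    # Tokenize without truncation, then concatenate all sequences and split
    # them into fixed-length blocks, the usual packing scheme for causal LMs.
    ids = tokenizer(batch["text"])["input_ids"]
    flat = [tok for seq in ids for tok in seq + [tokenizer.eos_token_id]]
    total = (len(flat) // block_size) * block_size
    blocks = [flat[i : i + block_size] for i in range(0, total, block_size)]
    return {"input_ids": blocks, "labels": [b[:] for b in blocks]}

train_ds = raw["train"].map(
    tokenize_and_pack, batched=True, remove_columns=["text"]
)
```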

## Hyperparameters Used

- Learning rate -> 2e-5
- Weight decay -> 0.01
- Number of training epochs -> 5
- bf16 -> True
- Base model architecture -> GPT-2
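
These settings map directly onto `TrainingArguments` in the `transformers` library. A minimal sketch, assuming the standard `Trainer`-based fine-tuning loop; the output directory and batch size are assumptions, not values from this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="nepali-gpt",          # illustrative, not from this card
    learning_rate=2e-5,               # as listed above
    weight_decay=0.01,                # as listed above
    num_train_epochs=5,               # as listed above
    bf16=True,                        # requires bfloat16-capable hardware
    per_device_train_batch_size=8,    # assumed; not stated in this card
)
```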

## Training Results

It achieves the following results on the evaluation set:

| Training Loss | Validation Loss | Perplexity |
|---------------|-----------------|------------|
| 3.3968        | 3.2705          | 26.3245    |
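
The reported perplexity is consistent with the validation loss: for a causal language model, perplexity is the exponential of the mean cross-entropy loss.

```python
import math

# Perplexity = exp(cross-entropy loss).
print(math.exp(3.2705))  # ~26.3245, matching the reported perplexity
```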