
distilgpt2-nepali-patrakar-qa

This model is a fine-tuned version of Sakonii/distilgpt2-nepali on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.9077

Model description

Refer to the original distilgpt2 model card.

Intended uses & limitations

This marginally fine-tuned model can be used for Nepali text generation and, possibly, question answering, and is intended to be further fine-tuned on Nepali-language generative downstream tasks. Because the language model was trained on data with texts grouped into blocks of 512 tokens, it handles text sequences of up to 512 tokens.
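
A minimal text-generation sketch with the transformers pipeline is shown below. The repository id is an assumption based on this card's title, so substitute the model's actual Hugging Face path; the Nepali prompt is only an illustration.

```python
from transformers import pipeline

# Assumed repo id based on this card's title; replace with the actual
# namespace/model path where this checkpoint is hosted.
generator = pipeline(
    "text-generation",
    model="Sakonii/distilgpt2-nepali-patrakar-qa",
)

output = generator(
    "नेपालको राजधानी",  # example Nepali prompt ("the capital of Nepal")
    max_length=100,
    num_return_sequences=1,
)
print(output[0]["generated_text"])
```

Keep the prompt plus generated continuation within the 512-token limit noted above.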

Training procedure

The model was trained with the same configuration as the original distilgpt2, but with 512 tokens per instance, 72 instances per batch, and around 34.14K training steps (excluding the CLM pre-training of the base model).
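
The exact preprocessing script is not documented here, but grouping texts into 512-token blocks typically follows the standard causal-LM recipe of concatenating tokenized texts and splitting them into fixed-size chunks. A minimal sketch, assuming the usual Hugging Face run_clm-style approach and a placeholder "text" column:

```python
from itertools import chain

from transformers import AutoTokenizer

BLOCK_SIZE = 512  # block size stated in this card

tokenizer = AutoTokenizer.from_pretrained("Sakonii/distilgpt2-nepali")

def tokenize(examples):
    # "text" is a placeholder column name for the raw Nepali corpus.
    return tokenizer(examples["text"])

def group_texts(examples):
    # Concatenate all tokenized sequences, then split them into 512-token
    # blocks; for causal LM training the labels are a copy of the inputs.
    concatenated = list(chain(*examples["input_ids"]))
    total_length = (len(concatenated) // BLOCK_SIZE) * BLOCK_SIZE
    blocks = [concatenated[i : i + BLOCK_SIZE] for i in range(0, total_length, BLOCK_SIZE)]
    return {"input_ids": blocks, "labels": [b.copy() for b in blocks]}
```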

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 72
  • eval_batch_size: 72
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 5
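
The hyperparameters above map roughly onto Hugging Face TrainingArguments as sketched below; the output directory is a placeholder, and treating the batch size as per-device (rather than total across devices) is an assumption.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilgpt2-nepali-patrakar-qa",  # placeholder path
    learning_rate=5e-05,
    per_device_train_batch_size=72,  # assumes a single device
    per_device_eval_batch_size=72,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=5,
    evaluation_strategy="epoch",  # matches the per-epoch validation losses below
)
```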

Training results

Training Loss | Epoch | Step  | Validation Loss
------------- | ----- | ----- | ---------------
4.1278        | 1.0   |  6829 | 4.0184
3.9461        | 2.0   | 13658 | 3.9630
3.8268        | 3.0   | 20487 | 3.9319
3.7978        | 4.0   | 27316 | 3.9140
3.7949        | 5.0   | 34145 | 3.9077

Framework versions

  • Transformers 4.32.1
  • Pytorch 2.0.0
  • Datasets 2.1.0
  • Tokenizers 0.13.3