
distilBERT-Nepali

This model is a fine-tuned version of raygx/distilBERT-Nepali (revision b35360e0cffb71ae18aaf4ea00ff8369964243a2).
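
As a quick sanity check, the model can be tried for masked-token prediction with the Transformers fill-mask pipeline. A minimal sketch, using the Hub repo id raygx/distilBERT-Nepali and an illustrative Nepali sentence (not from the training data):

```python
from transformers import pipeline

# Load the model and its tokenizer from the Hugging Face Hub.
fill_mask = pipeline("fill-mask", model="raygx/distilBERT-Nepali")

# Use the tokenizer's own mask token instead of assuming "[MASK]".
mask = fill_mask.tokenizer.mask_token

# Illustrative sentence: "Nepal is a [MASK] country."
print(fill_mask(f"नेपाल एउटा {mask} देश हो।"))
```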

It achieves the following results on the evaluation set:

Perplexity:

  • lowest: 17.31
  • average: 19.12

(A range of values is reported because training was done batch-wise on subsets of the data, owing to limited resources.)

Loss:

  • loss: 3.2503
  • val_loss: 3.0674
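
For context, the reported perplexity is the conventional exponential of the token-level cross-entropy loss; the exact evaluation script may compute it slightly differently. A small sketch:

```python
import math

# Perplexity is conventionally exp(cross-entropy loss), e.g. for the reported val_loss:
val_loss = 3.0674
print(round(math.exp(val_loss), 2))  # ~21.5, on the same scale as the per-round perplexities below
```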

Model description

This model is trained on the raygx/Nepali-Extended-Text-Corpus dataset, which is a mixture of cc100 and raygx/Nepali-Text-Corpus. As a result, this model is trained on roughly ten times more data than its previous version. The tokenizer has also changed, so this is effectively a completely different model.
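
A minimal sketch of pulling the corpus with the Datasets library (split and column names should be checked against the dataset card):

```python
from datasets import load_dataset

# Download the extended Nepali corpus used for training.
corpus = load_dataset("raygx/Nepali-Extended-Text-Corpus")
print(corpus)  # inspect splits, column names, and sizes
```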

Training procedure

Training was done by running one epoch at a time on one batch (subset) of the data. The data was split into 3 batches and each batch was trained for 2 epochs, giving 6 training rounds in total, as sketched below.
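
In outline, the schedule looks like this: the corpus is split into 3 shards, and each round trains one shard for a single epoch, with 2 full passes over all shards. Names such as `make_tf_dataset` are hypothetical placeholders, not the actual training script.

```python
# Illustrative outline only: 3 data shards x 2 passes = 6 training rounds.
# `model` is the TF DistilBERT masked-LM model and `make_tf_dataset` is a
# hypothetical helper that turns a shard into a tf.data.Dataset of MLM examples.
num_shards, num_passes = 3, 2

for pass_idx in range(num_passes):
    for shard_idx in range(num_shards):
        shard = corpus["train"].shard(num_shards=num_shards, index=shard_idx)
        tf_dataset = make_tf_dataset(shard)  # hypothetical preprocessing helper
        model.fit(tf_dataset, epochs=1)      # one epoch per shard per round
```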

Training hyperparameters

The following hyperparameters were used during training:

  • optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-05, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-05, 'decay_steps': 16760, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
  • training_precision: mixed_float16
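
This configuration matches what `transformers.create_optimizer` builds for TensorFlow: AdamWeightDecay with 1000 linear warmup steps, a linear (power 1.0) decay to 0 over 16760 total steps, and weight decay 0.01. A sketch of recreating it, not necessarily the exact call used:

```python
import tensorflow as tf
from transformers import create_optimizer

# Matches training_precision: mixed_float16.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# AdamWeightDecay with 1000 warmup steps and linear decay to 0 over 16760 steps.
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-5,
    num_train_steps=16760,
    num_warmup_steps=1000,
    weight_decay_rate=0.01,
)
```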

Training results

Perplexity:

  • lowest: 17.31
  • average: 19.12

Per-round loss and perplexity:

| Round | Loss   | Validation loss | Perplexity |
|-------|--------|-----------------|------------|
| 1     | 4.8605 | 4.0510          | 56.96      |
| 2     | 3.8504 | 3.5142          | 33.65      |
| 3     | 3.4918 | 3.2408          | 25.64      |
| 4     | 3.2503 | 3.0674          | 21.56      |
| 5     | 3.1324 | 2.9243          | 18.49      |
| 6     | 3.2503 | 3.0674          | 17.31      |

Framework versions

  • Transformers 4.30.2
  • TensorFlow 2.12.0
  • Datasets 2.1.0
  • Tokenizers 0.13.3