
rapper-gpt

This model is a PEFT adapter fine-tuned from TheBloke/Mistral-7B-Instruct-v0.2-GPTQ on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1394

Model description

More information needed

Intended uses & limitations

More information needed
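
Pending details from the author, here is a minimal inference sketch. It assumes the adapter is published on the Hugging Face Hub; the adapter repo id below is a placeholder, and loading the GPTQ base weights additionally requires the optimum and auto-gptq packages.

```python
# Minimal inference sketch. Assumptions: the adapter repo id is a placeholder,
# and optimum + auto-gptq are installed so the GPTQ base weights can load.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
adapter_id = "your-username/rapper-gpt"  # assumption: replace with the real Hub path

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

# Mistral-Instruct expects the [INST] ... [/INST] chat format.
prompt = "[INST] Write a four-bar verse about gradient descent. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```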

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • num_epochs: 25
  • mixed_precision_training: Native AMP
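
These settings map directly onto transformers.TrainingArguments; a sketch follows. The output path and the per-epoch evaluation cadence are assumptions, and Native AMP is rendered here as fp16 (the card does not say whether fp16 or bf16 was used).

```python
# TrainingArguments sketch mirroring the list above. output_dir, fp16, and
# evaluation_strategy are assumptions not stated in the card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="rapper-gpt",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,  # effective train batch size: 4 * 4 = 16
    num_train_epochs=25,
    lr_scheduler_type="linear",
    warmup_steps=2,
    seed=42,
    fp16=True,                      # "Native AMP"; bf16 is equally plausible
    evaluation_strategy="epoch",    # assumption: evals in the table fall near epoch boundaries
)
```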

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 1.0259        | 0.8889  | 4    | 0.9759          |
| 0.7299        | 2.0     | 9    | 0.9117          |
| 0.8055        | 2.8889  | 13   | 0.8819          |
| 0.5669        | 4.0     | 18   | 0.8468          |
| 0.622         | 4.8889  | 22   | 0.8584          |
| 0.4568        | 6.0     | 27   | 0.9003          |
| 0.5379        | 6.8889  | 31   | 0.9569          |
| 0.4119        | 8.0     | 36   | 0.9821          |
| 0.4993        | 8.8889  | 40   | 1.0176          |
| 0.3941        | 10.0    | 45   | 1.0345          |
| 0.4832        | 10.8889 | 49   | 1.0687          |
| 0.3836        | 12.0    | 54   | 1.0911          |
| 0.4758        | 12.8889 | 58   | 1.0688          |
| 0.3788        | 14.0    | 63   | 1.0902          |
| 0.4711        | 14.8889 | 67   | 1.0868          |
| 0.3749        | 16.0    | 72   | 1.0949          |
| 0.4663        | 16.8889 | 76   | 1.1072          |
| 0.3724        | 18.0    | 81   | 1.1164          |
| 0.464         | 18.8889 | 85   | 1.1282          |
| 0.3702        | 20.0    | 90   | 1.1350          |
| 0.4619        | 20.8889 | 94   | 1.1387          |
| 0.3684        | 22.0    | 99   | 1.1391          |
| 0.4108        | 22.2222 | 100  | 1.1394          |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1