---
language:
  - en
license: apache-2.0
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - mistral
  - trl
base_model: unsloth/mistral-7b-instruct-v0.2-bnb-4bit
---

# Uploaded model

- **Developed by:** Angelectronic
- **License:** apache-2.0
- **Finetuned from model:** unsloth/mistral-7b-instruct-v0.2-bnb-4bit

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

## Evaluation

- **ViMMRC test set:** 0.8475 accuracy
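ViMMRC is a multiple-choice reading-comprehension benchmark, so accuracy here is assumed to be the fraction of test questions whose predicted answer choice matches the gold choice. A minimal sketch of that metric:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the gold labels."""
    assert len(predictions) == len(labels)
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)

# Example with hypothetical answer choices:
print(accuracy(["A", "C", "B", "D"], ["A", "C", "B", "A"]))  # 0.75
```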

## Training results

| Training Loss | Accuracy | Step | Validation Loss |
|---------------|----------|------|-----------------|
| 1.033500 | 0.771325 | 240 | 1.478651 |
| 0.852000 | 0.758621 | 480 | 1.475045 |
| 0.751200 | 0.751361 | 720 | 1.501176 |
| 0.668400 | 0.780399 | 960 | 1.543064 |
| 0.591600 | 0.796733 | 1200 | 1.567212 |
| 0.498200 | 0.785844 | 1440 | 1.607110 |
| 0.379600 | 0.796733 | 1680 | 1.643269 |
| 0.334200 | 0.771324 | 1920 | 1.661141 |

## Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 16
- eval_batch_size: 8
- seed: 3407
- gradient_accumulation_steps: 4
- eval_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 5
- num_epochs: 3
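The effective batch size and learning-rate schedule above can be sketched as follows. This is a minimal reimplementation mirroring the behavior of transformers' `get_cosine_schedule_with_warmup`, not the training code itself; the 1920-step total is taken from the last row of the results table:

```python
import math

BASE_LR = 2e-4      # learning_rate
WARMUP_STEPS = 5    # lr_scheduler_warmup_steps
TOTAL_STEPS = 1920  # final step in the results table

# total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert 16 * 4 == 64

def cosine_lr(step: int) -> float:
    """Linear warmup for WARMUP_STEPS, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0))   # 0.0 (start of warmup)
print(cosine_lr(5))   # 0.0002 (peak, end of warmup)
```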

## Framework versions

- PEFT 0.10.0
- Transformers 4.40.2
- Pytorch 2.3.0
- Datasets 2.19.1
- Tokenizers 0.19.1