
Fine-tuning

  • This model was trained to classify whether input text is a "chosen" or a "rejected" sentence.
  • The probability in the last layer (logits passed through a softmax) can be used to quantify the preference for a user input.
  • Fine-tuned from studio-ousia/mluke-large-lite via full-parameter tuning on open-preference-v0.3.
  • Trained in bf16 format.
  • Label 0 stands for a rejected sentence.
  • Label 1 stands for a chosen sentence.
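The mapping from logits to a preference score can be sketched as follows. This is a minimal, self-contained illustration (the `preference_probability` helper and the example logits are hypothetical, not part of the released model): it applies a softmax to a two-class logit vector and reads off the probability of the "chosen" label (index 1, per the label mapping above).

```python
import math

def preference_probability(logits):
    # logits: [score_for_rejected, score_for_chosen] from the model's
    # final classification layer (Label 0 = rejected, Label 1 = chosen)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    # softmax probability that the input is a "chosen" sentence
    return exps[1] / sum(exps)

# example with hypothetical logits favoring "chosen"
p = preference_probability([-1.2, 2.3])
```

A higher `p` means the model judges the text closer to a "chosen" sentence.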

Metrics

  • train and validation split

| train loss | eval loss | accuracy | recall | precision | f1-score |
|------------|-----------|----------|--------|-----------|----------|
| 0.114      | 0.1615    | 0.9399   | 0.9459 | 0.9346    | 0.9402   |
  • test split

| accuracy | recall | precision | f1-score |
|----------|--------|-----------|----------|
| 0.9416   | 0.9319 | 0.9504    | 0.9411   |
  • confusion matrix on the test split

[Figure: confusion matrix on the test split]

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
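With a linear scheduler and no warmup (warmup steps are not listed above, so zero is assumed here), the learning rate decays linearly from 5e-05 to 0 over training. A small sketch, taking the total of 4437 steps from the training-results table below (3 epochs × 1479 steps):

```python
def linear_lr(step, total_steps=4437, base_lr=5e-05):
    # Linear decay from base_lr at step 0 to 0 at the final step,
    # assuming no warmup phase
    return base_lr * (1 - step / total_steps)

# e.g. the learning rate at the end of epoch 1 (step 1479)
lr_epoch1 = linear_lr(1479)
```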

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|---------------|-------|------|-----------------|----------|-----------|--------|--------|
| 0.4109        | 1.0   | 1479 | 0.2462          | 0.9003   | 0.8710    | 0.9399 | 0.9041 |
| 0.1579        | 2.0   | 2958 | 0.1573          | 0.9399   | 0.9495    | 0.9293 | 0.9393 |
| 0.114         | 3.0   | 4437 | 0.1615          | 0.9399   | 0.9346    | 0.9460 | 0.9403 |
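The reported F1 scores are consistent with the precision and recall columns: F1 is the harmonic mean of the two. A quick check against the epoch-3 validation row and the test-split row above:

```python
def f1(precision, recall):
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

val_f1 = round(f1(0.9346, 0.9460), 4)   # epoch-3 validation row
test_f1 = round(f1(0.9504, 0.9319), 4)  # test split
```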

Framework versions

  • Transformers 4.42.3
  • Pytorch 2.1.0+cu118
  • Datasets 2.20.0
  • Tokenizers 0.19.1