metadata
license: apache-2.0
base_model: studio-ousia/mluke-large-lite
tags:
  - generated_from_trainer
metrics:
  - accuracy
  - precision
  - recall
  - f1
model-index:
  - name: out
    results: []

Fine-tuning

  • This model was trained to classify whether an input text is a "chosen sentence" or a "rejected sentence".
  • The class probabilities from the last layer (the logits passed through a softmax function) can be used as a preference score for user input.
  • It was obtained by fine-tuning studio-ousia/mluke-large-lite with full-parameter tuning on open-preference-v0.3.
  • Training used bf16 precision.
  • Label 0 stands for a rejected sentence.
  • Label 1 stands for a chosen sentence.
  • Note that this model can handle at most 512 tokens.
    • This limitation comes from the LUKE-based pre-trained model.
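
The softmax step in the second bullet can be sketched in a few lines. This is a minimal illustration rather than the author's inference code: `chosen_probability` is a hypothetical helper, and the two logits are made-up values standing in for the classifier output for one input (index 0 = rejected, index 1 = chosen).

```python
import math


def chosen_probability(logits):
    """Return the softmax probability of label 1 (chosen).

    `logits` is assumed to be [logit_rejected, logit_chosen].
    """
    # Subtract the largest logit before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    return exps[1] / sum(exps)


# Made-up logits for illustration only; real values come from the classifier head.
example_logits = [-1.2, 2.3]
score = chosen_probability(example_logits)

# Threshold at 0.5 to recover the predicted label (1 = chosen, 0 = rejected).
predicted_label = 1 if score >= 0.5 else 0
```

In practice the logits would come from a forward pass over text truncated to the 512-token limit, and `score` could then be used directly as the preference value.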

Metric

  • train and validation split

| train loss | eval loss | accuracy | recall | precision | f1-score |
|---|---|---|---|---|---|
| 0.114 | 0.1615 | 0.9399 | 0.9459 | 0.9346 | 0.9402 |

  • test split

| accuracy | recall | precision | f1-score |
|---|---|---|---|
| 0.9416 | 0.9319 | 0.9504 | 0.9411 |

  • confusion matrix on the test split

[image: confusion matrix on the test split]
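
As a quick consistency check on the figures above, the f1-score can be recomputed from the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; `f1_score` is a hypothetical helper name, not part of the training code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the metric figures reported above.
test_f1 = f1_score(precision=0.9504, recall=0.9319)
val_f1 = f1_score(precision=0.9346, recall=0.9459)
```

Both values agree with the reported f1-scores (0.9411 on the test split, 0.9402 on the validation split) to within rounding.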

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
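
To make the `linear` scheduler setting concrete, here is a minimal sketch of how the learning rate would decay over training. It assumes no warmup steps (the card does not list a warmup setting) and takes the total of 4437 optimizer steps from the step counts reported under Training results; `linear_lr` is a hypothetical helper, not a library function.

```python
def linear_lr(step, base_lr=5e-05, total_steps=4437, warmup_steps=0):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0.

    warmup_steps=0 is an assumption; the model card does not state a warmup value.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start of training and at the end of each epoch
# (1479 steps per epoch, as reported under Training results).
lr_by_epoch = {epoch: linear_lr(epoch * 1479) for epoch in (0, 1, 2, 3)}
```

Under these assumptions the rate starts at 5e-05, falls to two thirds of that after the first epoch, and reaches zero at the final step.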

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| 0.4109 | 1.0 | 1479 | 0.2462 | 0.9003 | 0.8710 | 0.9399 | 0.9041 |
| 0.1579 | 2.0 | 2958 | 0.1573 | 0.9399 | 0.9495 | 0.9293 | 0.9393 |
| 0.114 | 3.0 | 4437 | 0.1615 | 0.9399 | 0.9346 | 0.9460 | 0.9403 |

Framework versions

  • Transformers 4.42.3
  • PyTorch 2.1.0+cu118
  • Datasets 2.20.0
  • Tokenizers 0.19.1