ryota39/mluke-large-lite-reward

metadata

license: apache-2.0
base_model: studio-ousia/mluke-large-lite
tags:
  - generated_from_trainer
metrics:
  - accuracy
  - precision
  - recall
  - f1
model-index:
  - name: out
    results: []

Fine-tuning

- This model was trained to classify whether an input text is a "chosen" sentence or a "rejected" sentence.
- The probability obtained by applying softmax to the logits of the model's last layer can be used to quantify how strongly an input text is preferred.
- studio-ousia/mluke-large-lite was fine-tuned with full-parameter tuning on open-preference-v0.3.
- Training was done in bf16.
- Label 0 stands for a rejected sentence.
- Label 1 stands for a chosen sentence.
- Note that this model can handle at most 512 tokens.
  - This limitation comes from the LUKE-based pre-trained model.
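A minimal sketch of how the preference score described above might be computed. It assumes the checkpoint loads with `AutoModelForSequenceClassification` and `AutoTokenizer` (the mLUKE tokenizer also needs `sentencepiece` installed) and reads logit index 1 as the "chosen" probability, following the label mapping above. The repo id is taken from this card's title; `softmax`, `preference_score` and `score_texts` are hypothetical helper names, not part of any library.

```python
import math
from typing import List, Sequence

MODEL_ID = "ryota39/mluke-large-lite-reward"  # repo id from this card's title
MAX_LENGTH = 512  # longest input the LUKE-based encoder accepts
CHOSEN_LABEL = 1  # label 1 = chosen, label 0 = rejected


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def preference_score(logits: Sequence[float]) -> float:
    """Probability of the 'chosen' class, used as a preference score."""
    return softmax(logits)[CHOSEN_LABEL]


def score_texts(texts: List[str]) -> List[float]:
    """Score texts with the fine-tuned classifier (downloads the checkpoint)."""
    # Imported lazily so the pure-Python helpers above work without transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()
    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,  # tokens beyond MAX_LENGTH are cut off
        max_length=MAX_LENGTH,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    return probs[:, CHOSEN_LABEL].tolist()
```

For example, `score_texts(["first response", "second response"])` returns one probability per text; a higher value means the text looks more like a chosen sentence. Because of truncation, anything past 512 tokens does not influence the score.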

Metrics

Train and validation splits

train loss	eval loss	accuracy	recall	precision	f1-score
0.114	0.1615	0.9399	0.9459	0.9346	0.9402

Test split

accuracy	recall	precision	f1-score
0.9416	0.9319	0.9504	0.9411
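As a consistency check, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). This short sketch recomputes it from the test precision and recall reported in the table above; `f1_from_precision_recall` is a hypothetical helper name.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-split precision and recall as reported in this card.
test_precision, test_recall = 0.9504, 0.9319
print(round(f1_from_precision_recall(test_precision, test_recall), 4))  # 0.9411
```

Applied to the validation figures (precision 0.9346, recall 0.9459) it gives 0.9402, which also matches the reported F1.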

Figure: confusion matrix on the test split
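The card does not state how its metrics were computed. A minimal sketch of how a binary confusion matrix relates to the reported accuracy, precision, recall and F1, assuming label 1 ("chosen") is the positive class; `confusion_counts`, `binary_metrics` and the toy data are hypothetical.

```python
from typing import Dict, List


def confusion_counts(labels: List[int], preds: List[int]) -> Dict[str, int]:
    """Confusion-matrix cells for a binary task with label 1 as positive."""
    pairs = list(zip(labels, preds))
    return {
        "tp": sum(1 for y, p in pairs if y == 1 and p == 1),
        "tn": sum(1 for y, p in pairs if y == 0 and p == 0),
        "fp": sum(1 for y, p in pairs if y == 0 and p == 1),
        "fn": sum(1 for y, p in pairs if y == 1 and p == 0),
    }


def binary_metrics(labels: List[int], preds: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 derived from the confusion matrix."""
    c = confusion_counts(labels, preds)
    total = sum(c.values())
    if total == 0:
        raise ValueError("need at least one example")
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy example: 3 chosen and 3 rejected texts, one error on each side.
    print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

With a Trainer-style `compute_metrics` callback, the predictions would usually come from taking the argmax over each row of logits before calling a function like this.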

Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
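A sketch of how these hyperparameters could map onto `transformers.TrainingArguments`. Several points are assumptions: a single device (so `train_batch_size` equals `per_device_train_batch_size`), reading "trained in bf16" as `bf16=True` mixed precision, and `output_dir="out"` guessed from the model-index name. Evaluation and logging settings are not listed in the card and are omitted. `TRAINING_KWARGS` and `build_training_args` are hypothetical names.

```python
# Keyword arguments mirroring the hyperparameter list above.
TRAINING_KWARGS = {
    "output_dir": "out",  # assumption: taken from the model-index name
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 32,  # assumes a single device
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
    "bf16": True,  # assumption: bf16 mixed precision; needs supporting hardware
}


def build_training_args():
    """Create TrainingArguments lazily (requires transformers and accelerate)."""
    from transformers import TrainingArguments

    return TrainingArguments(**TRAINING_KWARGS)
```

Keeping the values in a plain dict makes them easy to log or diff against this card before constructing the arguments object.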

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1
0.4109	1.0	1479	0.2462	0.9003	0.8710	0.9399	0.9041
0.1579	2.0	2958	0.1573	0.9399	0.9495	0.9293	0.9393
0.114	3.0	4437	0.1615	0.9399	0.9346	0.9460	0.9403
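The step counts in this table allow two quick calculations. With the `linear` scheduler and no warmup (none is listed, so this is an assumption), the learning rate decays from 5e-05 to zero over 4437 steps, following the formula of transformers' linear schedule. With 1479 steps per epoch at batch size 32, and assuming one device, no gradient accumulation and no dropped last batch, the training split holds between 47,297 and 47,328 examples. `linear_lr` and `train_size_bounds` are hypothetical helpers.

```python
PEAK_LR = 5e-05
STEPS_PER_EPOCH = 1479  # step count at epoch 1.0
TOTAL_STEPS = 3 * STEPS_PER_EPOCH  # 4437, the step count at epoch 3.0
BATCH_SIZE = 32


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Linear warmup, then linear decay to zero at TOTAL_STEPS."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - warmup_steps)


def train_size_bounds(steps_per_epoch: int, batch_size: int) -> tuple:
    """Smallest and largest n with ceil(n / batch_size) == steps_per_epoch."""
    return (steps_per_epoch - 1) * batch_size + 1, steps_per_epoch * batch_size
```

For instance, `linear_lr(1479)` is two thirds of the peak rate (about 3.33e-05) at the end of epoch 1, and `train_size_bounds(1479, 32)` returns `(47297, 47328)`.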

Framework versions

- Transformers 4.42.3
- Pytorch 2.1.0+cu118
- Datasets 2.20.0
- Tokenizers 0.19.1
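To compare a local environment with these versions, a small standard-library sketch using `importlib.metadata`. The PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are an assumption about how the packages were installed, and `compare_versions` is a hypothetical helper. The `+cu118` local tag only appears on wheels from the PyTorch CUDA 11.8 index, so other builds of 2.1.0 will be reported as mismatches.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions listed in this card, keyed by (assumed) PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "4.42.3",
    "torch": "2.1.0+cu118",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(
    expected: Dict[str, str] = CARD_VERSIONS,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    print(compare_versions())  # an empty dict means every version matches
```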