metadata

license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
  - trl
  - reward-trainer
  - generated_from_trainer
datasets:
  - hdfs_rlhf_log_summary_dataset
metrics:
  - accuracy
model-index:
  - name: log_sage_reward_model
    results:
      - task:
          name: Text Classification
          type: text-classification
        dataset:
          name: hdfs_rlhf_log_summary_dataset
          type: hdfs_rlhf_log_summary_dataset
          config: default
          split: None
          args: default
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.9

log_sage_reward_model

This model is a fine-tuned version of distilbert/distilbert-base-uncased, trained on hdfs_rlhf_log_summary_dataset. It achieves the following results on the evaluation set:

Loss: 0.5838
Accuracy: 0.9

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1.41e-05
train_batch_size: 4
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 40
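
As a quick check on how these values relate, the sketch below recomputes total_train_batch_size from train_batch_size and gradient_accumulation_steps, and shows what a linear schedule does to the learning rate. It assumes a single device and no warmup (neither is stated in this card), and the helper name linear_lr and its total_steps argument are illustrative, not part of any library.

```python
# Hyperparameters copied from the list above.
learning_rate = 1.41e-05
train_batch_size = 4
gradient_accumulation_steps = 16

# Effective batch size per optimizer step (assumes a single device).
total_train_batch_size = train_batch_size * gradient_accumulation_steps


def linear_lr(step: int, total_steps: int, base_lr: float = learning_rate) -> float:
    """Linearly decay base_lr to zero over total_steps (assumes no warmup)."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


if __name__ == "__main__":
    print(total_train_batch_size)
    # total_steps=100 is an arbitrary value chosen only for illustration.
    print(linear_lr(step=50, total_steps=100))
```

With these values the product is 4 * 16 = 64, which matches the reported total_train_batch_size.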

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	1	0.6933	0.6
No log	2.0	3	0.6923	0.9
No log	3.0	4	0.6920	0.9
No log	4.0	6	0.6912	0.9
No log	5.0	8	0.6902	0.9
No log	6.0	9	0.6898	0.9
0.2745	7.0	11	0.6885	0.9
0.2745	8.0	12	0.6876	0.9
0.2745	9.0	13	0.6862	0.9
0.2745	10.0	15	0.6830	0.9
0.2745	11.0	16	0.6813	0.9
0.2745	12.0	18	0.6768	0.8
0.2705	13.0	20	0.6707	0.8
0.2705	14.0	21	0.6665	0.8
0.2705	15.0	23	0.6576	0.8
0.2705	16.0	24	0.6521	0.8
0.2705	17.0	25	0.6457	0.8
0.2705	18.0	27	0.6334	0.9
0.2705	19.0	28	0.6273	0.9
0.2555	20.0	30	0.6165	0.9
0.2555	21.0	32	0.6063	0.9
0.2555	22.0	33	0.6017	0.9
0.2555	23.0	35	0.5942	0.9
0.2555	24.0	36	0.5911	0.9
0.2555	25.0	37	0.5882	0.9
0.2555	26.0	39	0.5846	0.9
0.2245	27.0	40	0.5838	0.9
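
To help read the table, here is a minimal sketch of how a pairwise reward loss and accuracy can be computed. It assumes the loss is the pairwise log-sigmoid loss, -log(sigmoid(r_chosen - r_rejected)), which is the form TRL's reward trainer uses, and that accuracy is the fraction of pairs where the chosen response scores higher. The function names and the reward values in the example are hypothetical.

```python
import math


def pairwise_loss(chosen: float, rejected: float) -> float:
    """Return -log(sigmoid(chosen - rejected)), computed in a numerically stable way."""
    margin = chosen - rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def evaluate(pairs):
    """Return (mean loss, accuracy) for a list of (chosen, rejected) reward pairs."""
    losses = [pairwise_loss(c, r) for c, r in pairs]
    correct = sum(1 for c, r in pairs if c > r)
    return sum(losses) / len(pairs), correct / len(pairs)


if __name__ == "__main__":
    # Equal rewards give a loss of ln 2, the value an uninformative model produces.
    print(pairwise_loss(0.0, 0.0))
    # Hypothetical rewards: 9 pairs ranked correctly, 1 ranked incorrectly.
    print(evaluate([(0.3, 0.0)] * 9 + [(0.0, 0.3)]))
```

Under this assumption, the first validation loss of 0.6933 sits very close to ln 2 (about 0.6931), which is consistent with a model that has not yet learned to separate chosen from rejected summaries. The accuracy values moving in steps of 0.1 would also be consistent with a small evaluation set, although the card does not state its size.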

Framework versions

Transformers 4.39.0
PyTorch 2.2.1+cu121
Datasets 2.18.0
Tokenizers 0.15.2