metadata

license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
  - trl
  - reward-trainer
  - generated_from_trainer
datasets:
  - hdfs_rlhf_log_summary_dataset
metrics:
  - accuracy
model-index:
  - name: log_sage_reward_model
    results:
      - task:
          name: Text Classification
          type: text-classification
        dataset:
          name: hdfs_rlhf_log_summary_dataset
          type: hdfs_rlhf_log_summary_dataset
          config: default
          split: None
          args: default
        metrics:
          - name: Accuracy
            type: accuracy
            value: 1

log_sage_reward_model

This model is a fine-tuned version of distilbert/distilbert-base-uncased, trained on hdfs_rlhf_log_summary_dataset. It achieves the following results on the evaluation set:

Loss: 0.1669
Accuracy: 1.0
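
To help read the Loss and Accuracy figures above: the trl and reward-trainer tags suggest the model was trained on preference pairs, although the card does not say so explicitly. Assuming the usual pairwise setup, where the model assigns a scalar reward to a chosen and a rejected summary, the loss is the negative log-sigmoid of the reward margin and accuracy is the fraction of pairs in which the chosen summary scores higher. The following is a minimal sketch under that assumption; the function names and the reward values are hypothetical and are not taken from the training run.

```python
import math


def pairwise_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Negative log-sigmoid of the reward margin for one preference pair."""
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) == log(1 + exp(-m)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def evaluate(pairs):
    """Return (mean loss, accuracy) over (chosen_reward, rejected_reward) pairs."""
    losses = [pairwise_loss(c, r) for c, r in pairs]
    correct = [c > r for c, r in pairs]
    return sum(losses) / len(losses), sum(correct) / len(correct)


# Hypothetical reward scores: the first pair is ranked correctly, the second is not.
pairs = [(1.2, -0.4), (0.3, 0.9)]
mean_loss, accuracy = evaluate(pairs)

# A model that cannot separate the two summaries (zero margin) scores ln 2.
untrained_loss = pairwise_loss(0.0, 0.0)
print(f"mean loss={mean_loss:.4f} accuracy={accuracy:.2f} zero-margin loss={untrained_loss:.4f}")
```

Under this reading, a loss near ln 2 (about 0.693) means the chosen and rejected rewards are nearly equal. Accuracy depends only on the sign of the margin, so it can reach 1.0 while the loss is still high and keeps falling as the margins widen.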

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1.41e-05
train_batch_size: 6
eval_batch_size: 24
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 96
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 40
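
The relationship between these values can be checked with a short calculation. The sketch below assumes a single training device (consistent with 6 x 16 = 96) and no warmup steps, since none are listed; both are assumptions, and the helper linear_lr is hypothetical rather than part of any library. It follows the usual linear decay from the initial learning rate to zero over the total number of optimizer steps.

```python
train_batch_size = 6
gradient_accumulation_steps = 16
num_devices = 1  # assumption: single device, which matches the reported total of 96

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

learning_rate = 1.41e-05
total_steps = 40   # the results table logs one optimizer step per epoch for 40 epochs
warmup_steps = 0   # assumption: the card lists no warmup


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


print("effective batch size:", total_train_batch_size)
for step in (0, 10, 20, 30, 40):
    print(f"step {step:2d}: lr = {linear_lr(step):.3e}")
```

With only 40 optimizer steps in total, the learning rate falls by 1/40 of its initial value at every step, so the later epochs train with a much smaller step size than the early ones.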

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	1	0.6950	0.5
No log	2.0	2	0.6896	1.0
No log	3.0	3	0.6843	1.0
No log	4.0	4	0.6789	1.0
No log	5.0	5	0.6735	1.0
No log	6.0	6	0.6671	1.0
No log	7.0	7	0.6597	1.0
No log	8.0	8	0.6510	1.0
No log	9.0	9	0.6403	1.0
0.0839	10.0	10	0.6275	1.0
0.0839	11.0	11	0.6130	1.0
0.0839	12.0	12	0.5955	1.0
0.0839	13.0	13	0.5747	1.0
0.0839	14.0	14	0.5508	1.0
0.0839	15.0	15	0.5250	1.0
0.0839	16.0	16	0.4984	1.0
0.0839	17.0	17	0.4698	1.0
0.0839	18.0	18	0.4413	1.0
0.0839	19.0	19	0.4121	1.0
0.0658	20.0	20	0.3850	1.0
0.0658	21.0	21	0.3604	1.0
0.0658	22.0	22	0.3384	1.0
0.0658	23.0	23	0.3186	1.0
0.0658	24.0	24	0.2995	1.0
0.0658	25.0	25	0.2823	1.0
0.0658	26.0	26	0.2664	1.0
0.0658	27.0	27	0.2516	1.0
0.0658	28.0	28	0.2384	1.0
0.0658	29.0	29	0.2260	1.0
0.0346	30.0	30	0.2149	1.0
0.0346	31.0	31	0.2054	1.0
0.0346	32.0	32	0.1971	1.0
0.0346	33.0	33	0.1898	1.0
0.0346	34.0	34	0.1838	1.0
0.0346	35.0	35	0.1787	1.0
0.0346	36.0	36	0.1746	1.0
0.0346	37.0	37	0.1714	1.0
0.0346	38.0	38	0.1691	1.0
0.0346	39.0	39	0.1676	1.0
0.021	40.0	40	0.1669	1.0

Framework versions

Transformers 4.39.0
Pytorch 2.2.1+cu121
Datasets 2.18.0
Tokenizers 0.15.2