---
license: apache-2.0
base_model: nnheui/pythia-1.4b-sft-full
tags:
- trl
- dpo
- alignment-handbook
- generated_from_trainer
model-index:
- name: pythia-1.4b-dpo-full
  results: []
---


# pythia-1.4b-dpo-full

This model is a fine-tuned version of [nnheui/pythia-1.4b-sft-full](https://huggingface.co/nnheui/pythia-1.4b-sft-full), trained with DPO using TRL. The training dataset was not recorded by the Trainer.
At the final evaluation (step 500), it achieves the following results on the evaluation set:
- Logits/chosen: -1.1953
- Logits/rejected: -1.2422
- Logps/bottom Tokens: -0.0007
- Logps/chosen: -446.0
- Logps/rejected: -416.0
- Loss: 0.6259
- Rewards/accuracies: 0.6567
- Rewards/chosen: -0.5234
- Rewards/margins: 0.2617
- Rewards/rejected: -0.7852
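
As a reading aid for the metrics above, here is a minimal sketch of how the reward figures relate to each other. It assumes the run used the standard sigmoid DPO loss, which this card does not confirm, and the variable and function names are illustrative only. Under that assumption, the margin is the chosen reward minus the rejected reward, and the mean loss should fall between the loss at the mean margin and ln 2 (the loss when the policy equals the reference model).

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -0.5234
rewards_rejected = -0.7852
reported_margin = 0.2617
reported_loss = 0.6259

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected


def sigmoid_dpo_loss(reward_margin: float) -> float:
    """Per-example loss -log(sigmoid(margin)); ASSUMES the sigmoid DPO variant."""
    return math.log1p(math.exp(-reward_margin))


# Loss when chosen and rejected are scored identically (margin of zero): ln 2.
loss_at_zero_margin = sigmoid_dpo_loss(0.0)

# Loss evaluated at the mean margin. Because the loss is convex in the margin,
# this is a lower bound on the mean per-example loss (Jensen's inequality).
loss_at_mean_margin = sigmoid_dpo_loss(margin)

print(f"recomputed margin:    {margin:.4f} (reported {reported_margin})")
print(f"loss at zero margin:  {loss_at_zero_margin:.4f}")
print(f"loss at mean margin:  {loss_at_mean_margin:.4f}")
print(f"reported mean loss:   {reported_loss}")
```

The small gap between the recomputed and reported margin is consistent with rounding in the logged values.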

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 5
- eval_batch_size: 5
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 4
- total_train_batch_size: 120
- total_eval_batch_size: 30
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
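
The totals in this list follow from the per-device values, and the scheduler settings define a warmup-then-cosine learning-rate curve. The sketch below reproduces both. The schedule function is a plain-Python approximation of a linear-warmup cosine schedule, not the Trainer's own code, and `ASSUMED_TOTAL_STEPS = 509` is an assumption inferred from the training results table (step 500 at epoch 0.9814), not a logged value.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 5          # per device
eval_batch_size = 5           # per device
num_devices = 6
gradient_accumulation_steps = 4
learning_rate = 5e-07
warmup_ratio = 0.1

# Effective batch sizes: gradient accumulation applies to training only.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# ASSUMPTION: approximate optimizer steps in the single epoch (not logged in the card).
ASSUMED_TOTAL_STEPS = 509


def cosine_lr(step: int, total_steps: int = ASSUMED_TOTAL_STEPS) -> float:
    """Linear warmup to the peak rate, then cosine decay to zero (illustrative sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
for step in (0, 25, 51, 250, 509):
    print(f"step {step:3d}: lr = {cosine_lr(step):.3e}")
```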

### Training results

| Training Loss | Epoch  | Step | Logits/chosen | Logits/rejected | Logps/bottom Tokens | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:------:|:----:|:-------------:|:---------------:|:-------------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 0.678         | 0.1963 | 100  | -1.0938       | -1.1562         | -0.0009             | -396.0       | -344.0         | 0.6789          | 0.5881             | -0.0275        | 0.0332          | -0.0608          |
| 0.645         | 0.3925 | 200  | -1.1562       | -1.2031         | -0.0009             | -422.0       | -380.0         | 0.6489          | 0.6448             | -0.2871        | 0.1367          | -0.4238          |
| 0.6396        | 0.5888 | 300  | -1.1875       | -1.2344         | -0.0008             | -438.0       | -406.0         | 0.6304          | 0.6627             | -0.4512        | 0.2275          | -0.6797          |
| 0.6102        | 0.7851 | 400  | -1.1875       | -1.2344         | -0.0007             | -444.0       | -414.0         | 0.6268          | 0.6567             | -0.5039        | 0.2578          | -0.7617          |
| 0.6084        | 0.9814 | 500  | -1.1953       | -1.2422         | -0.0007             | -446.0       | -416.0         | 0.6259          | 0.6567             | -0.5234        | 0.2617          | -0.7852          |
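
The Epoch and Step columns can be used to estimate how many optimizer steps make up one epoch and, combined with the total train batch size of 120, roughly how many training examples were seen. This is only a back-of-the-envelope sketch: the card does not state the dataset or its size, so the result is an estimate derived from rounded logged values.

```python
# (epoch, step) pairs copied from the training results table above.
rows = [
    (0.1963, 100),
    (0.3925, 200),
    (0.5888, 300),
    (0.7851, 400),
    (0.9814, 500),
]

# From the hyperparameter list: 5 per device x 6 devices x 4 accumulation steps.
total_train_batch_size = 120

# Each row gives an independent estimate of steps per epoch.
per_row_estimates = [step / epoch for epoch, step in rows]
est_steps_per_epoch = sum(per_row_estimates) / len(per_row_estimates)

# Approximate number of training examples consumed in one epoch.
est_train_examples = est_steps_per_epoch * total_train_batch_size

print(f"estimated steps per epoch: {est_steps_per_epoch:.1f}")
print(f"estimated training examples per epoch: {est_train_examples:.0f}")
```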


### Framework versions

- Transformers 4.40.0
- PyTorch 2.2.2+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1