---
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: NbAiLab/nb-gpt-j-6B-v2
datasets:
- hugodk-sch/aftonposten_title_prefs
model-index:
- name: aftonposten-6b-align-scan
  results: []
---


# aftonposten-6b-align-scan

This model is a PEFT adapter trained with DPO on top of [data/ap-gpt-j-6b-sft-qlora-04-08](https://huggingface.co/data/ap-gpt-j-6b-sft-qlora-04-08), using the hugodk-sch/aftonposten_title_prefs dataset. The card metadata lists NbAiLab/nb-gpt-j-6B-v2 as the base model.
It achieves the following results on the evaluation set:
- Loss: 0.5772
- Rewards/chosen: 0.0684
- Rewards/rejected: 0.0623
- Rewards/accuracies: 0.5307
- Rewards/margins: 0.0061
- Logps/rejected: -37.4276
- Logps/chosen: -33.9368
- Logits/rejected: -2.2420
- Logits/chosen: -2.2469
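
The reward figures above are linked by simple arithmetic: the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of evaluation pairs where the chosen reward is higher. The following is a minimal sanity check, assuming the usual TRL DPO logging convention; the variable names are illustrative and not part of the training code.

```python
# Values copied from the evaluation summary above.
rewards_chosen = 0.0684
rewards_rejected = 0.0623
reported_margin = 0.0061
reported_accuracy = 0.5307

# Margin = chosen reward minus rejected reward (assumed TRL convention).
margin = rewards_chosen - rewards_rejected
print(f"recomputed margin: {margin:.4f} (reported: {reported_margin:.4f})")

# Accuracy of 0.5 corresponds to chance on pairwise preferences.
accuracy_above_chance = reported_accuracy - 0.5
print(f"accuracy above chance: {accuracy_above_chance:.4f}")
```

The recomputed margin agrees with the reported one up to rounding, and the accuracy sits only a few points above the 0.5 chance level.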

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 4
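
Two of the settings above can be made concrete with a little arithmetic. The sketch below assumes the usual Trainer relationship (total batch size = per-device batch size × gradient accumulation steps × number of processes) and a cosine schedule with linear warmup. The function `cosine_lr` and the `total_steps` value in the usage lines are hypothetical and chosen for illustration; the card does not state the exact number of optimizer steps.

```python
import math

per_device_batch = 4   # train_batch_size
grad_accum_steps = 2   # gradient_accumulation_steps
total_batch = 8        # total_train_batch_size

# Under the assumed relationship, the listed values imply this process count.
implied_processes = total_batch // (per_device_batch * grad_accum_steps)
print("implied number of processes:", implied_processes)


def cosine_lr(step, total_steps, peak_lr=5e-6, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Illustrative only: total_steps=1000 is an assumption, not a logged value.
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step, total_steps=1000))
```

With these numbers the implied process count is 1, although `distributed_type` is reported as multi-GPU; the card gives no further detail on the hardware setup.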

### Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:-----:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 0.4711        | 0.26  | 100  | -2.2401       | -2.2352         | -34.0113     | -37.4979       | 0.5755          | 0.5195             | 0.0163         | 0.0032          | 0.0131           |
| 0.5061        | 0.52  | 200  | -2.2385       | -2.2337         | -34.0500     | -37.5455       | 0.5877          | 0.4992             | -0.0108        | 0.0094          | -0.0202          |
| 0.3371        | 0.78  | 300  | -2.2371       | -2.2322         | -34.0344     | -37.5353       | 0.5843          | 0.5278             | 0.0001         | 0.0132          | -0.0131          |
| 0.4001        | 1.04  | 400  | -2.2402       | -2.2353         | -34.0450     | -37.5120       | 0.6350          | 0.4838             | -0.0073        | -0.0106         | 0.0033           |
| 0.3401        | 1.3   | 500  | -2.2402       | -2.2353         | -34.0539     | -37.5443       | 0.6238          | 0.5141             | -0.0135        | 0.0058          | -0.0193          |
| 0.433         | 1.56  | 600  | -2.2469       | -2.2421         | -34.0161     | -37.5011       | 0.6143          | 0.5245             | 0.0129         | 0.0021          | 0.0108           |
| 0.3298        | 1.82  | 700  | -2.2450       | -2.2401         | -33.9442     | -37.4453       | 0.5790          | 0.5195             | 0.0633         | 0.0134          | 0.0499           |
| 0.14          | 2.08  | 800  | -2.2472       | -2.2423         | -33.9509     | -37.4389       | 0.5904          | 0.5162             | 0.0586         | 0.0041          | 0.0544           |
| 0.2302        | 2.34  | 900  | -2.2497       | -2.2448         | -33.9130     | -37.4109       | 0.5758          | 0.5544             | 0.0851         | 0.0111          | 0.0740           |
| 0.2296        | 2.6   | 1000 | -2.2489       | -2.2440         | -33.9444     | -37.4378       | 0.5750          | 0.5075             | 0.0631         | 0.0080          | 0.0552           |
| 0.2798        | 2.86  | 1100 | -2.2468       | -2.2419         | -33.9303     | -37.4387       | 0.5483          | 0.5428             | 0.0729         | 0.0184          | 0.0545           |
| 0.1195        | 3.12  | 1200 | -2.2473       | -2.2424         | -33.9386     | -37.4291       | 0.5759          | 0.5137             | 0.0672         | 0.0059          | 0.0613           |
| 0.1371        | 3.38  | 1300 | -2.2483       | -2.2434         | -33.9299     | -37.4346       | 0.5592          | 0.5494             | 0.0733         | 0.0159          | 0.0574           |
| 0.0993        | 3.64  | 1400 | -2.2471       | -2.2422         | -33.9566     | -37.4311       | 0.6130          | 0.4871             | 0.0546         | -0.0053         | 0.0598           |
| 0.18          | 3.9   | 1500 | -2.2472       | -2.2423         | -33.9234     | -37.4306       | 0.5566          | 0.5050             | 0.0778         | 0.0176          | 0.0602           |
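
To compare checkpoints, it can help to scan the logged evaluations for the lowest validation loss and the highest reward accuracy. The sketch below uses the validation loss and reward accuracy recorded at each evaluation step in the table; the list name `evals` and the tuple layout are illustrative choices, not part of the training code.

```python
# (step, validation_loss, rewards_accuracy), copied from the logged evaluations.
evals = [
    (100, 0.5755, 0.5195),
    (200, 0.5877, 0.4992),
    (300, 0.5843, 0.5278),
    (400, 0.6350, 0.4838),
    (500, 0.6238, 0.5141),
    (600, 0.6143, 0.5245),
    (700, 0.5790, 0.5195),
    (800, 0.5904, 0.5162),
    (900, 0.5758, 0.5544),
    (1000, 0.5750, 0.5075),
    (1100, 0.5483, 0.5428),
    (1200, 0.5759, 0.5137),
    (1300, 0.5592, 0.5494),
    (1400, 0.6130, 0.4871),
    (1500, 0.5566, 0.5050),
]

# Checkpoint with the lowest validation loss.
best_loss = min(evals, key=lambda row: row[1])
# Checkpoint with the highest reward accuracy.
best_acc = max(evals, key=lambda row: row[2])

print("lowest validation loss:", best_loss)   # step 1100, loss 0.5483
print("highest reward accuracy:", best_acc)   # step 900, accuracy 0.5544
```

The two criteria pick different steps (1100 and 900), and reward accuracy stays between roughly 0.48 and 0.56 throughout, so the differences between checkpoints are small.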


### Framework versions

- PEFT 0.10.0
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.1