---
license: llama3
base_model: meta-llama/Meta-Llama-3-8B-Instruct
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: MedQA_L3_1000steps_1e5rate_03beta_CSFTDPO
  results: []
---


# MedQA_L3_1000steps_1e5rate_03beta_CSFTDPO

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on an unknown dataset.
It achieves the following results on the evaluation set at the final training step (step 1000):
- Loss: 2.1933
- Rewards/chosen: -11.4580
- Rewards/rejected: -10.5069
- Rewards/accuracies: 0.3978
- Rewards/margins: -0.9511
- Logps/rejected: -56.3395
- Logps/chosen: -56.4159
- Logits/rejected: -1.1515
- Logits/chosen: -1.1516
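The metrics above follow TRL's `DPOTrainer` conventions. The implicit reward of a response is β times the difference between the policy's and the reference model's log-probabilities for it. `Rewards/margins` is the mean of chosen-minus-rejected rewards, and `Rewards/accuracies` is the fraction of pairs in which the chosen reward is higher. The sketch below recomputes these quantities in plain Python, assuming the default sigmoid DPO loss. β is not listed in the hyperparameters, so the β value and all log-probabilities in the example are hypothetical.

```python
import math

# Assumption: TRL DPOTrainer's default "sigmoid" DPO loss.
# Inputs are summed sequence log-probabilities, one float per preference pair.


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta):
    """Batch-averaged DPO loss and reward metrics (hypothetical helper)."""
    # Implicit rewards: beta * (policy log-prob - reference log-prob).
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    # Per-pair loss: -log(sigmoid(margin)) == softplus(-margin).
    losses = [softplus(-m) for m in margins]
    n = len(margins)
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        "rewards/accuracies": sum(c > r for c, r in zip(chosen_rewards, rejected_rewards)) / n,
        "rewards/margins": sum(margins) / n,
    }


# Hypothetical log-probabilities for two preference pairs.
metrics = dpo_metrics(
    policy_chosen=[-40.0, -55.0],
    policy_rejected=[-42.0, -50.0],
    ref_chosen=[-38.0, -52.0],
    ref_rejected=[-41.0, -49.0],
    beta=0.1,
)
print(metrics)
```

Because averaging is linear, the reported margin should equal the reported chosen reward minus the reported rejected reward, and it does: -11.4580 - (-10.5069) = -0.9511. A negative margin combined with an accuracy below 0.5 means that, on average, the fine-tuned model assigns a higher implicit reward to the rejected responses than to the chosen ones. The loss is an average of non-linear per-pair terms, so it cannot be recovered from the averaged rewards alone.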

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
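These settings determine two derived quantities. The first is the number of preference pairs per optimizer step (`train_batch_size` × `gradient_accumulation_steps` × number of devices). The second is the learning rate at each step. The sketch below reproduces, in plain Python, the warmup-plus-cosine formula used by `transformers`' `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`. The device count of 1 is an assumption inferred from `total_train_batch_size: 4`, and the helper names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
NUM_DEVICES = 1  # assumption: not stated; consistent with total_train_batch_size = 4


def effective_batch_size(per_device=TRAIN_BATCH_SIZE,
                         accumulation=GRADIENT_ACCUMULATION_STEPS,
                         devices=NUM_DEVICES):
    """Preference pairs contributing to each optimizer step."""
    return per_device * accumulation * devices


def lr_at(step, peak=LEARNING_RATE, warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Linear warmup, then one half-cosine decay from the peak to zero."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size())  # 4
for s in (0, 50, 100, 550, 1000):
    print(s, lr_at(s))
```

Under these settings the learning rate rises linearly to 1e-05 over the first 100 steps, falls to half its peak at step 550, and reaches zero at step 1000.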

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7798        | 0.0489 | 50   | 1.1990          | -5.8479        | -5.9729          | 0.4879             | 0.1250          | -41.2261       | -37.7155     | -1.0251         | -1.0237       |
| 2.5761        | 0.0977 | 100  | 2.3542          | -8.6823        | -8.4134          | 0.4418             | -0.2689         | -49.3611       | -47.1635     | -0.2407         | -0.2400       |
| 2.5032        | 0.1466 | 150  | 2.1775          | -10.5620       | -9.9671          | 0.3978             | -0.5949         | -54.5403       | -53.4294     | -0.3965         | -0.3967       |
| 2.6542        | 0.1954 | 200  | 2.5561          | -12.1740       | -11.2384         | 0.3868             | -0.9357         | -58.7777       | -58.8028     | 0.1308          | 0.1310        |
| 1.3951        | 0.2443 | 250  | 2.5490          | -10.7075       | -10.0081         | 0.4286             | -0.6994         | -54.6768       | -53.9144     | -0.5745         | -0.5741       |
| 3.5175        | 0.2931 | 300  | 2.3833          | -10.8814       | -9.9123          | 0.3956             | -0.9691         | -54.3575       | -54.4939     | -0.9764         | -0.9764       |
| 2.172         | 0.3420 | 350  | 2.4460          | -11.5789       | -10.6473         | 0.3912             | -0.9315         | -56.8077       | -56.8190     | -0.7005         | -0.7002       |
| 3.2322        | 0.3908 | 400  | 2.3510          | -11.6671       | -10.7478         | 0.3956             | -0.9193         | -57.1426       | -57.1129     | -0.8878         | -0.8878       |
| 3.1419        | 0.4397 | 450  | 2.3341          | -11.9202       | -10.9493         | 0.4000             | -0.9710         | -57.8140       | -57.9567     | -0.9326         | -0.9326       |
| 3.046         | 0.4885 | 500  | 2.3867          | -12.1880       | -11.3561         | 0.3956             | -0.8319         | -59.1703       | -58.8493     | -1.0975         | -1.0976       |
| 2.4725        | 0.5374 | 550  | 2.2762          | -10.5014       | -9.6493          | 0.4198             | -0.8521         | -53.4809       | -53.2273     | -0.6739         | -0.6739       |
| 2.4975        | 0.5862 | 600  | 2.3654          | -11.0821       | -10.1978         | 0.4110             | -0.8843         | -55.3090       | -55.1628     | -0.9553         | -0.9556       |
| 2.5643        | 0.6351 | 650  | 2.3346          | -12.2241       | -11.1956         | 0.4000             | -1.0286         | -58.6350       | -58.9696     | -1.5180         | -1.5183       |
| 2.2992        | 0.6839 | 700  | 2.3866          | -11.3146       | -10.2942         | 0.3978             | -1.0204         | -55.6305       | -55.9379     | -1.0582         | -1.0586       |
| 2.2314        | 0.7328 | 750  | 2.2719          | -11.6693       | -10.6871         | 0.3868             | -0.9821         | -56.9403       | -57.1202     | -1.1724         | -1.1726       |
| 1.9824        | 0.7816 | 800  | 2.1847          | -11.7244       | -10.7928         | 0.3978             | -0.9317         | -57.2924       | -57.3041     | -1.1387         | -1.1388       |
| 2.2483        | 0.8305 | 850  | 2.2059          | -11.3930       | -10.4357         | 0.3978             | -0.9573         | -56.1021       | -56.1993     | -1.1437         | -1.1438       |
| 1.7727        | 0.8793 | 900  | 2.1957          | -11.4537       | -10.5021         | 0.4000             | -0.9516         | -56.3235       | -56.4016     | -1.1541         | -1.1542       |
| 1.9505        | 0.9282 | 950  | 2.1945          | -11.4590       | -10.5073         | 0.4000             | -0.9516         | -56.3409       | -56.4192     | -1.1517         | -1.1518       |
| 1.5188        | 0.9770 | 1000 | 2.1933          | -11.4580       | -10.5069         | 0.3978             | -0.9511         | -56.3395       | -56.4159     | -1.1515         | -1.1516       |
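The evaluation rows can also be scanned programmatically. The sketch below copies the step, validation-loss, accuracy and margin columns from the table and finds the lowest validation loss. Selecting a checkpoint by validation loss is an assumption, and the card does not say whether intermediate checkpoints were saved.

```python
# (step, validation_loss, rewards_accuracies, rewards_margins), copied from the table above.
EVAL_LOG = [
    (50, 1.1990, 0.4879, 0.1250),
    (100, 2.3542, 0.4418, -0.2689),
    (150, 2.1775, 0.3978, -0.5949),
    (200, 2.5561, 0.3868, -0.9357),
    (250, 2.5490, 0.4286, -0.6994),
    (300, 2.3833, 0.3956, -0.9691),
    (350, 2.4460, 0.3912, -0.9315),
    (400, 2.3510, 0.3956, -0.9193),
    (450, 2.3341, 0.4000, -0.9710),
    (500, 2.3867, 0.3956, -0.8319),
    (550, 2.2762, 0.4198, -0.8521),
    (600, 2.3654, 0.4110, -0.8843),
    (650, 2.3346, 0.4000, -1.0286),
    (700, 2.3866, 0.3978, -1.0204),
    (750, 2.2719, 0.3868, -0.9821),
    (800, 2.1847, 0.3978, -0.9317),
    (850, 2.2059, 0.3978, -0.9573),
    (900, 2.1957, 0.4000, -0.9516),
    (950, 2.1945, 0.4000, -0.9516),
    (1000, 2.1933, 0.3978, -0.9511),
]

# Row with the lowest validation loss (column index 1).
best_step, best_loss, _, _ = min(EVAL_LOG, key=lambda row: row[1])
# Evaluation steps where chosen responses out-score rejected ones on average.
positive_margin_steps = [step for step, _, _, margin in EVAL_LOG if margin > 0]

print(best_step, best_loss)  # 50 1.199
print(positive_margin_steps)  # [50]
```

In these logs, the lowest validation loss (1.1990) and the only positive reward margin both occur at step 50. `Rewards/accuracies` stays below 0.5 at every evaluation.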


### Framework versions

- Transformers 4.41.0
- PyTorch 2.0.0+cu117
- Datasets 2.19.1
- Tokenizers 0.19.1