---
license: mit
datasets:
- argilla/distilabel-intel-orca-dpo-pairs
- jondurbin/truthy-dpo-v0.1
- argilla/distilabel-math-preference-dpo
- argilla/distilabel-capybara-dpo-7k-binarized
language:
- en
library_name: adapter-transformers
---

# Technoculture/MedMerge-6-7b-alpha-dpo

## Training Details

- **GPU:** NVIDIA A100 Tensor Core GPU
- **Total Batches:** 4266
- **Epochs:** 3
- **Duration:** 3 hours and 57 minutes

## DPO Training Dataset Mixture

| Dataset Name                                 | Original Size (Rows) | Ratio | Size After Ratio (Rows) |
|----------------------------------------------|----------------------|-------|-------------------------|
| argilla/distilabel-math-preference-dpo       | 2.4k                 | 1.0   | 2.4k                    |
| argilla/distilabel-intel-orca-dpo-pairs      | 12.9k                | 0.5   | 6.45k                   |
| jondurbin/truthy-dpo-v0.1                    | 1.04k                | 1.0   | 1.04k                   |
| argilla/distilabel-capybara-dpo-7k-binarized | 7.5k                 | 0.2   | 1.5k                    |

Total Size: 11.38k

## Training Loss Plot

![image/png](https://cdn-uploads.huggingface.co/production/uploads/658bed1c8ff537204fbd92a3/wEkGQGRVK000d0q6FkXE9.png)

## Training Loss Smoothed Plot

![image/png](https://cdn-uploads.huggingface.co/production/uploads/658bed1c8ff537204fbd92a3/CDk_JCsteIwGAG_DyHRDE.png)

### For full details of this DPO training, please read our notebook: [MedMerge-6-7b-alpha-dpo](https://colab.research.google.com/drive/1VaZnqRK2K0L4Vha_pURwJQ9EZzXyiNrO?usp=sharing)
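
## Reproducing the Dataset Mixture

The mixture above is built by subsampling each source dataset at the listed ratio and then combining the results. The exact preprocessing lives in the linked notebook; the snippet below is only a minimal sketch of how such a mixture could be assembled with the Hugging Face `datasets` library. The dataset names and ratios come from the table; the shuffle seed and the sampling strategy are illustrative assumptions, not the original settings.

```python
from datasets import load_dataset

# Dataset names and subsampling ratios taken from the mixture table above.
# The shuffle seed is an illustrative assumption, not the original setting.
MIXTURE = {
    "argilla/distilabel-math-preference-dpo": 1.0,
    "argilla/distilabel-intel-orca-dpo-pairs": 0.5,
    "jondurbin/truthy-dpo-v0.1": 1.0,
    "argilla/distilabel-capybara-dpo-7k-binarized": 0.2,
}

subsets = {}
for name, ratio in MIXTURE.items():
    ds = load_dataset(name, split="train")
    if ratio < 1.0:
        # Keep a random fraction of the rows given by `ratio`.
        keep = int(len(ds) * ratio)
        ds = ds.shuffle(seed=42).select(range(keep))
    subsets[name] = ds
    print(f"{name}: {len(ds)} rows")

# Should come out at roughly the 11.38k rows reported above.
print("total:", sum(len(ds) for ds in subsets.values()))
```

Before the subsets can be concatenated and passed to a DPO trainer, their columns would still need to be mapped onto a common prompt/chosen/rejected layout; that step is dataset-specific and not shown here.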