|
--- |
|
license: mit |
|
datasets: |
|
- argilla/distilabel-intel-orca-dpo-pairs |
|
- jondurbin/truthy-dpo-v0.1 |
|
- argilla/distilabel-math-preference-dpo |
|
- argilla/distilabel-capybara-dpo-7k-binarized |
|
language: |
|
- en |
|
library_name: adapter-transformers |
|
base_model: Technoculture/MT7Bi-sft |
|
--- |
|
|
|
# Technoculture/MedMerge-6-7b-alpha-dpo |
|
|
|
## Open LLM Leaderboard
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/63486df1f8f01fcc4b23e97d/ZhdVcETriQf5WFiDhXb5q.png) |
|
|
|
| Model Name | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | |
|
| ----------------------- | -------- | --------- | ------ | ---------- | ---------- | -------- | |
|
| Orca-2-7b | **78.4** | 76.1 | 53.7 | **52.4** | **74.2** | **47.2** | |
|
| LLAMA-2-7b | 43.2 | **77.1** | 44.4 | 38.7 | 69.5 | 16 | |
|
| MT7Bi-sft | 54.1 | 75.11 | - | 43.08 | 72.14 | 15.54 | |
|
| MedMerge-6-7b | 29.52 | 41.04 | - | 37.53 | 59.35 | 0.91 | |
|
| MedMerge-6-7b-alpha-dpo | 54.27 | 75.6 | 52.65 | 43.94 | 71.03 | 26.16 | |
|
|
|
## Training Details |
|
|
|
- **GPU:** Nvidia A100 Tensor Core GPU |
|
- **Total Batches:** 4266 |
|
- **Epochs:** 3 |
|
- **Duration:** 3 hours 57 minutes
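This is not the exact training code, but the per-example objective behind DPO training can be sketched in plain Python. The function below is a minimal illustration, assuming you already have log-probabilities of the chosen and rejected responses under both the policy and the frozen reference model; the function name and `beta` default are illustrative, not taken from our training run.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * log-ratio margin).

    Arguments are log-probabilities of the chosen/rejected responses
    under the policy (pi_*) and the frozen reference model (ref_*).
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy matches the reference, the margin is 0 and the
# loss starts at log(2) ~= 0.693; it shrinks as the policy prefers
# the chosen response more strongly than the reference does.
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
```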
|
|
|
|
|
## DPO Training Dataset Mixture |
|
| Dataset Name | Original Size (Rows) | Ratio | Size After Ratio (Rows) |
|
|----------------------------------------------------|---------------|-------|------------------| |
|
| argilla/distilabel-math-preference-dpo | 2.4k | 1.0 | 2.4k | |
|
| argilla/distilabel-intel-orca-dpo-pairs | 12.9k | 0.5 | 6.45k | |
|
| jondurbin/truthy-dpo-v0.1 | 1.04k | 1.0 | 1.04k | |
|
| argilla/distilabel-capybara-dpo-7k-binarized | 7.5k | 0.2 | 1.5k | |
|
**Total size:** ≈11.38k rows
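The subsampling in the table above can be sketched in plain Python. This is a minimal illustration, not our actual pipeline: the toy lists stand in for the real preference datasets (row counts rounded to match the table), and the `sample_mixture` helper is hypothetical.

```python
import random

# Mixture ratios from the table above (dataset name -> fraction of rows kept).
RATIOS = {
    "argilla/distilabel-math-preference-dpo": 1.0,
    "argilla/distilabel-intel-orca-dpo-pairs": 0.5,
    "jondurbin/truthy-dpo-v0.1": 1.0,
    "argilla/distilabel-capybara-dpo-7k-binarized": 0.2,
}

def sample_mixture(datasets, ratios, seed=42):
    """Keep a random `ratio` fraction of each dataset, then shuffle the union."""
    rng = random.Random(seed)
    mixed = []
    for name, rows in datasets.items():
        k = int(len(rows) * ratios[name])
        mixed.extend(rng.sample(rows, k))
    rng.shuffle(mixed)
    return mixed

# Toy stand-ins with the rounded row counts from the table.
datasets = {
    "argilla/distilabel-math-preference-dpo": list(range(2400)),
    "argilla/distilabel-intel-orca-dpo-pairs": list(range(12900)),
    "jondurbin/truthy-dpo-v0.1": list(range(1040)),
    "argilla/distilabel-capybara-dpo-7k-binarized": list(range(7500)),
}

mixture = sample_mixture(datasets, RATIOS)
print(len(mixture))  # 2400 + 6450 + 1040 + 1500 = 11390
```

With the rounded table values the mixture comes to 11,390 rows; the exact row counts of the source datasets account for the ≈11.38k total reported above.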
|
|
|
## Training Loss Plot |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/658bed1c8ff537204fbd92a3/wEkGQGRVK000d0q6FkXE9.png) |
|
|
|
## Training Loss Smoothed Plot |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/658bed1c8ff537204fbd92a3/CDk_JCsteIwGAG_DyHRDE.png) |
|
|
|
### For full details of this DPO training, please see our notebook.
|
<a target="_blank" href="https://colab.research.google.com/github/dkshjn/Technoculture/blob/main/MedMerge_6_7b_alpha_dpo.ipynb"> |
|
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> |
|
</a> |