|
--- |
|
license: mit |
|
datasets: |
|
- argilla/distilabel-intel-orca-dpo-pairs |
|
- jondurbin/truthy-dpo-v0.1 |
|
- argilla/distilabel-math-preference-dpo |
|
- argilla/distilabel-capybara-dpo-7k-binarized |
|
language: |
|
- en |
|
library_name: adapter-transformers |
|
base_model: Technoculture/MT7Bi-sft |
|
--- |
|
|
|
# Technoculture/MedMerge-6-7b-alpha-dpo |
|
|
|
## Open LLM Leaderboard
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/63486df1f8f01fcc4b23e97d/ZhdVcETriQf5WFiDhXb5q.png) |
|
|
|
| Model Name | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | |
|
| ----------------------- | -------- | --------- | ------ | ---------- | ---------- | -------- | |
|
| Orca-2-7b | **78.4** | 76.1 | 53.7 | **52.4** | **74.2** | **47.2** | |
|
| LLAMA-2-7b | 43.2 | **77.1** | 44.4 | 38.7 | 69.5 | 16 | |
|
| MT7Bi-sft | 54.1 | 75.11 | - | 43.08 | 72.14 | 15.54 | |
|
| MedMerge-6-7b | 29.52 | 41.04 | - | 37.53 | 59.35 | 0.91 | |
|
| MedMerge-6-7b-alpha-dpo | 54.27 | 75.6 | 52.65 | 43.94 | 71.03 | 26.16 | |
|
|
|
## Training Details |
|
|
|
- **GPU:** Nvidia A100 Tensor Core GPU |
|
- **Total Batches:** 4266 |
|
- **Epochs:** 3 |
|
- **Duration:** 3 hours 57 minutes
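This is not the exact training code, but the per-example objective behind DPO training can be sketched in plain Python. The function below is a minimal illustration, assuming you already have log-probabilities of the chosen and rejected responses under both the policy and the frozen reference model; the function name and `beta` default are illustrative, not taken from our training run.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * log-ratio margin).

    Arguments are log-probabilities of the chosen/rejected responses
    under the policy (pi_*) and the frozen reference model (ref_*).
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy matches the reference, the margin is 0 and the
# loss starts at log(2) ~= 0.693; it shrinks as the policy prefers
# the chosen response more strongly than the reference does.
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
```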
|
|
|
|
|
## DPO Training Dataset Mixture |
|
| Dataset Name | Original Size (Rows) | Ratio | Size After Ratio (Rows) |
|
|----------------------------------------------------|---------------|-------|------------------| |
|
| argilla/distilabel-math-preference-dpo | 2.4k | 1.0 | 2.4k | |
|
| argilla/distilabel-intel-orca-dpo-pairs | 12.9k | 0.5 | 6.45k | |
|
| jondurbin/truthy-dpo-v0.1 | 1.04k | 1.0 | 1.04k | |
|
| argilla/distilabel-capybara-dpo-7k-binarized | 7.5k | 0.2 | 1.5k | |
|
**Total size:** ≈11.38k rows
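The subsampling in the table above can be sketched in plain Python. This is a minimal illustration, not our actual pipeline: the toy lists stand in for the real preference datasets (row counts rounded to match the table), and the `sample_mixture` helper is hypothetical.

```python
import random

# Mixture ratios from the table above (dataset name -> fraction of rows kept).
RATIOS = {
    "argilla/distilabel-math-preference-dpo": 1.0,
    "argilla/distilabel-intel-orca-dpo-pairs": 0.5,
    "jondurbin/truthy-dpo-v0.1": 1.0,
    "argilla/distilabel-capybara-dpo-7k-binarized": 0.2,
}

def sample_mixture(datasets, ratios, seed=42):
    """Keep a random `ratio` fraction of each dataset, then shuffle the union."""
    rng = random.Random(seed)
    mixed = []
    for name, rows in datasets.items():
        k = int(len(rows) * ratios[name])
        mixed.extend(rng.sample(rows, k))
    rng.shuffle(mixed)
    return mixed

# Toy stand-ins with the rounded row counts from the table.
datasets = {
    "argilla/distilabel-math-preference-dpo": list(range(2400)),
    "argilla/distilabel-intel-orca-dpo-pairs": list(range(12900)),
    "jondurbin/truthy-dpo-v0.1": list(range(1040)),
    "argilla/distilabel-capybara-dpo-7k-binarized": list(range(7500)),
}

mixture = sample_mixture(datasets, RATIOS)
print(len(mixture))  # 2400 + 6450 + 1040 + 1500 = 11390
```

With the rounded table values the mixture comes to 11,390 rows; the exact row counts of the source datasets account for the ≈11.38k total reported above.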
|
|
|
## Training Loss Plot |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/658bed1c8ff537204fbd92a3/wEkGQGRVK000d0q6FkXE9.png) |
|
|
|
## Training Loss Smoothed Plot |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/658bed1c8ff537204fbd92a3/CDk_JCsteIwGAG_DyHRDE.png) |
|
|
|
### For full details of this DPO training, please see our notebook.
|
<a target="_blank" href="https://colab.research.google.com/github/dkshjn/Technoculture/blob/main/MedMerge_6_7b_alpha_dpo.ipynb"> |
|
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> |
|
</a> |