	---
	license: apache-2.0
	tags:
	- merge
	- mergekit
	- epfl-llm/meditron-70b
	- allenai/tulu-2-dpo-70b
	---

	# Medmerge-tulu-70b

	Medmerge-tulu-70b is a merge of the following models:
	* [wanglab/ClinicalCamel-70B](https://huggingface.co/wanglab/ClinicalCamel-70B)
	* [epfl-llm/meditron-70b](https://huggingface.co/epfl-llm/meditron-70b)
	* [allenai/tulu-2-dpo-70b](https://huggingface.co/allenai/tulu-2-dpo-70b)

	## Open LLM Leaderboard

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/63486df1f8f01fcc4b23e97d/ajm6Z9cCmd74ERdz4xdHs.png)

	| Model Name        | ARC   | HellaSwag | MMLU  | TruthfulQA | Winogrande | GSM8K |
	| ----------------- | ----- | --------- | ----- | ---------- | ---------- | ----- |
	| tulu-2-dpo-70b    | 72.1  | 88.99     | 69.84 | 65.78      | 83.27      | 62.62 |
	| Medmerge-tulu-70b | 67.81 | 87.46     | 70.1  | 47.89      | 83.43      | 56.56 |
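The gap between the two rows is easier to read as per-benchmark differences and an unweighted mean. The snippet below is a small sketch that uses only the scores from the table above; the helper names `average` and `deltas` are illustrative and not part of any leaderboard tooling.

```python
# Scores copied from the Open LLM Leaderboard table above.
BENCHMARKS = ["ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K"]
SCORES = {
    "tulu-2-dpo-70b": [72.1, 88.99, 69.84, 65.78, 83.27, 62.62],
    "Medmerge-tulu-70b": [67.81, 87.46, 70.1, 47.89, 83.43, 56.56],
}


def average(scores):
    """Unweighted mean over the six benchmarks."""
    return sum(scores) / len(scores)


def deltas(merged, reference):
    """Per-benchmark difference (merged minus reference), rounded to 2 decimals."""
    return {
        name: round(m - r, 2)
        for name, m, r in zip(BENCHMARKS, merged, reference)
    }


if __name__ == "__main__":
    for name, scores in SCORES.items():
        print(f"{name}: average {average(scores):.2f}")
    diff = deltas(SCORES["Medmerge-tulu-70b"], SCORES["tulu-2-dpo-70b"])
    for bench, d in diff.items():
        print(f"{bench}: {d:+.2f}")
```

On these numbers the merge is slightly ahead of tulu-2-dpo-70b on MMLU and Winogrande and behind on the other four benchmarks, with the largest drop on TruthfulQA.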

	## Performance

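	The table below compares Medmerge-tulu-70b with ClinicalCamel-70B, GPT3.5, GPT4, and Med-PaLM 2 on medical benchmarks. No Medmerge-tulu-70b score is reported for the last four datasets.

	Table: Performance of Medmerge-tulu-70b alongside the five-shot performance of Clinical Camel-70B (C70), GPT3.5, GPT4, and Med-PaLM 2 on various medical datasets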

	| Dataset                    | Medmerge-tulu-70b | ClinicalCamel-70B | GPT3.5 | GPT4 | Med-PaLM 2 |
	|----------------------------|-------------------|-------------------|--------|------|------------|
	| MMLU Anatomy               | 66.6              | 65.2              | 60.7   | 80.0 | 77.8       |
	| MMLU Clinical Knowledge    | 72.0              | 72.8              | 68.7   | 86.4 | 88.3       |
	| MMLU College Biology       | 84.7              | 81.2              | 72.9   | 93.8 | 94.4       |
	| MMLU College Medicine      | 64.2              | 68.2              | 63.6   | 76.3 | 80.9       |
	| MMLU Medical Genetics      | 76.0              | 69.0              | 68.0   | 92.0 | 90.0       |
	| MMLU Professional Medicine | 75.7              | 75.0              | 69.8   | 93.8 | 95.2       |
	| MedMCQA                    | -                 | 54.2              | 51.0   | 72.4 | 71.3       |
	| MedQA (USMLE)              | -                 | 60.7              | 53.6   | 81.4 | 79.7       |
	| PubMedQA                   | -                 | 77.9              | 60.2   | 74.4 | 79.2       |
	| USMLE Sample Exam          | -                 | 64.3              | 58.5   | 86.6 | -          |

	## 🧩 Configuration

	```yaml
	models:
	  - model: NousResearch/Llama-2-70b-hf
	    # no parameters necessary for base model
	  - model: wanglab/ClinicalCamel-70B
	    parameters:
	      weight: 0.08
	      density: 0.45
	  - model: epfl-llm/meditron-70b
	    parameters:
	      weight: 0.08
	      density: 0.45
	  - model: allenai/tulu-2-dpo-70b
	    parameters:
	      weight: 0.08
	      density: 0.45
	merge_method: dare_ties
	base_model: NousResearch/Llama-2-70b-hf
	parameters:
	  int8_mask: true
	dtype: bfloat16
	```
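To give a feel for what `weight` and `density` control in a `dare_ties` merge, here is a toy sketch on plain Python lists. It is a simplified illustration of the general idea (randomly drop a share of each model's delta from the base, rescale the rest, then resolve sign conflicts), not mergekit's actual implementation: it ignores details such as weight normalization and `int8_mask`, and the function names and toy vectors are made up for the example.

```python
import random


def dare(delta, density, rng):
    """DARE step: keep each entry with probability `density`,
    rescale survivors by 1 / density, and zero out the rest."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def ties_sign_merge(base, deltas, weights):
    """TIES-style step: elect a sign per parameter from the weighted sum,
    then add only the weighted deltas that agree with that sign."""
    merged = []
    for i, b in enumerate(base):
        weighted = [w * d[i] for d, w in zip(deltas, weights)]
        sign = 1.0 if sum(weighted) >= 0 else -1.0
        merged.append(b + sum(v for v in weighted if v * sign > 0))
    return merged


def dare_ties(base, finetuned, weights, density, seed=0):
    """Toy dare_ties: sparsify each model's delta, then sign-merge onto the base."""
    rng = random.Random(seed)
    deltas = [
        dare([f - b for f, b in zip(model, base)], density, rng)
        for model in finetuned
    ]
    return ties_sign_merge(base, deltas, weights)


if __name__ == "__main__":
    base = [0.0, 0.0, 0.0, 0.0]              # stand-in for the base model weights
    finetuned = [                             # stand-ins for the three merged models
        [1.0, -1.0, 0.5, 0.0],
        [1.0, 1.0, -0.5, 2.0],
        [-1.0, 1.0, 0.5, 2.0],
    ]
    # weight and density values taken from the configuration above
    print(dare_ties(base, finetuned, weights=[0.08] * 3, density=0.45))
```

With `density: 0.45`, roughly 45% of each delta survives the random drop, and with `weight: 0.08` each surviving entry moves the base weights only slightly.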

	## 💻 Usage

	```python
	# In a notebook, install the dependencies first:
	# !pip install -qU transformers accelerate

	from transformers import AutoTokenizer
	import transformers
	import torch

	model = "Technoculture/Medmerge-tulu-70b"
	messages = [{"role": "user", "content": "I am feeling sleepy these days"}]

	# Format the conversation with the tokenizer's chat template.
	tokenizer = AutoTokenizer.from_pretrained(model)
	prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

	# Load the model in half precision and spread it across the available devices.
	pipeline = transformers.pipeline(
	    "text-generation",
	    model=model,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Sample up to 256 new tokens and print the full generated text.
	outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
	print(outputs[0]["generated_text"])
	```
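Before running the snippet above, it can help to estimate how much memory the weights alone need. The sketch below is back-of-the-envelope arithmetic under a stated assumption: it takes the nominal 70 billion parameters suggested by the model name (the exact count is not given in this card) and ignores activations, the KV cache, and framework overhead.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


# Assumption: nominal parameter count taken from the "70b" in the model name.
N_PARAMS = 70e9

if __name__ == "__main__":
    # float16 and bfloat16 both use 2 bytes per parameter.
    print(f"16-bit weights: {weight_memory_gib(N_PARAMS, 2):.1f} GiB")
    # float32 is shown for comparison only.
    print(f"32-bit weights: {weight_memory_gib(N_PARAMS, 4):.1f} GiB")
```

At 2 bytes per parameter this comes to roughly 130 GiB, which is why the usage example relies on `device_map="auto"` to spread the model across the available devices.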