Edit model card

MMedLM

💻Github Repo 🖨️arXiv Paper

The official model weights for "Towards Building Multilingual Language Model for Medicine".

Introduction

This repo contains MMed-Llama 3-8B-EnIns, which is based on MMed-Llama 3-8B. We further fine-tune the model on English instruction fine-tuning dataset(from PMC-LLaMA). We did this for a fair comparison with existing models on commonly-used English benchmarks. Notice that, MMed-Llama 3-8B-EnIns has only been trained on pmc_llama_instructions, which is a English medical SFT dataset. So this model's ability to respond multilingual input is still limited.

The model can be loaded as follows:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Henrychur/MMed-Llama-3-8B-EnIns")
model = AutoModelForCausalLM.from_pretrained("Henrychur/MMed-Llama-3-8B-EnIns", torch_dtype=torch.float16)
  • Inference format is the same as Llama 3, coming soon...

News

[2024.2.21] Our pre-print paper is released ArXiv. Dive into our findings here.

[2024.2.20] We release MMedLM and MMedLM 2. With an auto-regressive continues training on MMedC, these models achieves superior performance compared to all other open-source models, even rivaling GPT-4 on MMedBench.

[2023.2.20] We release MMedC, a multilingual medical corpus containing 25.5B tokens.

[2023.2.20] We release MMedBench, a new multilingual medical multi-choice question-answering benchmark with rationale. Check out the leaderboard here.

Evaluation on Commonly-used English Benchmark

The further pretrained MMed-Llama3 also showcast it's great performance in medical domain on different English benchmarks.

Method Size Year MedQA MedMCQA PubMedQA MMLU_CK MMLU_MG MMLU_AN MMLU_PM MMLU_CB MMLU_CM Avg.
MedAlpaca 7B 2023.3 41.7 37.5 72.8 57.4 69.0 57.0 67.3 65.3 54.3 58.03
PMC-LLaMA 13B 2023.9 56.4 56.0 77.9 - - - - - - -
MEDITRON 7B 2023.11 57.2 59.2 74.4 64.6 59.9 49.3 55.4 53.8 44.8 57.62
Mistral 7B 2023.12 50.8 48.2 75.4 68.7 71.0 55.6 68.4 68.1 59.5 62.97
Gemma 7B 2024.2 47.2 49.0 76.2 69.8 70.0 59.3 66.2 79.9 60.1 64.19
BioMistral 7B 2024.2 50.6 48.1 77.5 59.9 64.0 56.5 60.4 59.0 54.7 58.97
Llama 3 8B 2024.4 60.9 50.7 73.0 72.1 76.0 63.0 77.2 79.9 64.2 68.56
MMed-Llama 3~(Ours) 8B - 65.4 63.5 80.1 71.3 85.0 69.6 77.6 74.3 66.5 72.59

Contact

If you have any question, please feel free to contact qiupengcheng@pjlab.org.cn.

Citation

@misc{qiu2024building,
      title={Towards Building Multilingual Language Model for Medicine}, 
      author={Pengcheng Qiu and Chaoyi Wu and Xiaoman Zhang and Weixiong Lin and Haicheng Wang and Ya Zhang and Yanfeng Wang and Weidi Xie},
      year={2024},
      eprint={2402.13963},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Downloads last month
79
Safetensors
Model size
8.03B params
Tensor type
F32
·
Inference API
Input a message to start chatting with Henrychur/MMed-Llama-3-8B-EnIns.
Model is too large to load in Inference API (serverless). To try the model, launch it on Inference Endpoints (dedicated) instead.

Datasets used to train Henrychur/MMed-Llama-3-8B-EnIns