---
license: apache-2.0
language:
- en
tags:
- medical
---

# MedLang-13B

MedLang-13B is a medical large language model fine-tuned from [Baichuan-13B](https://huggingface.co/baichuan-inc/Baichuan-13B-Base).

## Model Details

### Model Description

- **Developed by:** Huang chiang
- **Model type:** Medical LLM
- **Language(s) (NLP):** Chinese
- **Finetuned from model:** Baichuan-13B

## How to Get Started with the Model

Use the code below to get started with the model.

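```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig


def init_model():
    """Load MedLang-13B and its tokenizer in half precision."""
    model = AutoModelForCausalLM.from_pretrained(
        "EnjoyCodeX/MedLang-13B/MedLang-13B",
        torch_dtype=torch.float16,
        device_map="auto",  # place layers on the available devices automatically
        trust_remote_code=True,  # needed to load the model's custom code
    )
    # Use the generation settings stored alongside the checkpoint.
    model.generation_config = GenerationConfig.from_pretrained(
        "EnjoyCodeX/MedLang-13B/MedLang-13B",
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "EnjoyCodeX/MedLang-13B/MedLang-13B",
        use_fast=False,
        trust_remote_code=True,
    )
    return model, tokenizer
```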

```python
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from transformers.generation.utils import GenerationConfig
>>> tokenizer = AutoTokenizer.from_pretrained("EnjoyCodeX/MedLang-13B/MedLang-13B", use_fast=False, trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("EnjoyCodeX/MedLang-13B/MedLang-13B", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
>>> model.generation_config = GenerationConfig.from_pretrained("EnjoyCodeX/MedLang-13B/MedLang-13B")
>>> messages = []
>>> # Prompt in English: "My neck feels very uncomfortable, and I wake up with a headache every day."
>>> messages.append({"role": "user", "content": "我感觉自己颈椎非常不舒服,每天睡醒都会头痛"})
>>> response = model.chat(tokenizer, messages)
>>> print(response)
```
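
For multi-turn conversations, the history can be kept in the same `messages` list. The sketch below is only an illustration and is not part of the model's API: the `chat_turn` helper is hypothetical, the `"assistant"` role name is an assumption borrowed from the Baichuan-13B-Chat convention, and `chat_fn` stands in for a call such as `lambda msgs: model.chat(tokenizer, msgs)`.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    messages: List[Message],
    user_text: str,
    chat_fn: Callable[[List[Message]], str],
) -> str:
    """Append a user turn, query the model, and record its reply.

    `chat_fn` is a hypothetical wrapper around the model, for example
    `lambda msgs: model.chat(tokenizer, msgs)`.
    """
    messages.append({"role": "user", "content": user_text})
    reply = chat_fn(messages)
    # Assumption: model replies are stored under the "assistant" role.
    messages.append({"role": "assistant", "content": reply})
    return reply


if __name__ == "__main__":
    # Stand-in for the real model so the sketch runs without a GPU.
    def fake_model(msgs: List[Message]) -> str:
        return f"(reply to message {len(msgs)})"

    history: List[Message] = []
    chat_turn(history, "first question", fake_model)
    chat_turn(history, "follow-up question", fake_model)
    for message in history:
        print(message["role"], ":", message["content"])
```

Passing the model call in as a function keeps the history handling testable without loading the 13B checkpoint.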

## Training Details

### Training Data

MedDialog, cMedQA-v2, MedMCQA, DrugDB, Alpaca-GPT4-zh

## Evaluation

MLEC-QA-Few-shot

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6460f814051604bda02b8f11/O4jLFn9DzgdWgtQ9V63aX.png)

MLEC-QA-Zero-shot

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6460f814051604bda02b8f11/0AqpVKlQAyeszfPMCx7Xv.png)

CMB

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6460f814051604bda02b8f11/-RB0wPZYoR99HnUCe8Aii.png)

CMD

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6460f814051604bda02b8f11/6VGxnqmuhuS0XYLwvxEj7.png)

CMID

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6460f814051604bda02b8f11/x0mxfNyP97kLT9RX_7S3w.png)

### Model Architecture

The overall architecture of MedLang-13B is based on the standard Transformer decoder, employing the same model design as LLaMA.
In detail, MedLang-13B adopts the following design choices:

1. RoPE (rotary position embedding) as the positional encoding, which most current models adopt and which scales well;
2. A context window length of 4096;
3. SwiGLU as the activation function; it handles complex semantic relationships and long-range dependencies well in language modeling and is therefore widely used in large language models;
4. A hidden layer size of 11008 in the feed-forward network;
5. Pre-normalization based on RMSNorm as the layer normalization.
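
To make the last three points concrete, the following is a minimal pure-Python sketch of an RMSNorm pre-normalized feed-forward block with a SwiGLU activation. It is not taken from the MedLang-13B code: the function names, the `eps` default, the omission of bias terms, and the toy dimensions in the usage note are all assumptions chosen for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[i][j]


def rms_norm(x: Vector, weight: Vector, eps: float = 1e-6) -> Vector:
    """Divide x by its root mean square, then scale by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v: float) -> float:
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def swiglu_ffn(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def pre_norm_ffn_block(
    x: Vector, norm_weight: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix
) -> Vector:
    """Pre-normalization: normalize, apply the sublayer, then add the residual."""
    y = swiglu_ffn(rms_norm(x, norm_weight), w_gate, w_up, w_down)
    return [a + b for a, b in zip(x, y)]
```

With a model dimension of 2 and a feed-forward size of 3 (toy values), `w_gate` and `w_up` are 3×2 matrices and `w_down` is 2×3; in the configuration described above, the feed-forward size would be 11008.
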

### Recommendations

Although MedLang-13B performs well on both single-turn and multi-turn medical QA datasets, the medical field places high demands on reliability and safety, especially where patient health and life are concerned. The current MedLang-13B is therefore unsuitable for deployment in practical medical applications. I solemnly declare that this model is intended solely for research and testing by individual groups. Users are strongly advised to critically evaluate any information or medical advice provided by the model.

#### GPU

| Precision   | GPU Mem (GB) |
|-------------|--------------|
| bf16 / fp16 | 26.0         |
| int8        | 15.8         |
| int4        | 9.7          |
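
As a rough cross-check of these figures, the memory taken by the weights alone can be estimated as parameter count times bytes per parameter. The sketch below assumes roughly 13 billion parameters (inferred from the model name, not stated in this card) and uses a hypothetical helper, `weight_memory_gb`.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: about 13e9 parameters, inferred from the model name.
NUM_PARAMS = 13e9

if __name__ == "__main__":
    for label, bits in [("bf16 / fp16", 16), ("int8", 8), ("int4", 4)]:
        estimate = weight_memory_gb(NUM_PARAMS, bits)
        print(f"{label:<12} {estimate:5.1f} GB (weights only)")
```

The half-precision estimate matches the 26.0 GB reported above. The reported int8 and int4 figures are higher than the weight-only estimates, presumably because some tensors stay in higher precision and runtime buffers add overhead.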