
MedLang-13B

MedLang-13B is a medical large language model fine-tuned from Baichuan-13B.

Model Details

Model Description

  • Developed by: Huang chiang
  • Finetuned from model: Baichuan-13B

How to Get Started with the Model

Use the code below to get started with the model.


import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig


def init_model():
    # Load the weights in half precision and let Accelerate place them on available devices.
    model = AutoModelForCausalLM.from_pretrained(
        "EnjoyCodeX/MedLang-13B/MedLang-13B",
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True
    )
    # Attach the generation settings shipped with the checkpoint.
    model.generation_config = GenerationConfig.from_pretrained(
        "EnjoyCodeX/MedLang-13B/MedLang-13B",
    )
    # The tokenizer requires the slow (SentencePiece) implementation.
    tokenizer = AutoTokenizer.from_pretrained(
        "EnjoyCodeX/MedLang-13B/MedLang-13B",
        use_fast=False,
        trust_remote_code=True
    )
    return model, tokenizer

>>> model, tokenizer = init_model()
>>> messages = []
>>> messages.append({"role": "user", "content": "我感觉自己颈椎非常不舒服,每天睡醒都会头痛"})  # "My neck feels very uncomfortable, and I wake up with a headache every day."
>>> response = model.chat(tokenizer, messages)
>>> print(response)
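
The messages list carries the chat history, so multi-turn conversations reuse it. The snippet below is a sketch assuming the Baichuan-style model.chat interface; the follow-up question is only an illustration.

>>> messages.append({"role": "assistant", "content": response})
>>> messages.append({"role": "user", "content": "有什么缓解的方法吗?"})  # illustrative follow-up: "Is there any way to relieve it?"
>>> response = model.chat(tokenizer, messages)
>>> print(response)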

Training Details

Training Data

MedDialog, cMedQA-v2, MedMCQA, DrugDB, Alpaca-GPT4-zh

Evaluation

Results are reported as figures for the following benchmarks: MLEC-QA (few-shot), MLEC-QA (zero-shot), CMB, CMD, and CMID.

Model Architecture

The overall architecture of MedLang-13B is based on the standard Transformer decoder and follows the same model design as LLaMA.
In detail, MedLang-13B adopts the following choices:
(1) RoPE as the positional encoding, which is widely adopted in current models and exhibits excellent scalability;
(2) A context window of 4096 tokens;
(3) SwiGLU as the activation function, which handles complex semantic relationships and long-range dependencies well and is therefore widely used in large language models, including MedLang-13B;
(4) A hidden size of 11008 in the feed-forward network;
(5) Pre-Normalization based on RMSNorm as the layer normalization.
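
For illustration, the sketch below shows in PyTorch what the RMSNorm pre-normalization and SwiGLU feed-forward block described above typically look like. The class names and the model dimension are assumptions chosen for readability; only the feed-forward hidden size of 11008 comes from the description above, and this is not the actual MedLang-13B source code.

import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    # Root-mean-square layer norm, applied before each sub-layer (pre-normalization).
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the RMS over the last dimension, then rescale with a learned weight.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


class SwiGLUFeedForward(nn.Module):
    # Feed-forward block with the SwiGLU activation: silu(x W_gate) * (x W_up), then W_down.
    def __init__(self, dim: int, hidden_dim: int = 11008):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))


# Pre-normalization with a residual connection, as used inside a decoder layer:
x = torch.randn(1, 16, 4096)                        # (batch, seq, model dim); 4096 is illustrative
ffn = SwiGLUFeedForward(dim=4096, hidden_dim=11008)
y = x + ffn(RMSNorm(4096)(x))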

Recommendations

Although MedLang-13B performs well on both single-turn and multi-turn medical QA datasets, the medical field's high demands for reliability and safety, especially where patient health and life are concerned, make the current MedLang-13B unsuitable for deployment in real-world medical applications. I solemnly declare that this model is intended solely for research and testing purposes. Users are strongly advised to critically evaluate any information or medical advice provided by the model.

GPU Memory Requirements

Precision       GPU Mem (GB)
bf16 / fp16     26.0
int8            15.8
int4            9.7
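
The int8 and int4 figures correspond to loading the model with weight quantization. Below is a minimal sketch using bitsandbytes through transformers, assuming the checkpoint's remote code supports quantized loading (not confirmed by this card):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch only: assumes bitsandbytes is installed and that the custom
# (trust_remote_code) model class accepts quantized loading.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # use load_in_8bit=True for the int8 row instead
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "EnjoyCodeX/MedLang-13B/MedLang-13B",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)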