---
library_name: transformers, Unsloth, Peft, trl, accelerate, bitsandbytes
tags:
- medical
- language model
- NLP
license: mit
---

# Model Card for MedChat3.5

## Model Details

### Model Description

MedChat3.5 is a specialized language model fine-tuned from OpenChat 3.5 for biomedical natural language processing (NLP). It was trained on the Llama2-MedTuned-Instructions dataset, which contains approximately 200,000 samples designed for instruction-based learning in biomedical contexts. The model is intended for tasks such as Named Entity Recognition (NER), Relation Extraction (RE), Medical Natural Language Inference (NLI), Document Classification, and Question Answering (QA).

- **Developed by:** Imran Ullah
- **Model type:** Language Model (LM), fine-tuned for medical NLP
- **Language(s) (NLP):** English (biomedical text)
- **License:** MIT
- **Finetuned from model:** OpenChat 3.5

## Dataset Information

### Dataset Name: Llama2-MedTuned-Instructions

#### Dataset Description

Llama2-MedTuned-Instructions is an instruction-based dataset for training language models on biomedical NLP tasks. Its approximately 200,000 samples cover Named Entity Recognition (NER), Relation Extraction (RE), Medical Natural Language Inference (NLI), Document Classification, and Question Answering (QA). It consolidates subsets of well-known biomedical datasets, giving broad coverage of tasks and text sources.

#### Source Datasets and Composition

- Named Entity Recognition (NER): NCBI-disease, BC5CDR-disease, BC5CDR-chem, BC2GM, JNLPBA, i2b2-2012
- Relation Extraction (RE): i2b2-2010, GAD
- Natural Language Inference (NLI): MedNLI
- Document Classification: Hallmarks of Cancer (HoC)
- Question Answering (QA): ChatDoctor, PMC-Llama-Instructions

#### Prompting Strategy

Each sample in the dataset follows a three-part structure (Instruction, Input, and Output), which supports instruction-based learning.

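The three-part structure can be pictured as a small record type that is rendered into a single text prompt. This is an illustration only: the field names, the `### Instruction:` / `### Input:` / `### Output:` labels, and the sample record are assumptions invented for this sketch, not the dataset's documented column names or prompt template.

```python
from dataclasses import dataclass


@dataclass
class InstructionSample:
    """One record with the Instruction / Input / Output structure (hypothetical field names)."""
    instruction: str  # what the model is asked to do
    input_text: str   # the text the instruction applies to (may be empty)
    output: str       # the expected answer, used as the training target


def render_prompt(sample: InstructionSample, include_output: bool = True) -> str:
    """Render a sample as plain text using illustrative (assumed) section labels."""
    parts = [f"### Instruction:\n{sample.instruction}"]
    if sample.input_text:
        parts.append(f"### Input:\n{sample.input_text}")
    # For training the output is appended as the target;
    # for inference it is left empty for the model to complete.
    parts.append("### Output:\n" + (sample.output if include_output else ""))
    return "\n\n".join(parts)


# Invented example record, for illustration only
sample = InstructionSample(
    instruction="Identify all disease mentions in the sentence.",
    input_text="The patient was diagnosed with type 2 diabetes.",
    output="type 2 diabetes",
)
print(render_prompt(sample))
```

Keeping the output optional lets the same function produce both training text and an inference prompt that ends where the model should start writing.
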
#### Usage and Application

The dataset is suited to training and evaluating models on biomedical NLP tasks. It can also serve as a benchmark for domain-specific performance, allowing comparison against established models such as BioBERT and BioClinicalBERT.

## Inference Instructions

To run inference with MedChat3.5, use the `transformers` code example in this section. It is written for a Google Colab notebook (it uses `!pip` and `google.colab.userdata`): install the required packages and authenticate with your own Hugging Face access token. Adjust generation parameters such as `temperature`, `top_p`, and `top_k` to control sampling behavior. The model is intended for tasks such as question answering and response generation in biomedical contexts.

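To make the effect of `temperature`, `top_k`, and `top_p` concrete, here is a minimal, self-contained sketch in plain Python over a made-up five-token vocabulary. It applies the steps in the order temperature, top-k, top-p, which is similar in spirit to sampling in `transformers` but is not its actual implementation; the function names are hypothetical.

```python
import math
import random


def filter_distribution(logits, temperature=0.7, top_k=50, top_p=0.9):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # 1. Temperature: divide logits before the softmax (<1 sharpens, >1 flattens).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = {i: e / total for i, e in enumerate(exps)}

    # 2. Top-k: keep the k most likely tokens, then renormalise.
    ranked = sorted(probs, key=probs.get, reverse=True)[:top_k]
    mass = sum(probs[i] for i in ranked)
    probs = {i: probs[i] / mass for i in ranked}

    # 3. Top-p (nucleus): keep the smallest most-likely prefix whose
    #    cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise the survivors so they sum to 1 again.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filter_distribution(logits, **kwargs)
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights, k=1)[0]


toy_logits = [4.0, 3.5, 1.0, 0.5, -2.0]  # made-up scores for a 5-token vocabulary
dist = filter_distribution(toy_logits, temperature=0.7, top_k=3, top_p=0.9)
print(dist)  # only tokens 0 and 1 survive with these settings
print(sample_token(toy_logits, rng=random.Random(0), top_k=3, top_p=0.9))
```

Lowering `temperature` or `top_p` shrinks the set of candidate tokens and makes output more deterministic; raising them allows more varied, but potentially less accurate, responses.
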
```python
# Example inference code (written for a Google Colab notebook)
!pip install -q --upgrade git+https://github.com/huggingface/transformers.git
!pip install -q accelerate datasets bitsandbytes peft

# Use your own Hugging Face access token (stored in Colab's Secrets as HF_TOKEN)
from google.colab import userdata
hf_token = userdata.get('HF_TOKEN')

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer

path = "Imran1/MedChat3.5"

# Load the model in 4-bit precision via bitsandbytes to reduce GPU memory use
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    path,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    quantization_config=quant_config,
    device_map="auto",
    token=hf_token,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(path, token=hf_token)

# Align the tokenizer's end-of-sequence token with the model config and reuse it for padding
tokenizer.eos_token_id = model.config.eos_token_id
tokenizer.pad_token = tokenizer.eos_token

# Print tokens to stdout as they are generated
streamer = TextStreamer(tokenizer)

# Prompt in OpenChat 3.5 turn format; each turn ends with <|end_of_turn|>
tx = '''
GPT4 Correct Assistant: you are a stomach specialist.<|end_of_turn|>
GPT4 Correct User: What role does gastric acid play in the process of digestion, and how does the stomach regulate its secretion to maintain a healthy digestive environment?<|end_of_turn|>
GPT4 Correct Assistant:
'''

import warnings
warnings.filterwarnings('ignore')  # Ignore all warnings

# Tokenize the prompt (keeping the attention mask) and move it to the model's device
inputs = tokenizer(tx, return_tensors="pt").to(model.device)
generation_params = {
    'max_new_tokens': 500,
    'use_cache': True,
    'do_sample': True,
    'temperature': 0.7,
    'top_p': 0.9,
    'top_k': 50,
}

outputs = model.generate(**inputs, **generation_params, streamer=streamer)
decoded_outputs = tokenizer.batch_decode(outputs)

# Output
'''
<s>
GPT4 Correct Assistant: you are stomach specialist.<|end_of_turn|>
GPT4 Correct User: What role does gastric acid play in the process of digestion, and how does the stomach regulate its secretion to maintain a healthy digestive environment?<|end_of_turn|>
GPT4 Correct Assistant:
Gastric acid plays a crucial role in the process of digestion by breaking down food into its basic components. It is secreted by the cells lining the stomach, known as parietal cells, in response to the presence of food in the stomach.

The stomach regulates the secretion of gastric acid through a series of mechanisms that maintain a healthy digestive environment. The primary mechanism is the release of gastrin, a hormone produced by the stomach's G-cells in response to the presence of food. Gastrin stimulates the parietal cells to secrete gastric acid, which in turn aids in the breakdown of food.

The stomach also regulates the secretion of gastric acid through the release of histamine, which is produced by the ECL cells in response to the presence of food. Histamine acts on the parietal cells to stimulate gastric acid secretion.

Another mechanism involves the production of intrinsic factor, a protein produced by the stomach's mucous cells. Intrinsic factor is essential for the absorption of vitamin B12 in the small intestine. The production of intrinsic factor is regulated by gastric acid, which helps maintain a healthy balance of this essential nutrient.

Additionally, the stomach regulates the secretion of gastric acid through the release of somatostatin, a hormone produced by the D-cells of the stomach. Somatostatin inhibits gastric acid secretion, helping to maintain a healthy balance between acid production and neutralization.

In summary, the stomach regulates the secretion of gastric acid through a series of mechanisms that maintain a healthy digestive environment. These mechanisms include the release of gastrin, histamine, and intrinsic factor, as well as the release of somatostatin. By maintaining a balance between acid production and neutralization, the stomach ensures that the digestive environment remains conducive to proper digestion and absorption of nutrients.<|end_of_turn|>
'''
```

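The prompt string `tx` in the example uses OpenChat 3.5's turn format: each turn is prefixed with `GPT4 Correct User:` or `GPT4 Correct Assistant:` and terminated by `<|end_of_turn|>`, and the prompt ends with an open assistant turn. A small helper can assemble such prompts from a list of turns. `build_openchat_prompt` is a hypothetical convenience function, not part of any library; it simply mirrors the layout of `tx`, including placing the instruction "you are a stomach specialist." as a leading assistant turn, which is an assumption carried over from the example rather than a documented system-prompt mechanism.

```python
def build_openchat_prompt(turns, system=None):
    """Assemble an OpenChat-3.5-style prompt string (hypothetical helper).

    turns:  list of (role, text) pairs, where role is "user" or "assistant".
    system: optional instruction placed first, written as an assistant turn
            to match the layout of the example prompt (an assumption).
    """
    labels = {"user": "GPT4 Correct User", "assistant": "GPT4 Correct Assistant"}
    eot = "<|end_of_turn|>"
    lines = []
    if system:
        lines.append(f"{labels['assistant']}: {system}{eot}")
    for role, text in turns:
        lines.append(f"{labels[role]}: {text}{eot}")  # unknown roles raise KeyError
    # End with an open assistant turn so the model writes the reply.
    lines.append(f"{labels['assistant']}:")
    return "\n".join(lines)


prompt = build_openchat_prompt(
    [("user", "What role does gastric acid play in digestion?")],
    system="you are a stomach specialist.",
)
print(prompt)
```

The resulting string can be passed to `tokenizer(...)` in place of `tx`. If the repository's tokenizer ships a chat template, `tokenizer.apply_chat_template` may be a more robust option than hand-built strings.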