---
library_name: transformers, Unsloth, Peft, trl, accelerate, bitsandbytes
tags:
- medical
- language model
- NLP
license: mit
---
# Model Card for MedChat3.5
## Model Details
### Model Description
MedChat3.5 is a specialized language model based on the OpenChat 3.5 architecture and fine-tuned for biomedical natural language processing (NLP) tasks. It was trained on the Llama2-MedTuned-Instructions dataset, which contains approximately 200,000 samples designed for instruction-based learning in biomedical contexts. The model is intended for tasks such as Named Entity Recognition (NER), Relation Extraction (RE), Medical Natural Language Inference (NLI), Document Classification, and Question Answering (QA).
- **Developed by:** Imran Ullah
- **Model type:** Language Model (LM), fine-tuned for medical NLP
- **Language(s) (NLP):** English (Biomedical Text)
- **License:** MIT
- **Finetuned from model:** OpenChat 3.5
## Dataset Information
### Dataset Name: Llama2-MedTuned-Instructions
#### Dataset Description
Llama2-MedTuned-Instructions is an instruction-based dataset developed for training language models in biomedical NLP tasks. Comprising approximately 200,000 samples, the dataset guides models through tasks like Named Entity Recognition (NER), Relation Extraction (RE), Medical Natural Language Inference (NLI), Document Classification, and Question Answering (QA). It consolidates subsets from well-known biomedical datasets, ensuring a diverse and comprehensive training experience.
#### Source Datasets and Composition
- Named Entity Recognition (NER): NCBI-disease, BC5CDR-disease, BC5CDR-chem, BC2GM, JNLPBA, i2b2-2012
- Relation Extraction (RE): i2b2-2010, GAD
- Natural Language Inference (NLI): MedNLI
- Document Classification: Hallmarks of cancer (HoC)
- Question Answering (QA): ChatDoctor, PMC-Llama-Instructions
#### Prompting Strategy
Each sample in the dataset follows a three-part structure: Instruction, Input, and Output, facilitating instruction-based learning.
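As an illustration of the three-part structure, the sketch below renders one sample into a single string. The field names (`instruction`, `input`, `output`), the `###` section labels, and the example record are assumptions made for illustration; the dataset's actual column names and template may differ.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One instruction-tuning record (hypothetical field names)."""
    instruction: str
    input: str
    output: str


def render_sample(sample: Sample, include_output: bool = True) -> str:
    """Join the three parts into one prompt string.

    With include_output=False the answer is left blank, which is the
    form a model would see at inference time.
    """
    parts = [
        f"### Instruction:\n{sample.instruction.strip()}",
        f"### Input:\n{sample.input.strip()}",
        "### Output:\n" + (sample.output.strip() if include_output else ""),
    ]
    return "\n\n".join(parts)


# Hypothetical NER-style record, not taken from the dataset.
example = Sample(
    instruction="List the disease mentions in the sentence.",
    input="The patient was diagnosed with type 2 diabetes.",
    output="type 2 diabetes",
)
print(render_sample(example))
```

Leaving the output section blank at inference time keeps the training and inference prompts aligned, which is the usual reason for fixing a template like this.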
#### Usage and Application
The dataset is suited to training and evaluating models on biomedical NLP tasks. MedChat3.5, which is trained on it, can serve as a reference point for domain-specific performance when compared against established models such as BioBERT and BioClinicalBERT.
## Inference Instructions
To use MedChat3.5 for inference, follow the provided code snippet using the `transformers` library. Make sure to install the necessary packages and authenticate using a Hugging Face API token. Adjust parameters like temperature, top-p, and top-k for the desired generation behavior. The model is optimized for tasks such as question answering and generating responses in biomedical contexts.
```python
# Example inference code (written for a Google Colab notebook)
!pip install -q --upgrade git+https://github.com/huggingface/transformers.git
!pip install -q accelerate datasets bitsandbytes peft

# Use your own Hugging Face token, stored here as the Colab secret HF_TOKEN
from google.colab import userdata
hf_token = userdata.get('HF_TOKEN')

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer

path = "Imran1/MedChat3.5"

# Load the model in 4-bit precision, then its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    path,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    token=hf_token,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(path, token=hf_token)
tokenizer.eos_token_id = model.config.eos_token_id
tokenizer.pad_token = tokenizer.eos_token
streamer = TextStreamer(tokenizer)
tx = '''
GPT4 Correct Assistant: you are a stomach specialist.<|end_of_turn|>
GPT4 Correct User: What role does gastric acid play in the process of digestion, and how does the stomach regulate its secretion to maintain a healthy digestive environment?<|end_of_turn|>
GPT4 Correct Assistant:
'''
import warnings
warnings.filterwarnings('ignore') # Ignore all warnings
inputs = tokenizer(tx, return_tensors="pt", return_attention_mask=False).to('cuda')
generation_params = {
'max_new_tokens': 500,
'use_cache': True,
'do_sample': True,
'temperature': 0.7,
'top_p': 0.9,
'top_k': 50
}
outputs = model.generate(**inputs, **generation_params, streamer=streamer)
decoded_outputs = tokenizer.batch_decode(outputs)
# output
'''
<s>
GPT4 Correct Assistant: you are a stomach specialist.<|end_of_turn|>
GPT4 Correct User: What role does gastric acid play in the process of digestion, and how does the stomach regulate its secretion to maintain a healthy digestive environment?<|end_of_turn|>
GPT4 Correct Assistant:
Gastric acid plays a crucial role in the process of digestion by breaking down food into its basic components. It is secreted by the cells lining the stomach, known as parietal cells, in response to the presence of food in the stomach.
The stomach regulates the secretion of gastric acid through a series of mechanisms that maintain a healthy digestive environment. The primary mechanism is the release of gastrin, a hormone produced by the stomach's G-cells in response to the presence of food. Gastrin stimulates the parietal cells to secrete gastric acid, which in turn aids in the breakdown of food.
The stomach also regulates the secretion of gastric acid through the release of histamine, which is produced by the ECL cells in response to the presence of food. Histamine acts on the parietal cells to stimulate gastric acid secretion.
Another mechanism involves the production of intrinsic factor, a protein produced by the stomach's mucous cells. Intrinsic factor is essential for the absorption of vitamin B12 in the small intestine. The production of intrinsic factor is regulated by gastric acid, which helps maintain a healthy balance of this essential nutrient.
Additionally, the stomach regulates the secretion of gastric acid through the release of somatostatin, a hormone produced by the D-cells of the stomach. Somatostatin inhibits gastric acid secretion, helping to maintain a healthy balance between acid production and neutralization.
In summary, the stomach regulates the secretion of gastric acid through a series of mechanisms that maintain a healthy digestive environment. These mechanisms include the release of gastrin, histamine, and intrinsic factor, as well as the release of somatostatin. By maintaining a balance between acid production and neutralization, the stomach ensures that the digestive environment remains conducive to proper digestion and absorption of nutrients.<|end_of_turn|>
'''
```
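The prompt in the snippet above is assembled by hand, and the decoded output repeats the whole prompt before the answer. The sketch below shows one way to build such a prompt from a list of turns and to pull out only the final reply. The helper names `build_prompt` and `extract_reply` are hypothetical, and the newline-separated layout mirrors the example in this card rather than any official OpenChat chat template.

```python
END_OF_TURN = "<|end_of_turn|>"
ROLE_PREFIX = {
    "user": "GPT4 Correct User: ",
    "assistant": "GPT4 Correct Assistant: ",
}


def build_prompt(turns):
    """Render (role, text) pairs and leave an open assistant turn at the end."""
    lines = [f"{ROLE_PREFIX[role]}{text}{END_OF_TURN}" for role, text in turns]
    lines.append(ROLE_PREFIX["assistant"].rstrip())
    return "\n".join(lines)


def extract_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a decoded generation."""
    marker = ROLE_PREFIX["assistant"].rstrip()
    reply = decoded.rsplit(marker, 1)[-1]
    return reply.split(END_OF_TURN, 1)[0].strip()


# As in the card's example, the persona line is sent as an assistant turn.
turns = [
    ("assistant", "you are a stomach specialist."),
    ("user", "What role does gastric acid play in digestion?"),
]
prompt = build_prompt(turns)

# Stand-in for tokenizer.batch_decode(outputs)[0]; the reply text is made up.
decoded = "<s>" + prompt + "\nGastric acid helps break down food." + END_OF_TURN
print(extract_reply(decoded))
```

In real use, `prompt` would take the place of `tx` and `extract_reply` would be applied to each entry of `decoded_outputs`.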