PEFT
English
medical
mistralmed / README.md
Tonic's picture
Update README.md
b25b70d
|
raw
history blame
No virus
11 kB
---
library_name: peft
base_model: mistralai/Mistral-7B-v0.1
license: mit
datasets:
- keivalya/MedQuad-MedicalQnADataset
language:
- en
metrics:
- bertscore
tags:
- medical
---
# Model Card for Model ID
This is a medicine-focussed mistral fine tuned using keivalya/MedQuad-MedicalQnADataset
## Model Details
### Model Description
Trying to get better at medical Q & A
- **Developed by:** [Tonic](https://huggingface.co/Tonic)
- **Shared by [optional]:** [Tonic](https://huggingface.co/Tonic)
- **Model type:** Mistral Fine-Tune
- **Language(s) (NLP):** English
- **License:** MIT2.0
- **Finetuned from model [optional]:** [mistralai/Mistral-7B-v0.1](https://huggingface.com/Mistralai/Mistral-7B-v0.1)
### Model Sources [optional]
- **Repository:** [Tonic/mistralmed](https://huggingface.co/Tonic/mistralmed)
- **Code :** [github](https://github.com/Josephrp/mistralmed/blob/main/finetuning.py)
- **Demo :** [Tonic/MistralMed_Chat](https://huggingface.co/Tonic/MistralMed_Chat)
## Uses
This model can be used the same way you normally use mistral
### Direct Use
This model can do better in medical question and answer scenarios.
### Downstream Use [optional]
This model is intended to be further fine tuned.
### Recommendations
- Do Not Use As Is
- Fine Tune This Model Further
- For Educational Purposes Only
- Benchmark your model usage
- Evaluate the model before use
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
## How to Get Started with the Model
Use the code below to get started with the model.
[Tonic/MistralMED_Chat](https://huggingface.co/Tonic/MistralMED_Chat)
## Training Details
### Training Data
[MedQuad](https://huggingface.co/datasets/keivalya/MedQuad-MedicalQnADataset/viewer/default/train)
### Training Procedure
Dataset({
features: ['qtype', 'Question', 'Answer'],
num_rows: 16407
})
#### Preprocessing [optional]
MistralForCausalLM(
(model): MistralModel(
(embed_tokens): Embedding(32000, 4096)
(layers): ModuleList(
(0-31): 32 x MistralDecoderLayer(
(self_attn): MistralAttention(
(q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
(k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
(v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
(o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
(rotary_emb): MistralRotaryEmbedding()
)
(mlp): MistralMLP(
(gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
(up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
(down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
(act_fn): SiLUActivation()
)
(input_layernorm): MistralRMSNorm()
(post_attention_layernorm): MistralRMSNorm()
)
)
(norm): MistralRMSNorm()
)
(lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
#### Training Hyperparameters
- **Training regime:**
config = LoraConfig(
r=8,
lora_alpha=16,
target_modules=[
"q_proj",
"k_proj",
"v_proj",
"o_proj",
"gate_proj",
"up_proj",
"down_proj",
"lm_head",
],
bias="none",
lora_dropout=0.05, # Conventional
task_type="CAUSAL_LM",
)
#### Speeds, Sizes, Times [optional]
- trainable params: 21260288 || all params: 3773331456 || trainable%: 0.5634354746703705
- TrainOutput(global_step=1000, training_loss=0.47226515007019043, metrics={'train_runtime': 3143.4141, 'train_samples_per_second': 2.545, 'train_steps_per_second': 0.318, 'total_flos': 1.75274075357184e+17, 'train_loss': 0.47226515007019043, 'epoch': 0.49})
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** A100
- **Hours used:** 1
- **Cloud Provider:** Google
- **Compute Region:** East1
- **Carbon Emitted:** 0.09
## Training Results
[1000/1000 52:20, Epoch 0/1]
Step Training Loss
50 0.474200
100 0.523300
150 0.484500
200 0.482800
250 0.498800
300 0.451800
350 0.491800
400 0.488000
450 0.472800
500 0.460400
550 0.464700
600 0.484800
650 0.474600
700 0.477900
750 0.445300
800 0.431300
850 0.461500
900 0.451200
950 0.470800
1000 0.454900
### Model Architecture and Objective
PeftModelForCausalLM(
(base_model): LoraModel(
(model): MistralForCausalLM(
(model): MistralModel(
(embed_tokens): Embedding(32000, 4096)
(layers): ModuleList(
(0-31): 32 x MistralDecoderLayer(
(self_attn): MistralAttention(
(q_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=4096, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
)
(k_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=1024, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
)
(v_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=1024, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
)
(o_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=4096, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
)
(rotary_emb): MistralRotaryEmbedding()
)
(mlp): MistralMLP(
(gate_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=14336, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=4096, out_features=14336, bias=False)
)
(up_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=14336, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=4096, out_features=14336, bias=False)
)
(down_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=14336, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=4096, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=14336, out_features=4096, bias=False)
)
(act_fn): SiLUActivation()
)
(input_layernorm): MistralRMSNorm()
(post_attention_layernorm): MistralRMSNorm()
)
)
(norm): MistralRMSNorm()
)
(lm_head): Linear(
in_features=4096, out_features=32000, bias=False
(lora_dropout): ModuleDict(
(default): Dropout(p=0.05, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=8, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=8, out_features=32000, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
)
)
)
)
#### Hardware
A100
## Model Card Authors [optional]
[Tonic](https://huggingface.co/Tonic)
## Model Card Contact
[Tonic](https://huggingface.co/Tonic)
## Training procedure
The following `bitsandbytes` quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
### Framework versions
- PEFT 0.6.0.dev0