---
library_name: peft
base_model: mistralai/Mistral-7B-v0.1
license: mit
datasets:
- keivalya/MedQuad-MedicalQnADataset
language:
- en
metrics:
- bertscore
tags:
- medical
---

# Model Card for Tonic/mistralmed

This is a medicine-focused fine-tune of Mistral-7B, trained on the keivalya/MedQuad-MedicalQnADataset question-answering dataset.

## Model Details

### Model Description

The goal of this fine-tune is to improve Mistral-7B's answers to medical questions.

- **Developed by:** [Tonic](https://huggingface.co/Tonic)
- **Shared by:** [Tonic](https://huggingface.co/Tonic)
- **Model type:** Mistral fine-tune (LoRA adapter trained with PEFT)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)

### Model Sources

- **Repository:** [Tonic/mistralmed](https://huggingface.co/Tonic/mistralmed)
- **Code:** [github](https://github.com/Josephrp/mistralmed/blob/main/finetuning.py)
- **Demo:** [Tonic/MistralMed_Chat](https://huggingface.co/Tonic/MistralMed_Chat)

## Uses

This model can be used in the same way as the base Mistral model.

### Direct Use

The fine-tune targets medical question-answering scenarios, where it is intended to do better than the base model.

### Downstream Use

This model is intended to be fine-tuned further.

### Recommendations

- Do not use the model as is.
- Fine-tune this model further.
- Use it for educational purposes only.
- Benchmark your model usage.
- Evaluate the model before use.

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

## How to Get Started with the Model

The quickest way to try the model is the hosted demo: [Tonic/MistralMED_Chat](https://huggingface.co/Tonic/MistralMED_Chat)

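To run the model locally, one option is to load the base model in 4-bit and attach the adapter with PEFT. The following is a minimal sketch, not the author's inference code. It assumes that the `Tonic/mistralmed` repository holds a PEFT adapter for `mistralai/Mistral-7B-v0.1`, that `torch`, `transformers`, `peft` and `bitsandbytes` are installed, and that a CUDA GPU is available. The prompt layout in `build_prompt` is hypothetical and may differ from the one used during training.

```python
BASE_MODEL_ID = "mistralai/Mistral-7B-v0.1"
ADAPTER_ID = "Tonic/mistralmed"  # assumed to contain the LoRA adapter weights


def build_prompt(question: str) -> str:
    """Hypothetical prompt layout; not taken from the training script."""
    return f"Question: {question.strip()}\nAnswer:"


def load_model_and_tokenizer():
    """Load the 4-bit base model and attach the adapter."""
    # Heavy imports stay local so this module can be imported without a GPU.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Mirrors the bitsandbytes settings listed in this card.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID, quantization_config=quant_config, device_map="auto"
    )
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()
    return model, tokenizer


def answer(question: str, max_new_tokens: int = 256) -> str:
    """Generate an answer for one question (downloads weights on first use)."""
    import torch

    model, tokenizer = load_model_and_tokenizer()
    device = next(model.parameters()).device
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `answer("What are the symptoms of glaucoma?")` downloads the base weights on first use. Following the recommendations above, treat any output as unevaluated and for educational purposes only.
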
## Training Details

### Training Data

[MedQuad](https://huggingface.co/datasets/keivalya/MedQuad-MedicalQnADataset/viewer/default/train)

### Training Procedure

The training dataset has the following structure:

    Dataset({
        features: ['qtype', 'Question', 'Answer'],
        num_rows: 16407
    })

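Each row therefore needs to be turned into a single text string before tokenization. The sketch below shows one way to do that with the three columns listed above. The prompt layout and the helper names `format_example` and `load_formatted_split` are hypothetical and are not taken from the linked training script; the sample row is made up for illustration.

```python
def format_example(row: dict) -> str:
    """Join one MedQuad row into a single training string (hypothetical layout)."""
    question = row["Question"].strip()
    answer = row["Answer"].strip()
    return f"### Question:\n{question}\n\n### Answer:\n{answer}"


def load_formatted_split(split: str = "train"):
    """Load the dataset and add a `text` column holding the formatted prompt."""
    # Imported lazily: needs the `datasets` package and network access.
    from datasets import load_dataset

    dataset = load_dataset("keivalya/MedQuad-MedicalQnADataset", split=split)
    return dataset.map(lambda row: {"text": format_example(row)})


# Made-up row with the same columns as the dataset, for illustration only.
sample = {
    "qtype": "symptoms",
    "Question": " What are the symptoms of X? ",
    "Answer": "Placeholder answer. ",
}
formatted = format_example(sample)
```

The `qtype` column is left unused here; whether the original run used it is not stated in this card.
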
#### Preprocessing

The base model was loaded with 4-bit quantized linear layers (`Linear4bit`) before the LoRA adapters were attached:

    MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralAttention(
              (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
              (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
              (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
              (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
              (rotary_emb): MistralRotaryEmbedding()
            )
            (mlp): MistralMLP(
              (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
              (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
              (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
              (act_fn): SiLUActivation()
            )
            (input_layernorm): MistralRMSNorm()
            (post_attention_layernorm): MistralRMSNorm()
          )
        )
        (norm): MistralRMSNorm()
      )
      (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
    )

#### Training Hyperparameters

- **Training regime:** LoRA fine-tuning on a 4-bit (NF4) quantized base model with bfloat16 compute

The LoRA configuration was:

    from peft import LoraConfig

    config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=[
            "q_proj",
            "k_proj",
            "v_proj",
            "o_proj",
            "gate_proj",
            "up_proj",
            "down_proj",
            "lm_head",
        ],
        bias="none",
        lora_dropout=0.05,  # Conventional
        task_type="CAUSAL_LM",
    )

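The parameter counts reported in the training log (21,260,288 trainable out of 3,773,331,456) can be reproduced from this configuration and the layer shapes in the architecture printout. Each adapted projection adds `r * (in_features + out_features)` weights. The sketch below is illustrative arithmetic, not output from the training run. It assumes that each `Linear4bit` layer reports half as many elements as its dense shape (two 4-bit values packed per stored element), which is consistent with the reported total.

```python
HIDDEN, KV_DIM, MLP_DIM, VOCAB = 4096, 1024, 14336, 32000
NUM_LAYERS, RANK = 32, 8

# (in_features, out_features) of each LoRA-targeted projection in one decoder layer.
LAYER_PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}
LM_HEAD = (HIDDEN, VOCAB)


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """Weights in lora_A (in x r) plus lora_B (r x out)."""
    return rank * (in_features + out_features)


# Trainable parameters: adapters on every projection of every layer, plus lm_head.
per_layer = sum(lora_params(i, o) for i, o in LAYER_PROJECTIONS.values())
trainable = NUM_LAYERS * per_layer + lora_params(*LM_HEAD)

# Frozen parameters. Assumption: 4-bit layers report half their dense element count.
dense_per_layer = sum(i * o for i, o in LAYER_PROJECTIONS.values())
packed_4bit = NUM_LAYERS * dense_per_layer // 2
embeddings_and_head = 2 * VOCAB * HIDDEN      # embed_tokens + unquantized lm_head
norms = (2 * NUM_LAYERS + 1) * HIDDEN         # two RMSNorms per layer + final norm

total = packed_4bit + embeddings_and_head + norms + trainable
trainable_pct = 100 * trainable / total

print(f"trainable params: {trainable} || all params: {total} || trainable%: {trainable_pct}")
```

Because of the packing, the "all params" figure undercounts the roughly 7 billion logical weights of the base model, so the trainable percentage looks about twice as large as it would against the dense model.
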
#### Speeds, Sizes, Times

- `trainable params: 21260288 || all params: 3773331456 || trainable%: 0.5634354746703705`
- `TrainOutput(global_step=1000, training_loss=0.47226515007019043, metrics={'train_runtime': 3143.4141, 'train_samples_per_second': 2.545, 'train_steps_per_second': 0.318, 'total_flos': 1.75274075357184e+17, 'train_loss': 0.47226515007019043, 'epoch': 0.49})`

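The `TrainOutput` figures are internally consistent, which the short calculation below illustrates. The effective batch size of 8 is inferred from the throughput numbers and is not stated anywhere in this card; the epoch estimate also assumes that all 16,407 rows were used for training.

```python
TRAIN_RUNTIME_S = 3143.4141   # 'train_runtime'
GLOBAL_STEPS = 1000           # global_step
SAMPLES_PER_SECOND = 2.545    # 'train_samples_per_second'
NUM_ROWS = 16407              # num_rows of the training dataset

# Optimizer steps per second, reported as 0.318.
steps_per_second = GLOBAL_STEPS / TRAIN_RUNTIME_S

# Samples consumed per optimizer step (batch size x gradient accumulation).
samples_per_step = SAMPLES_PER_SECOND / steps_per_second
effective_batch = round(samples_per_step)  # inferred, not documented

# Fraction of the dataset seen after all steps, reported as epoch 0.49.
epoch_fraction = GLOBAL_STEPS * effective_batch / NUM_ROWS

print(f"steps/s: {steps_per_second:.3f}")
print(f"effective batch size: {effective_batch}")
print(f"epoch fraction: {epoch_fraction:.2f}")
```

In other words, the run stopped after about half a pass over the data, so roughly half of the rows were never seen during training.
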
## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** A100
- **Hours used:** 1
- **Cloud Provider:** Google
- **Compute Region:** East1
- **Carbon Emitted:** 0.09

## Training Results

Training ran for 1000 of 1000 steps in 52:20 (epoch 0 of 1).

| Step | Training Loss |
|------|---------------|
| 50 | 0.474200 |
| 100 | 0.523300 |
| 150 | 0.484500 |
| 200 | 0.482800 |
| 250 | 0.498800 |
| 300 | 0.451800 |
| 350 | 0.491800 |
| 400 | 0.488000 |
| 450 | 0.472800 |
| 500 | 0.460400 |
| 550 | 0.464700 |
| 600 | 0.484800 |
| 650 | 0.474600 |
| 700 | 0.477900 |
| 750 | 0.445300 |
| 800 | 0.431300 |
| 850 | 0.461500 |
| 900 | 0.451200 |
| 950 | 0.470800 |
| 1000 | 0.454900 |

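The logged values can be summarized with a few lines of arithmetic. The sketch below is an illustration added for readers, not part of the training script. It averages the 20 logged losses, which should land close to the overall `training_loss` of 0.4723 reported by the trainer, and compares the first and second half of the run.

```python
from statistics import mean

# Training loss logged every 50 steps, copied from the results above.
LOGGED_LOSSES = [
    0.4742, 0.5233, 0.4845, 0.4828, 0.4988,
    0.4518, 0.4918, 0.4880, 0.4728, 0.4604,
    0.4647, 0.4848, 0.4746, 0.4779, 0.4453,
    0.4313, 0.4615, 0.4512, 0.4708, 0.4549,
]
REPORTED_TRAINING_LOSS = 0.47226515007019043

mean_loss = mean(LOGGED_LOSSES)

# Compare steps 50-500 against steps 550-1000 to see the overall trend.
half = len(LOGGED_LOSSES) // 2
first_half_mean = mean(LOGGED_LOSSES[:half])
second_half_mean = mean(LOGGED_LOSSES[half:])

print(f"mean of logged losses: {mean_loss:.4f}")
print(f"first half: {first_half_mean:.4f}, second half: {second_half_mean:.4f}")
```

The second half averages only slightly lower than the first, so the loss decreased modestly over the run. Training loss alone does not measure answer quality, which is why the recommendations above call for evaluating the model before use.
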
### Model Architecture and Objective

After wrapping with PEFT, each targeted projection carries a rank-8 LoRA adapter on top of its frozen base layer:

    PeftModelForCausalLM(
      (base_model): LoraModel(
        (model): MistralForCausalLM(
          (model): MistralModel(
            (embed_tokens): Embedding(32000, 4096)
            (layers): ModuleList(
              (0-31): 32 x MistralDecoderLayer(
                (self_attn): MistralAttention(
                  (q_proj): Linear4bit(
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=4096, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=4096, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                    (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                  )
                  (k_proj): Linear4bit(
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=4096, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=1024, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                    (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
                  )
                  (v_proj): Linear4bit(
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=4096, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=1024, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                    (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
                  )
                  (o_proj): Linear4bit(
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=4096, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=4096, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                    (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                  )
                  (rotary_emb): MistralRotaryEmbedding()
                )
                (mlp): MistralMLP(
                  (gate_proj): Linear4bit(
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=4096, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=14336, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                    (base_layer): Linear4bit(in_features=4096, out_features=14336, bias=False)
                  )
                  (up_proj): Linear4bit(
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=4096, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=14336, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                    (base_layer): Linear4bit(in_features=4096, out_features=14336, bias=False)
                  )
                  (down_proj): Linear4bit(
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=14336, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=4096, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                    (base_layer): Linear4bit(in_features=14336, out_features=4096, bias=False)
                  )
                  (act_fn): SiLUActivation()
                )
                (input_layernorm): MistralRMSNorm()
                (post_attention_layernorm): MistralRMSNorm()
              )
            )
            (norm): MistralRMSNorm()
          )
          (lm_head): Linear(
            in_features=4096, out_features=32000, bias=False
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.05, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=4096, out_features=8, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=8, out_features=32000, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
          )
        )
      )
    )

#### Hardware

A100

## Model Card Authors

[Tonic](https://huggingface.co/Tonic)

## Model Card Contact

[Tonic](https://huggingface.co/Tonic)

## Training procedure

The following `bitsandbytes` quantization config was used during training:

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16

### Framework versions

- PEFT 0.6.0.dev0