Tonic committed
Commit 06c7ac0 (parent: 1162222)

Update README.md

Files changed (1): README.md (+279, -100)

---
library_name: peft
base_model: stabilityai/stablelm-3b-4e1t
license: mit
language:
- en
metrics:
- bleu
- bertscore
- accuracy
tags:
- medical
---

# Model Card for StableMed

Welcome to StableMed, a StableLM 3B (alpha) model fine-tuned for medical question answering.

## Model Details

### Model Description

StableMed is a StableLM 3B fine-tune for medical Q&A trained on MedQuad. It is intended for education in public health and sanitation, specifically to improve our understanding of outreach and communication.

- **Developed by:** [Tonic](https://huggingface.co/Tonic)
- **Shared by [optional]:** [Tonic](https://huggingface.co/Tonic)
- **Model type:** StableLM 3B (alpha)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model [optional]:** [stabilityai/stablelm-3b-4e1t](https://huggingface.co/stabilityai/stablelm-3b-4e1t)

### Model Sources [optional]

- **Repository:** [Tonic/stablemed](https://huggingface.co/Tonic/stablemed)
- **Demo [optional]:** [More Information Needed]

## Uses

Use this model for educational purposes only; do not use it for decision support in the wild.

Use this model for medical question answering.

Use this model as an educational tool for "miniature" models.

### Direct Use

Medical question answering.

### Downstream Use [optional]

Fine-tune this model to work in a network or swarm of medical fine-tunes.

### Out-of-Scope Use

- Do not use this model in the wild.
- Do not use this model directly.
- Do not use this model for real-world decision support.

## Bias, Risks, and Limitations

We use Giskard for evaluation (coming soon!).

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

**DO NOT USE THIS MODEL WITHOUT EVALUATION.**

**DO NOT USE THIS MODEL WITHOUT BENCHMARKING.**

**DO NOT USE THIS MODEL WITHOUT FURTHER FINE-TUNING.**

## How to Get Started with the Model

Use the code below to get started with the model.
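
A minimal sketch, assuming the LoRA adapter published at [Tonic/stablemed](https://huggingface.co/Tonic/stablemed) is loaded on top of the `stabilityai/stablelm-3b-4e1t` base with `peft`; the prompt and generation settings are illustrative only:

```python
# Minimal sketch (assumptions flagged): load the base model, attach the
# StableMed LoRA adapter, and generate an answer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "stabilityai/stablelm-3b-4e1t"
adapter_id = "Tonic/stablemed"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,   # assumption: fp16 fits a T4-class GPU
    device_map="auto",
    trust_remote_code=True,      # StableLMEpoch uses custom modeling code
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

prompt = "What are the symptoms of glaucoma?"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```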

## Training Details

### Training Data

Trained on the [MedQuad medical Q&A dataset](https://huggingface.co/datasets/keivalya/MedQuad-MedicalQnADataset):

```text
Dataset({
    features: ['qtype', 'Question', 'Answer'],
    num_rows: 16407
})
```
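
As a sketch, the dataset can be inspected with the `datasets` library (the `train` split name is an assumption about the hub layout):

```python
# Sketch: load and inspect the MedQuad medical Q&A dataset.
from datasets import load_dataset

dataset = load_dataset("keivalya/MedQuad-MedicalQnADataset", split="train")  # split name assumed
print(dataset)                  # features: ['qtype', 'Question', 'Answer'], num_rows: 16407
print(dataset[0]["Question"])   # a sample question
print(dataset[0]["Answer"])     # its reference answer
```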

### Training Procedure

Fine-tuned using LoRA:

```text
trainable params: 12940288 || all params: 1539606528 || trainable%: 0.8404931886596937
```
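
The printed count is consistent with rank-8 LoRA adapters (dropout 0.05) on every attention, MLP, and `lm_head` projection, as the module dump under "Model Architecture and Objective" shows. A hedged `peft` reconstruction; `lora_alpha` is an assumption, since it is not recorded in this card:

```python
# Hedged reconstruction of the adapter configuration implied by the module
# dump: r=8 (lora_A: in -> 8, lora_B: 8 -> out), dropout 0.05, applied to all
# linear projections including lm_head. lora_alpha is an assumption.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,          # assumption: not recorded in the card
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj", "lm_head",
    ],
)
# model = get_peft_model(quantized_base_model, lora_config)
# model.print_trainable_parameters()
```

With these target modules, rank 8 over the 32 decoder layers plus `lm_head` works out to exactly 12,940,288 trainable parameters, matching the printed count.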

#### Preprocessing [optional]

Original (4-bit quantized) base model:

```text
StableLMEpochForCausalLM(
  (model): StableLMEpochModel(
    (embed_tokens): Embedding(50304, 2560)
    (layers): ModuleList(
      (0-31): 32 x DecoderLayer(
        (self_attn): Attention(
          (q_proj): Linear4bit(in_features=2560, out_features=2560, bias=False)
          (k_proj): Linear4bit(in_features=2560, out_features=2560, bias=False)
          (v_proj): Linear4bit(in_features=2560, out_features=2560, bias=False)
          (o_proj): Linear4bit(in_features=2560, out_features=2560, bias=False)
          (rotary_emb): RotaryEmbedding()
        )
        (mlp): MLP(
          (gate_proj): Linear4bit(in_features=2560, out_features=6912, bias=False)
          (up_proj): Linear4bit(in_features=2560, out_features=6912, bias=False)
          (down_proj): Linear4bit(in_features=6912, out_features=2560, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
      )
    )
    (norm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=2560, out_features=50304, bias=False)
)
```
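
The `Linear4bit` layers indicate the base model was loaded in 4-bit with `bitsandbytes` before the adapters were attached. A sketch of such a load; the quantization type and compute dtype are assumptions, as the exact `bitsandbytes` settings are not reproduced here:

```python
# Sketch: load the base model in 4-bit, producing the Linear4bit layers
# shown above. quant_type and compute dtype are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # assumption
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption
)
base_model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-3b-4e1t",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```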

#### Training Hyperparameters

- **Training regime:** 4-bit (`bitsandbytes`) quantized base model with LoRA adapters; mixed-precision details: [More Information Needed]

#### Speeds, Sizes, Times [optional]

Roughly 6.4 hours (22,971 s) of training for 2,051 steps (0.5 epochs):

```text
TrainOutput(global_step=2051, training_loss=0.6156479549198718, metrics={'train_runtime': 22971.4974, 'train_samples_per_second': 0.357, 'train_steps_per_second': 0.089, 'total_flos': 6.5950444363776e+16, 'train_loss': 0.6156479549198718, 'epoch': 0.5})
```
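
For context, a hedged sketch of the `Trainer` arguments implied by these numbers: loss is logged every 50 steps (see the table below), and samples/sec divided by steps/sec is roughly 4, suggesting an effective batch size of 4. Everything else (optimizer, learning rate, batch/accumulation split) is an assumption:

```python
# Hedged sketch: TrainingArguments consistent with the printed TrainOutput.
# Only logging_steps, max_steps, and the effective batch size (~4) are
# derived from the logs; the remaining values are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="stablemed-qlora",    # hypothetical path
    per_device_train_batch_size=1,   # assumption
    gradient_accumulation_steps=4,   # effective batch ~4, derived from logs
    max_steps=2051,                  # matches global_step in TrainOutput
    logging_steps=50,                # matches the results table below
    learning_rate=2e-4,              # assumption
    fp16=True,                       # assumption
)
```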

## Results

Training loss, logged every 50 steps:

| Step | Training Loss |
|------|---------------|
| 50 | 1.427000 |
| 100 | 0.763200 |
| 150 | 0.708200 |
| 200 | 0.662300 |
| 250 | 0.650900 |
| 300 | 0.617400 |
| 350 | 0.602900 |
| 400 | 0.608900 |
| 450 | 0.596100 |
| 500 | 0.602000 |
| 550 | 0.594700 |
| 600 | 0.584700 |
| 650 | 0.611000 |
| 700 | 0.558700 |
| 750 | 0.616300 |
| 800 | 0.568700 |
| 850 | 0.597300 |
| 900 | 0.607400 |
| 950 | 0.563200 |
| 1000 | 0.602900 |
| 1050 | 0.594900 |
| 1100 | 0.583000 |
| 1150 | 0.604500 |
| 1200 | 0.547400 |
| 1250 | 0.586600 |
| 1300 | 0.554300 |
| 1350 | 0.581000 |
| 1400 | 0.578900 |
| 1450 | 0.563200 |
| 1500 | 0.556800 |
| 1550 | 0.570300 |
| 1600 | 0.599800 |
| 1650 | 0.556000 |
| 1700 | 0.592500 |
| 1750 | 0.597200 |
| 1800 | 0.559100 |
| 1850 | 0.586100 |
| 1900 | 0.581100 |
| 1950 | 0.589400 |
| 2000 | 0.581100 |
| 2050 | 0.533100 |
 
202
  ## Environmental Impact
203
 
 

## Technical Specifications [optional]

### Model Architecture and Objective

With LoRA:

```text
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): StableLMEpochForCausalLM(
      (model): StableLMEpochModel(
        (embed_tokens): Embedding(50304, 2560)
        (layers): ModuleList(
          (0-31): 32 x DecoderLayer(
            (self_attn): Attention(
              (q_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2560, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2560, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=2560, out_features=2560, bias=False)
              )
              (k_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2560, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2560, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=2560, out_features=2560, bias=False)
              )
              (v_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2560, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2560, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=2560, out_features=2560, bias=False)
              )
              (o_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2560, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2560, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=2560, out_features=2560, bias=False)
              )
              (rotary_emb): RotaryEmbedding()
            )
            (mlp): MLP(
              (gate_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2560, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=6912, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=2560, out_features=6912, bias=False)
              )
              (up_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2560, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=6912, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=2560, out_features=6912, bias=False)
              )
              (down_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=6912, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2560, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=6912, out_features=2560, bias=False)
              )
              (act_fn): SiLU()
            )
            (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
            (post_attention_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
          )
        )
        (norm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(
        in_features=2560, out_features=50304, bias=False
        (lora_dropout): ModuleDict(
          (default): Dropout(p=0.05, inplace=False)
        )
        (lora_A): ModuleDict(
          (default): Linear(in_features=2560, out_features=8, bias=False)
        )
        (lora_B): ModuleDict(
          (default): Linear(in_features=8, out_features=50304, bias=False)
        )
        (lora_embedding_A): ParameterDict()
        (lora_embedding_B): ParameterDict()
      )
    )
  )
)
```

### Compute Infrastructure

GCS

#### Hardware

NVIDIA T4 GPU

#### Software

- transformers
- peft
- torch
- datasets

## Model Card Authors [optional]

[Tonic](https://huggingface.co/Tonic)

## Model Card Contact

[Tonic](https://huggingface.co/Tonic)

## Training procedure

### Framework versions

- PEFT 0.6.2.dev0