siddartha-abacus
committed on
Update README.md

README.md CHANGED

@@ -7,6 +7,16 @@ datasets:
- anon8231489123/ShareGPT_Vicuna_unfiltered
---

```json
{
  "layer_map": [
    [0, 16],
    [8, 24],
    [16, 32]
  ]
}
```

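Each `[start, end)` pair in this map selects a slice of the base model's decoder layers, and the slices are stacked in order, giving 16 + 16 + 16 = 48 layers (layers 8 through 23 of the 32-layer base appear twice). The snippet below is a minimal sketch of that expansion with Hugging Face `transformers`, not the script actually used to build this model; the base checkpoint name is an assumption.

```python
# Sketch of applying the layer map by stacking copies of the base model's
# decoder layers (assumptions noted above).
import copy

import torch
from transformers import AutoModelForCausalLM

layer_map = [[0, 16], [8, 24], [16, 32]]  # from the config above

# Base checkpoint name is an assumption for illustration.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)

# Concatenate the selected layer slices; overlapping ranges duplicate layers.
expanded_layers = torch.nn.ModuleList(
    copy.deepcopy(base.model.layers[i])
    for start, end in layer_map
    for i in range(start, end)
)

base.model.layers = expanded_layers
base.config.num_hidden_layers = len(expanded_layers)  # 48 layers, roughly 10B params

# Recent transformers versions track each layer's index for KV-cache handling,
# so reindex the duplicated layers.
for idx, layer in enumerate(base.model.layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = idx
```

Passthrough merge tools such as mergekit express the same stacking declaratively; either way, no new weights are introduced at expansion time, so the expanded model starts from purely duplicated layers.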

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/pf4d6FA7DriRtVq5HCkxd.png)

This model is a variation of [abacusai/Fewshot-Metamath-OrcaVicuna-Mistral](https://huggingface.co/datasets/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral)

@@ -29,4 +39,20 @@ vs the loss curve for the original LoRA finetune of the 7B model

The larger model achieved a best eval loss of 0.3915 vs 0.3971 in far fewer steps.

Overall, we think this is a promising approach to accessing much larger models without significantly more resources.

# Performance on Metrics

To do a proper ablation, we compared the performance of 4 models trained for ~1 epoch on the combined datasets (Metamath,
Orca, ShareGPT). Here are the results:

| Model | Trainable Params | Train Loss | Eval Loss | GSM8K | TruthfulQA |
| :-----| ------: | ---------: | --------: | ----: | ---------: |
| Mistral 7B | 0 | - | - | 0.374 | 0.426 |
| Mistral 10B | 0 | - | - | 0.290 | 0.407 |
| Mistral 7B + LoRA r=12 | 31M | 0.412 | 0.366 | 0.514 | 0.499 |
| Mistral 10B + LoRA r=8 | 31M | 0.401 | 0.363 | 0.663 | 0.540 |

This ablation compares the base model (Mistral 7B), the expanded model built with the layer map described here, and LoRA fine-tunes with `r=12`
on the base model and `r=8` on the expanded model (to match trainable parameters). The ablation demonstrates quite clearly that fine-tuning the expanded
model leads to a significant improvement in metrics even with the same number of trainable parameters (and training steps).
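
For the same target modules, LoRA parameter count scales with the rank times the number of layers, and 12 × 32 = 8 × 48, which is why `r=8` on the 48-layer model matches `r=12` on the 32-layer base at roughly 31M trainable parameters. The sketch below attaches such an adapter with `peft`; the target modules and hyperparameters shown are assumptions chosen to be consistent with the ~31M figure, not the recorded training configuration.

```python
# Illustrative only: attach a rank-8 LoRA to the expanded (48-layer) model.
# Target modules and hyperparameters are assumptions.
from peft import LoraConfig, get_peft_model

lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # All linear projections in each decoder layer; with r=8 over 48 layers
    # this comes to roughly 31M trainable parameters.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# `base` is the 48-layer model from the expansion sketch above.
peft_model = get_peft_model(base, lora_cfg)
peft_model.print_trainable_parameters()
```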
|