siddartha-abacus committed · Commit 5320d73 · verified · 1 Parent(s): 2bf967b

Update README.md

Files changed (1): README.md (+27 -1)
README.md CHANGED
@@ -7,6 +7,16 @@ datasets:
  - anon8231489123/ShareGPT_Vicuna_unfiltered
 ---
 
+ ```json
+ {
+   "layer_map": [
+     [0, 16],
+     [8, 24],
+     [16, 32]
+   ]
+ }
+ ```
+
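To make the `layer_map` concrete, below is a minimal sketch of how such a map could be applied: copies of the listed `[start, end)` decoder-layer ranges of Mistral-7B are stacked to form a deeper model. The source model ID, the `deepcopy`-based duplication, and the output path are illustrative assumptions, not the exact procedure used to build this checkpoint; a production conversion would more likely operate on the state dict directly and renumber per-layer indices.

```python
# Minimal sketch (assumption, not the exact conversion script used for this
# checkpoint): build a deeper model by stacking copies of the decoder-layer
# ranges listed in the layer_map above.
import copy

import torch.nn as nn
from transformers import AutoModelForCausalLM

# [start, end) ranges of source layers, as in the layer_map above
LAYER_MAP = [(0, 16), (8, 24), (16, 32)]

# Source model ID is an assumption for illustration.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

expanded = nn.ModuleList()
for start, end in LAYER_MAP:
    for i in range(start, end):
        # Overlapping ranges duplicate layers; deepcopy gives every copy its own
        # weights so a later LoRA finetune can adapt the copies independently.
        expanded.append(copy.deepcopy(model.model.layers[i]))

model.model.layers = expanded
model.config.num_hidden_layers = len(expanded)  # 48 layers instead of 32
model.save_pretrained("mistral-10b-layer-expanded")  # hypothetical output path
```

With this map, 48 decoder layers are produced from the original 32, which is consistent with the ~10B parameter size referenced below.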
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/pf4d6FA7DriRtVq5HCkxd.png)
 
 This model is a variation of [abacusai/Fewshot-Metamath-OrcaVicuna-Mistral](https://huggingface.co/datasets/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral)
 
@@ -29,4 +39,20 @@ vs the loss curve for the original LoRA finetune of the 7B model
 
 The larger model achieved a best eval loss of 0.3915, vs 0.3971 for the 7B finetune, in far fewer steps.
 
- Overall, we think this is a promising approach to accessing much larger models without significantly more resources.
+ Overall, we think this is a promising approach to accessing much larger models without significantly more resources.
+
+ # Performance on Metrics
+
+ To do a proper ablation, we compared the performance of 4 models trained for ~1 epoch on the combined datasets (Metamath,
+ Orca, ShareGPT). Here are the results:
+
+ | Model | Trainable Params | Train Loss | Eval Loss | GSM8K | TruthfulQA |
+ | :--------------------- | ---------------: | ---------: | --------: | ----: | ---------: |
+ | Mistral 7B             | 0   | -     | -     | 0.374 | 0.426 |
+ | Mistral 10B            | 0   | -     | -     | 0.290 | 0.407 |
+ | Mistral 7B + LoRA r=12 | 31M | 0.412 | 0.366 | 0.514 | 0.499 |
+ | Mistral 10B + LoRA r=8 | 31M | 0.401 | 0.363 | 0.663 | 0.540 |
+
+ This ablation compares the base model (Mistral 7B), the expansion using the layer map described here, and LoRA finetunes with `r=12`
+ on the base model and `r=8` on the expanded model (to match trainable parameters). It demonstrates quite clearly that finetuning the expanded
+ model leads to a significant improvement in metrics even with the same number of trainable parameters (and training steps).
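
As a sanity check on the identical 31M trainable-parameter budgets, here is a back-of-the-envelope calculation. The card does not state which modules the LoRA targets, so this assumes adapters on all seven linear projections of every decoder layer (attention q/k/v/o plus the MLP gate/up/down) with Mistral-7B's published dimensions; under that assumption the two configurations match exactly.

```python
# Back-of-the-envelope check of the identical "31M" trainable-parameter budgets.
# Assumption (not stated in the card): LoRA adapters on all seven linear
# projections of every decoder layer. Dimensions are Mistral-7B's published
# config values.
HIDDEN, KV_DIM, INTERMEDIATE = 4096, 1024, 14336

# (d_in, d_out) of each adapted projection in one decoder layer
PROJECTIONS = [
    (HIDDEN, HIDDEN),        # q_proj
    (HIDDEN, KV_DIM),        # k_proj
    (HIDDEN, KV_DIM),        # v_proj
    (HIDDEN, HIDDEN),        # o_proj
    (HIDDEN, INTERMEDIATE),  # gate_proj
    (HIDDEN, INTERMEDIATE),  # up_proj
    (INTERMEDIATE, HIDDEN),  # down_proj
]

def lora_trainable_params(rank: int, num_layers: int) -> int:
    # Each adapted projection contributes an A matrix (d_in x r) and a
    # B matrix (r x d_out).
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in PROJECTIONS)
    return per_layer * num_layers

print(lora_trainable_params(rank=12, num_layers=32))  # 31,457,280 (7B + LoRA r=12)
print(lora_trainable_params(rank=8, num_layers=48))   # 31,457,280 (10B + LoRA r=8)
```

Under these assumptions both configurations come out to 31,457,280 (~31M) trainable parameters, consistent with the table above.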