siddartha-abacus
committed on
Update README.md

README.md CHANGED

@@ -7,6 +7,16 @@ datasets:
- anon8231489123/ShareGPT_Vicuna_unfiltered
---

```json
{
  "layer_map": [
    [0, 16],
    [8, 24],
    [16, 32]
  ]
}
```

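Each `[start, end)` pair in this map selects a slice of the base model's decoder layers, and the slices are stacked in order, giving 16 + 16 + 16 = 48 layers (layers 8 through 23 of the 32-layer base appear twice). The snippet below is a minimal sketch of that expansion with Hugging Face `transformers`, not the script actually used to build this model; the base checkpoint name is an assumption.

```python
# Sketch of applying the layer map by stacking copies of the base model's
# decoder layers (assumptions noted above).
import copy

import torch
from transformers import AutoModelForCausalLM

layer_map = [[0, 16], [8, 24], [16, 32]]  # from the config above

# Base checkpoint name is an assumption for illustration.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)

# Concatenate the selected layer slices; overlapping ranges duplicate layers.
expanded_layers = torch.nn.ModuleList(
    copy.deepcopy(base.model.layers[i])
    for start, end in layer_map
    for i in range(start, end)
)

base.model.layers = expanded_layers
base.config.num_hidden_layers = len(expanded_layers)  # 48 layers, roughly 10B params

# Recent transformers versions track each layer's index for KV-cache handling,
# so reindex the duplicated layers.
for idx, layer in enumerate(base.model.layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = idx
```

Passthrough merge tools such as mergekit express the same stacking declaratively; either way, no new weights are introduced at expansion time, so the expanded model starts from purely duplicated layers.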

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/pf4d6FA7DriRtVq5HCkxd.png)

This model is a variation of [abacusai/Fewshot-Metamath-OrcaVicuna-Mistral](https://huggingface.co/datasets/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral)

@@ -29,4 +39,20 @@ vs the loss curve for the original LoRA finetune of the 7B model

The larger model achieved a best eval loss of 0.3915 vs 0.3971 in far fewer steps.

Overall, we think this is a promising approach to accessing much larger models without significantly more resources.

# Performance on Metrics

To do a proper ablation, we compared the performance of 4 models trained for ~1 epoch on the combined datasets (Metamath,
Orca, ShareGPT). Here are the results:

| Model | Trainable Params | Train Loss | Eval Loss | GSM8K | TruthfulQA |
| :-----| ------: | ---------: | --------: | ----: | ---------: |
| Mistral 7B | 0 | - | - | 0.374 | 0.426 |
| Mistral 10B | 0 | - | - | 0.290 | 0.407 |
| Mistral 7B + LoRA r=12 | 31M | 0.412 | 0.366 | 0.514 | 0.499 |
| Mistral 10B + LoRA r=8 | 31M | 0.401 | 0.363 | 0.663 | 0.540 |

This ablation compares the base model (Mistral 7B), the expanded model built with the layer map described here, and LoRA fine-tunes with `r=12`
on the base model and `r=8` on the expanded model (to match trainable parameters). The ablation demonstrates quite clearly that fine-tuning the expanded
model leads to a significant improvement in metrics even with the same number of trainable parameters (and training steps).
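
For the same target modules, LoRA parameter count scales with the rank times the number of layers, and 12 × 32 = 8 × 48, which is why `r=8` on the 48-layer model matches `r=12` on the 32-layer base at roughly 31M trainable parameters. The sketch below attaches such an adapter with `peft`; the target modules and hyperparameters shown are assumptions chosen to be consistent with the ~31M figure, not the recorded training configuration.

```python
# Illustrative only: attach a rank-8 LoRA to the expanded (48-layer) model.
# Target modules and hyperparameters are assumptions.
from peft import LoraConfig, get_peft_model

lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # All linear projections in each decoder layer; with r=8 over 48 layers
    # this comes to roughly 31M trainable parameters.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# `base` is the 48-layer model from the expansion sketch above.
peft_model = get_peft_model(base, lora_cfg)
peft_model.print_trainable_parameters()
```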
|