---
license: apache-2.0
datasets:
- abacusai/MetaMathFewshot
- shahules786/orca-chat
- anon8231489123/ShareGPT_Vicuna_unfiltered
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: Fewshot-Metamath-OrcaVicuna-Mistral-10B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 56.4
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 78.12
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 59.52
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 50.98
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 76.48
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 13.27
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B
      name: Open LLM Leaderboard
---

The layer map used to expand the 32-layer base model into a 48-layer model, where each `[start, end)` range replicates a slice of the base layers:

```json
{
  "layer_map": [
    [0, 16],
    [8, 24],
    [16, 32]
  ]
}
```
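
To make the map's semantics concrete, here is a minimal plain-Python sketch (not the PEFT implementation) of how the ranges above concatenate into the expanded layer stack; note that the overlapping ranges mean layers 8–23 each appear twice:

```python
# Sketch of how a layer_map expands a stack of base layers.
# This illustrates only the indexing logic, not the actual PEFT code.

def expand_layers(layer_map, num_base_layers):
    """Return the base-layer index backing each layer of the expanded model."""
    expanded = []
    for start, end in layer_map:
        assert 0 <= start < end <= num_base_layers
        expanded.extend(range(start, end))  # half-open range [start, end)
    return expanded

layer_map = [[0, 16], [8, 24], [16, 32]]
indices = expand_layers(layer_map, num_base_layers=32)

print(len(indices))       # 48 layers in the expanded model
print(len(set(indices)))  # backed by only 32 distinct base layers
```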

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/pf4d6FA7DriRtVq5HCkxd.png)

This model is a variation of [abacusai/Fewshot-Metamath-OrcaVicuna-Mistral](https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral)
that builds on the idea of scaling up models by duplicating layers of the base model, in this case
[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1). It relies on the functionality added in
https://github.com/huggingface/peft/pull/1368 to train a model with replicated layers without much extra GPU memory: although 48 layers
have LoRA adapters attached, the replicas share the weights of the 32 original layers, so memory usage is essentially the same as for the base 7B model.
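
Assuming the layer-replication option that the PR above added to PEFT's `LoraConfig` (check your PEFT version for availability), the setup might look roughly like the sketch below. The target modules and rank are illustrative, not this model's exact training configuration:

```python
# Hypothetical sketch of training with replicated layers via PEFT.
# Assumes the `layer_replication` argument from
# https://github.com/huggingface/peft/pull/1368.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

config = LoraConfig(
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative choice
    # Each [start, end) range replicates a slice of the 32 base layers,
    # yielding a 48-layer model that still shares the base weights.
    layer_replication=[[0, 16], [8, 24], [16, 32]],
)

model = get_peft_model(base, config)  # 48 adapted layers, ~7B memory footprint
```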

This is a demonstration model intended to show how the approach works; the goal is to apply it to much larger models. For example,
models like Goliath or MegaDolphin are effectively 120B models, but with this approach they would only require the memory of their 70B base model
plus a little extra for the LoRA adapter layers.

In our training runs we observed a difference in the behavior of the eval loss:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f95cac5f9ba52bbcd7f/vszXUSmANBw6EFjn4sX1N.png)

compared to the loss curve for the original LoRA finetune of the 7B model:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f95cac5f9ba52bbcd7f/dis1P2MD_Rsyw81aIVByS.png)

The larger model reached a better eval loss (0.3915 vs 0.3971) in far fewer steps.

Overall, we think this is a promising approach to accessing much larger models without significantly more resources.

# Performance on Metrics

To do a proper ablation we compared the performance of four models trained for ~1 epoch on the combined datasets (MetaMath,
Orca, ShareGPT). Here are the results:

| Model | Trainable Params | Train Loss | Eval Loss | GSM8K | TruthfulQA |
| :-----| ------: | ---------: | -------: | ----: | ---------: |
| Mistral 7B | 0 | - | - | 0.374 | 0.426 |
| Mistral 10B | 0 | - | - | 0.290 | 0.407 |
| Mistral 7B + LoRA r=12 | 31M | 0.412 | 0.366 | 0.514 | 0.499 |
| Mistral 10B + LoRA r=8 | 31M | 0.401 | 0.363 | 0.663 | 0.540 |

This ablation compares the base model (Mistral 7B), the expanded model built with the layer map described above, and LoRA finetunes
of the base model at `r=12` and of the expanded model at `r=8` (chosen to match trainable parameter counts). It demonstrates quite
clearly that fine-tuning the expanded model leads to a significant improvement in metrics with the same number of trainable parameters (and training steps).

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_abacusai__Fewshot-Metamath-OrcaVicuna-Mistral-10B)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |55.79|
|AI2 Reasoning Challenge (25-Shot)|56.40|
|HellaSwag (10-Shot)              |78.12|
|MMLU (5-Shot)                    |59.52|
|TruthfulQA (0-shot)              |50.98|
|Winogrande (5-shot)              |76.48|
|GSM8k (5-shot)                   |13.27|