---
base_model: []
library_name: transformers
tags:
- mergekit
- merge

---
# boreas-10_7b-step1

# !NB: THIS MODEL NEEDS CONTINUED (PRE)TRAINING

This is the result of step 1 of the upscaling of [Boreas-7B](https://huggingface.co/yhavinga/Boreas-7B) with [mergekit](https://github.com/cg123/mergekit).
It attempts to reproduce the depth up-scaling described in the [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166) paper.
This model corresponds to the state after step 1 in the figure below:

![SOLAR 10.7B Depth up scaling](img_2.png)

Step 2, the continued pretraining, is currently in progress; its result will be published as a separate model.
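
While step 2 is pending, the merged checkpoint can already be loaded for inspection or as a starting point for continued pretraining. A minimal loading sketch, assuming the model is published under the author's namespace as `yhavinga/boreas-10_7b-step1` (the repo id is an assumption based on this card's title):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "yhavinga/boreas-10_7b-step1"  # assumption: repo id taken from this card's title

tokenizer = AutoTokenizer.from_pretrained(repo)
# The merge was produced in bfloat16, so load it in the same dtype.
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

# The depth up-scaling should have grown Boreas-7B's 32 layers to 48.
print(model.config.num_hidden_layers)  # expected: 48

# NB: this is a raw merge; expect degraded generations until the
# step-2 continued pretraining has been applied.
```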


## Merge Details
### Merge Method

This model was merged using the passthrough merge method: the selected layer ranges are copied verbatim from the source models and stacked, without any parameter interpolation in this final assembly step.

### Models Merged

The following models were included in the merge:
* boreas-7b-8-16-24-32
* boreas-7b-16-32
* boreas-7b-0-8-16-24
* boreas-7b-0-16
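
These four pieces stack bottom-to-top into a 48-layer model, the depth of SOLAR 10.7B. A small bookkeeping sketch, with the slice sizes taken from the configurations below and the comments describing which Boreas-7B layers each intermediate model contributes:

```python
# One entry per slice in the final passthrough merge, in stacking order.
slices = [
    ("boreas-7b-0-16",       16),  # layers 0-16 of Boreas-7B, copied as-is
    ("boreas-7b-0-8-16-24",   8),  # slerp blend of layers 0-8 and 16-24
    ("boreas-7b-8-16-24-32",  8),  # slerp blend of layers 8-16 and 24-32
    ("boreas-7b-16-32",      16),  # layers 16-32 of Boreas-7B, copied as-is
]

total_layers = sum(n for _, n in slices)
print(total_layers)  # 48, up from the 32 layers of Boreas-7B
```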

### Configuration

The following YAML configuration was used to produce this model:

```yaml
slices:
  - sources:
    - model: boreas-7b-0-16
      layer_range: [0, 16]
  - sources:
    - model: boreas-7b-0-8-16-24
      layer_range: [0, 8]
  - sources:
    - model: boreas-7b-8-16-24-32
      layer_range: [0, 8]
  - sources:
    - model: boreas-7b-16-32
      layer_range: [0, 16]
merge_method: passthrough
dtype: bfloat16
```
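
To reproduce this step, the configuration can be run either with mergekit's `mergekit-yaml` command-line tool or through its Python API. A hedged sketch of the latter, assuming a recent mergekit and that the YAML above is saved as `boreas-10_7b-step1.yml` (an illustrative filename):

```python
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Parse the merge recipe from the YAML file above.
with open("boreas-10_7b-step1.yml", "r", encoding="utf-8") as fp:
    config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Write the merged 48-layer model to ./boreas-10_7b-step1.
run_merge(
    config,
    "./boreas-10_7b-step1",
    options=MergeOptions(copy_tokenizer=True, lazy_unpickle=True),
)
```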

The four intermediate models were created with the following configurations, given as four separate YAML documents delimited by `---` (each one is its own mergekit run):

```yaml
slices:
  - sources:
    - model: yhavinga/Boreas-7B
      layer_range: [0, 16]
merge_method: passthrough
dtype: bfloat16
---
slices:
  - sources:
    - model: yhavinga/Boreas-7B
      layer_range: [0, 8]
    - model: yhavinga/Boreas-7B
      layer_range: [16, 24]
merge_method: slerp
base_model: yhavinga/Boreas-7B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
---
slices:
  - sources:
    - model: yhavinga/Boreas-7B
      layer_range: [8, 16]
    - model: yhavinga/Boreas-7B
      layer_range: [24, 32]
merge_method: slerp
base_model: yhavinga/Boreas-7B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
---
slices:
  - sources:
    - model: yhavinga/Boreas-7B
      layer_range: [16, 32]
merge_method: passthrough
dtype: bfloat16
```
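
Since the block above contains four independent configurations, each document has to be merged separately. A sketch of running them in sequence with mergekit's Python API, assuming the block is saved as `boreas-intermediates.yml` (an illustrative filename) and that the document order matches the list of merged models above:

```python
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Output directories, in the same order as the YAML documents above.
out_paths = [
    "boreas-7b-0-16",
    "boreas-7b-0-8-16-24",
    "boreas-7b-8-16-24-32",
    "boreas-7b-16-32",
]

with open("boreas-intermediates.yml", "r", encoding="utf-8") as fp:
    docs = list(yaml.safe_load_all(fp))  # one document per '---' separator

for doc, out_path in zip(docs, out_paths):
    config = MergeConfiguration.model_validate(doc)
    run_merge(config, out_path, options=MergeOptions(copy_tokenizer=True))
```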