---
license: cc-by-nc-4.0
---

## Description

This repo contains bf16 files of Nyxene-11B.

## Models used
- [berkeley-nest/Starling-LM-7B-alpha](https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha)
- [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B)
- [fblgit/juanako-7b-UNA](https://huggingface.co/fblgit/juanako-7b-UNA)
- [ehartford/dolphin-2.1-mistral-7b](https://huggingface.co/ehartford/dolphin-2.1-mistral-7b)

## Prompt template

After further testing, this template works best:

```
<|system|>
Below is an instruction that describes a task. Write a response that appropriately completes the request.
<|user|>
{prompt}
<|assistant|>
```

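As a quick illustration (the helper function and example prompt below are my own, not part of this repo), filling in the template looks like this:

```python
# The chat template from this model card; {prompt} is replaced per request.
PROMPT_TEMPLATE = (
    "<|system|>\n"
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "<|user|>\n"
    "{prompt}\n"
    "<|assistant|>\n"
)

def build_prompt(user_message: str) -> str:
    """Insert the user's message into the template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(prompt=user_message)

print(build_prompt("Summarize what a passthrough merge does."))
```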
## The secret sauce

dolphin-juanako-11B:
```
slices:
  - sources:
      - model: fblgit/juanako-7b-UNA
        layer_range: [0, 24]
  - sources:
      - model: ehartford/dolphin-2.1-mistral-7b
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```

Starling-NeuralHermes-11B:
```
slices:
  - sources:
      - model: berkeley-nest/Starling-LM-7B-alpha
        layer_range: [0, 24]
  - sources:
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```

Nyxene-11B:
```
slices:
  - sources:
      - model: dolphin-juanako-11B
        layer_range: [0, 48]
      - model: Starling-NeuralHermes-11B
        layer_range: [0, 48]
merge_method: slerp
base_model: dolphin-juanako-11B
parameters:
  t:
    - filter: lm_head
      value: [0.75]
    - filter: embed_tokens
      value: [0.75]
    - filter: self_attn
      value: [0.75, 0.25]
    - filter: mlp
      value: [0.25, 0.75]
    - filter: layernorm
      value: [0.5, 0.5]
    - filter: modelnorm
      value: [0.75]
    - value: 0.5 # fallback for rest of tensors
dtype: bfloat16
```

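For readers unfamiliar with slerp, here is a minimal pure-Python sketch (my own simplified version, operating on plain vectors rather than model tensors) of the spherical interpolation the final merge applies, where `t` corresponds to the per-filter blend values in the config above:

```python
import math

def slerp(t, v0, v1):
    """Spherical linear interpolation between two vectors.

    t=0 returns v0, t=1 returns v1; intermediate values move along
    the arc between them. Falls back to linear interpolation when
    the vectors are nearly parallel, where the arc is degenerate.
    """
    dot = sum(a * b for a, b in zip(v0, v1))
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    cos_omega = dot / (norm0 * norm1)
    if abs(cos_omega) > 0.9995:  # nearly parallel: lerp is numerically safer
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    omega = math.acos(cos_omega)
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# t = 0.5 blends both sources evenly, matching the fallback value above.
print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))  # roughly [0.707, 0.707]
```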
I used [mergekit](https://github.com/cg123/mergekit) for all the merges described here.