Commit 621145d by mlinmg
1 Parent(s): ba2e877

Create README.md

Files changed (1): README.md ADDED (+70, -0)
---
license: other
license_name: yi-license
license_link: https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE
language:
- en
pipeline_tag: conversational
---
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/644ba0c76ebb3ebf7264dbe9/PWn9I-0XH7kSP_YXcyxIg.png" width="400"/>
</p>

---

# SG Raccoon 55B

The first 55B auto-regressive causal LM, created by merging two finetuned, llamafied [Yi 34B](https://huggingface.co/01-ai/Yi-34B) models with *200K context* into one.

# Prompting Format

```
SYSTEM: <ANY SYSTEM CONTEXT>
USER:
ASSISTANT:
```
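
As a quick illustration, here is a minimal Python sketch that assembles a single-turn prompt in this format; the helper name and the example strings are hypothetical and not part of the original card.

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the SYSTEM/USER/ASSISTANT format above.

    The reply is left empty after "ASSISTANT:" so the model completes it.
    """
    return f"SYSTEM: {system}\nUSER: {user}\nASSISTANT:"


# Hypothetical usage:
prompt = build_prompt(
    system="You are a helpful assistant.",
    user="Explain what a frankenmerge is in one sentence.",
)
print(prompt)
```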

# Merge process

The models used in the merge are [Tess-M-v1.3](https://huggingface.co/migtissera/Tess-M-v1.3/) and [airoboros-3_1-yi-34b-200k](https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k).

The layer ranges used are as follows:

```yaml
- model: bhenrym14/airoboros-3_1-yi-34b-200k
  layer_range: [0, 14]
- model: migtissera/Tess-M-v1.3
  layer_range: [7, 21]
- model: bhenrym14/airoboros-3_1-yi-34b-200k
  layer_range: [15, 29]
- model: migtissera/Tess-M-v1.3
  layer_range: [22, 36]
- model: bhenrym14/airoboros-3_1-yi-34b-200k
  layer_range: [30, 44]
- model: migtissera/Tess-M-v1.3
  layer_range: [37, 51]
- model: bhenrym14/airoboros-3_1-yi-34b-200k
  layer_range: [45, 59]
```
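
For context, below is a hedged sketch of what a complete [mergekit](https://github.com/cg123/mergekit) configuration for these layer ranges might look like; the `slices`/`sources` wrapping, `merge_method: passthrough`, and `dtype: float16` are assumptions typical of this kind of frankenmerge and are not confirmed by the card.

```yaml
# Hypothetical full mergekit config -- merge method and dtype are assumed, not stated above
slices:
  - sources:
      - model: bhenrym14/airoboros-3_1-yi-34b-200k
        layer_range: [0, 14]
  - sources:
      - model: migtissera/Tess-M-v1.3
        layer_range: [7, 21]
  - sources:
      - model: bhenrym14/airoboros-3_1-yi-34b-200k
        layer_range: [15, 29]
  - sources:
      - model: migtissera/Tess-M-v1.3
        layer_range: [22, 36]
  - sources:
      - model: bhenrym14/airoboros-3_1-yi-34b-200k
        layer_range: [30, 44]
  - sources:
      - model: migtissera/Tess-M-v1.3
        layer_range: [37, 51]
  - sources:
      - model: bhenrym14/airoboros-3_1-yi-34b-200k
        layer_range: [45, 59]
merge_method: passthrough
dtype: float16
```

A config of this shape is what mergekit's `mergekit-yaml` entry point consumes, together with an output directory for the merged weights.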

# Tips

As this is a Yi-based model, if the output doesn't seem right, try disabling the BOS token and/or running a lower temperature with MinP (and no other samplers); Yi tends to run "hot" by default.

Sometimes the model "spells out" the stop token as `</s>` (as Capybara does), so you may need to add `</s>` as an additional stopping condition.
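
As a rough sketch of how these tips translate into a 🤗 Transformers generation call: the repository id below is a placeholder and the sampling values are illustrative only, not settings recommended by the card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual model repository.
model_id = "mlinmg/SG-Raccoon-55B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "SYSTEM: You are a helpful assistant.\nUSER: Hello!\nASSISTANT:"

# Tip: disable the BOS token by not adding special tokens when encoding.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

# Tip: keep the temperature low; MinP sampling is exposed in backends such as
# llama.cpp or text-generation-webui rather than demonstrated here.
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
text = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# Tip: treat a literal "</s>" in the generated text as an extra stop condition.
text = text.split("</s>")[0].strip()
print(text)
```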

# Benchmarks

Coming soon.

# Acknowledgements

- Special thanks to [MSS](https://milanosamplesale.com/) for sponsoring this project.

- Thanks to [@chargoddard](https://huggingface.co/chargoddard) for developing [mergekit](https://github.com/cg123/mergekit), the framework used to merge the model.

- Many thanks to [@Undi95](https://huggingface.co/Undi95) for helping figure out the model merge options.

- Credits also to the [01-ai](https://huggingface.co/01-ai) team for their amazing models.

- This merge was inspired by [Goliath 120B](https://huggingface.co/alpindale/goliath-120b).