asiansoul committed b9cfc7b (parent: f9807dc): Update README.md

Files changed (1): README.md (+132, -5)

README.md CHANGED
- ---
- license: other
- license_name: other
- license_link: LICENSE
- ---
---
base_model:
- maum-ai/Llama-3-MAAL-8B-Instruct-v0.1
- beomi/Llama-3-KoEn-8B-Instruct-preview
- asiansoul/Llama-3-Open-Ko-Linear-8B
- NousResearch/Meta-Llama-3-8B
- NousResearch/Meta-Llama-3-8B-Instruct
- ajibawa-2023/Code-Llama-3-8B
- defog/llama-3-sqlcoder-8b
- NousResearch/Hermes-2-Pro-Llama-3-8B
- Locutusque/llama-3-neural-chat-v2.2-8B
- asiansoul/Joah-Llama-3-KoEn-8B-Coder-v1
library_name: transformers
tags:
- mergekit
- merge
---
# Joah-Llama-3-KoEn-8B-Coder-v2

<a href="https://ibb.co/2Srsmn7"><img src="https://i.ibb.co/f9WnB1Y/Screenshot-2024-05-11-at-7-15-42-PM.png" alt="Screenshot-2024-05-11-at-7-15-42-PM" border="0"></a>

A merge model for all of you that, starting today, will be a light for one another.

"Joah (μ’‹μ•„)" by AsianSoul

A multilingual merge based on this model is coming soon, starting with German (Korean / English / German). 🌍

Where to use Joah: medical, Korean, English, translation, code, science, and more. πŸŽ₯

SQL code generation and other scientific tasks are strengthened compared to v1.
## 🎑 Merge Details

The performance of this merge model doesn't seem bad, though that's just my opinion. ^^ 🏟️

This may not be a model that satisfies you, but if we keep overcoming our shortcomings, won't we someday find the answer we want?

Don't worry even if you don't get the results you want; I'll keep looking for the answer for you.
Coming soon: real PoSE to extend Llama's context length to 64k, combined with my merge method: [Reborn](https://medium.com/@puffanddmx82/reborn-elevating-model-adaptation-with-merging-for-superior-nlp-performance-f604e8e307b2)

I have found that most merged models released so far do not actually have a 64k context length in their configs. I will improve this in the next merge with Reborn. If that doesn't work, I'll have to find another way, right?
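One quick way to check a model's real configured context length is to read `max_position_embeddings` from its `config.json`. A minimal sketch (the local file path is hypothetical; stock Llama-3-8B configs report 8192, so a true 64k model should report 65536):

```python
import json

def context_length(config_path):
    """Read max_position_embeddings from a Hugging Face config.json."""
    with open(config_path) as f:
        config = json.load(f)
    return config.get("max_position_embeddings")

# Example: context_length("path/to/config.json")
# Llama-3-8B ships with 8192; a genuine 64k merge should report 65536.
```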
256k is not possible yet; my computer runs out of memory.

If you support me, I will try it on a machine with maximum specifications, and I would also like to run more thorough tests for you on a network with high-capacity traffic and high-speed 10G links.
### Merge Method

This model was merged with the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method, using [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B) as the base.
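Roughly, DARE drops each task-vector entry (the delta from the base model) with probability 1 - density and rescales the survivors by 1/density; TIES then elects a sign per parameter and discards contributions that disagree with it. A toy NumPy sketch of that idea, not the actual mergekit implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def dare_ties(base, finetuned, densities, weights):
    """Toy DARE-TIES merge of several fine-tuned tensors onto a base tensor."""
    deltas = []
    for ft, density, w in zip(finetuned, densities, weights):
        delta = ft - base                             # task vector
        mask = rng.random(delta.shape) < density      # DARE: keep with prob = density
        delta = np.where(mask, delta, 0.0) / density  # rescale survivors
        deltas.append(w * delta)
    deltas = np.stack(deltas)
    # TIES: elect a sign per parameter from the summed deltas,
    # then keep only contributions that agree with the elected sign.
    elected = np.sign(deltas.sum(axis=0))
    agree = np.sign(deltas) == elected
    merged_delta = np.where(agree, deltas, 0.0).sum(axis=0)
    return base + merged_delta

base = np.zeros(4)
ft_a = np.array([1.0, -1.0, 2.0, 0.0])
ft_b = np.array([1.0, 1.0, -2.0, 0.0])
# With density=1.0 nothing is dropped, so the result is deterministic:
# agreeing entries add up, conflicting signs cancel to zero.
merged = dare_ties(base, [ft_a, ft_b], densities=[1.0, 1.0], weights=[0.5, 0.5])
```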
### Models Merged

The following models were included in the merge:
* [maum-ai/Llama-3-MAAL-8B-Instruct-v0.1](https://huggingface.co/maum-ai/Llama-3-MAAL-8B-Instruct-v0.1)
* [beomi/Llama-3-KoEn-8B-Instruct-preview](https://huggingface.co/beomi/Llama-3-KoEn-8B-Instruct-preview)
* [asiansoul/Llama-3-Open-Ko-Linear-8B](https://huggingface.co/asiansoul/Llama-3-Open-Ko-Linear-8B)
* [NousResearch/Meta-Llama-3-8B-Instruct](https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct)
* [ajibawa-2023/Code-Llama-3-8B](https://huggingface.co/ajibawa-2023/Code-Llama-3-8B)
* [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b)
* [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
* [Locutusque/llama-3-neural-chat-v2.2-8B](https://huggingface.co/Locutusque/llama-3-neural-chat-v2.2-8B)
* [asiansoul/Joah-Llama-3-KoEn-8B-Coder-v1](https://huggingface.co/asiansoul/Joah-Llama-3-KoEn-8B-Coder-v1)
### Configuration

The following YAML configuration was used to produce this model:
```yaml
models:
  - model: NousResearch/Meta-Llama-3-8B
    # Base model providing a general foundation without specific parameters

  - model: NousResearch/Meta-Llama-3-8B-Instruct
    parameters:
      density: 0.60
      weight: 0.25

  - model: beomi/Llama-3-KoEn-8B-Instruct-preview
    parameters:
      density: 0.55
      weight: 0.15

  - model: asiansoul/Llama-3-Open-Ko-Linear-8B
    parameters:
      density: 0.55
      weight: 0.1

  - model: maum-ai/Llama-3-MAAL-8B-Instruct-v0.1
    parameters:
      density: 0.55
      weight: 0.1

  - model: asiansoul/Joah-Llama-3-KoEn-8B-Coder-v1
    parameters:
      density: 0.55
      weight: 0.2

  - model: ajibawa-2023/Code-Llama-3-8B
    parameters:
      density: 0.55
      weight: 0.05

  - model: defog/llama-3-sqlcoder-8b
    parameters:
      density: 0.55
      weight: 0.1

  - model: Locutusque/llama-3-neural-chat-v2.2-8B
    parameters:
      density: 0.55
      weight: 0.1

  - model: NousResearch/Hermes-2-Pro-Llama-3-8B
    parameters:
      density: 0.55
      weight: 0.05

merge_method: dare_ties
base_model: NousResearch/Meta-Llama-3-8B
parameters:
  int8_mask: true
dtype: bfloat16
```
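One detail worth noting: the per-model weights in the configuration sum to 1.10 rather than 1.0. Assuming mergekit's default weight normalization for TIES-family methods, each model's effective contribution is its weight divided by the total:

```python
# Per-model weights from the YAML config (the base model carries no weight entry).
weights = {
    "NousResearch/Meta-Llama-3-8B-Instruct": 0.25,
    "beomi/Llama-3-KoEn-8B-Instruct-preview": 0.15,
    "asiansoul/Llama-3-Open-Ko-Linear-8B": 0.1,
    "maum-ai/Llama-3-MAAL-8B-Instruct-v0.1": 0.1,
    "asiansoul/Joah-Llama-3-KoEn-8B-Coder-v1": 0.2,
    "ajibawa-2023/Code-Llama-3-8B": 0.05,
    "defog/llama-3-sqlcoder-8b": 0.1,
    "Locutusque/llama-3-neural-chat-v2.2-8B": 0.1,
    "NousResearch/Hermes-2-Pro-Llama-3-8B": 0.05,
}
total = sum(weights.values())  # 1.10
# Effective (normalized) contribution of each model:
normalized = {name: w / total for name, w in weights.items()}
```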