---
license: apache-2.0
language:
- en
---
# merge

This is a merge of pre-trained language models created using MergeKit.
Merging the main modules of past models consolidates the internal predictive behaviour of the network. Each model has experienced a different round of fine-tuning, adjusting the weights of its network, so keeping the models the same size is essential: the tensors do not change in size during a merge; only their numerical values are adjusted, taking the same space in memory and on disk.

This model uses the Mistral transformer network, specifically the instruct base.

Some of the merged models (commercial Orca and Dolphin, as well as Nous, Starling, etc.) are contaminated: you may find some questions refused, because refusals were already placed in their datasets, along with some biasing. Despite responding well, these models are biased towards their makers' personal psychometric understanding of the world. In fact, the fine-tuning process simply pushes the LLM towards the shape of the new types of questions or tasks being set, and noticeably, mis-training can leave erroneous text in the output.

Using this as a base, the next round of tuning will be task-specific, as I believe I have merged the main bulk of the common models into this model. Other models may be more stable; we shall see. Any comments are welcome.
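The point about sizes can be made concrete with a toy example: a weighted average of two same-shape weight tensors yields a tensor of exactly the same shape, with only the values changed. This is a minimal plain-Python sketch for illustration, not mergekit's actual code:

```python
# Toy illustration: merging two same-shape "weight tensors" (nested lists)
# changes the numerical values but never the shape, so the merged model
# occupies the same space in memory and on disk as each source model.

def weighted_merge(a, b, alpha):
    """Element-wise weighted average of two same-shape 2-D tensors."""
    assert len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))
    return [[alpha * x + (1 - alpha) * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

model_a = [[1.0, 2.0], [3.0, 4.0]]
model_b = [[5.0, 6.0], [7.0, 8.0]]
merged = weighted_merge(model_a, model_b, alpha=0.5)

# Shape is unchanged; only the values moved.
assert [len(r) for r in merged] == [len(r) for r in model_a]
print(merged)  # [[3.0, 4.0], [5.0, 6.0]]
```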
## Merge Details

### Merge Method

This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method, with [LeroyDyer/Mixtral_AI_Cyber_3.m2](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_3.m2) as the base.
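Per parameter, TIES follows three steps described in the paper: trim small task-vector entries by a density threshold, elect a sign by total magnitude, then average only the values that agree with the elected sign. The following is a simplified plain-Python sketch of those steps on toy vectors, not mergekit's implementation; the function name is invented for this example:

```python
# Simplified TIES merge over per-model "task vectors" (deltas from the base).
# Steps: (1) trim low-magnitude entries per model to the given density,
# (2) elect a per-position sign from the summed values,
# (3) average only the values agreeing with that sign (disjoint mean).

def ties_merge(task_vectors, density):
    n = len(task_vectors[0])
    trimmed = []
    for tv in task_vectors:
        # 1) Trim: keep only the top `density` fraction of entries by magnitude.
        k = max(1, int(round(density * n)))
        keep = set(sorted(range(n), key=lambda i: -abs(tv[i]))[:k])
        trimmed.append([v if i in keep else 0.0 for i, v in enumerate(tv)])
    merged = []
    for i in range(n):
        vals = [tv[i] for tv in trimmed]
        # 2) Elect sign: sign of the summed values at this position.
        sign = 1.0 if sum(vals) >= 0 else -1.0
        # 3) Disjoint mean: average only values that agree with the sign.
        agree = [v for v in vals if v * sign > 0]
        merged.append(sum(agree) / len(agree) if agree else 0.0)
    return merged

deltas = [
    [0.9, -0.2, 0.4, 0.0],
    [0.7, 0.1, -0.5, 0.0],
]
print(ties_merge(deltas, density=0.5))
```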
### Models Merged

The following models were included in the merge:

* [LeroyDyer/Mixtral_AI_Cyber_Orca](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_Orca)
* [LeroyDyer/Mixtral_AI_Cyber_4.0](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_4.0)
* [LeroyDyer/Mixtral_AI_Cyber_4.0_m1](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_4.0_m1)
* [LeroyDyer/Mixtral_AI_Cyber_Dolphin](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_Dolphin)
* [LeroyDyer/Mixtral_AI_Cyber_4_m1_SFT](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_4_m1_SFT)

### Configuration

The following YAML configuration was used to produce this model:
```yaml
models:
  - model: LeroyDyer/Mixtral_AI_Cyber_3.m2
    parameters:
      density: [0.256, 0.512, 0.128] # density gradient
      weight: 0.382
  - model: LeroyDyer/Mixtral_AI_Cyber_Orca
    parameters:
      density: 0.382
      weight: [0.256, 0.128, 0.256, 0.128] # weight gradient
  - model: LeroyDyer/Mixtral_AI_Cyber_Dolphin
    parameters:
      density: 0.382
      weight: [0.128, 0.512, 0.128, 0.128] # weight gradient
  - model: LeroyDyer/Mixtral_AI_Cyber_4.0_m1
    parameters:
      density: 0.382
      weight: [0.256, 0.256, 0.512, 0.128] # weight gradient
  - model: LeroyDyer/Mixtral_AI_Cyber_4_m1_SFT
    parameters:
      density: 0.382
      weight: [0.128, 0.512, 0.128, 0.128] # weight gradient
  - model: LeroyDyer/Mixtral_AI_Cyber_4.0
    parameters:
      density: 0.382
      weight:
        - filter: mlp
          value: 0.5
        - value: 0
merge_method: ties
base_model: LeroyDyer/Mixtral_AI_Cyber_3.m2
parameters:
  normalize: true
  int8_mask: true
dtype: float16
```
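The `# density gradient` and `# weight gradient` comments above refer to lists of values spread across the model's layers rather than a single uniform value. The sketch below shows one plausible piecewise-linear interpolation of such a list over the layer index; the exact scheme mergekit uses is an assumption here, and the function name is invented purely to make the idea concrete:

```python
# Spread a gradient list (e.g. weight: [0.256, 0.128, 0.256, 0.128]) across
# `num_layers` layers: anchor points are spaced evenly over the layer range,
# and intermediate layers get linearly blended values. Illustrative only;
# not mergekit's actual interpolation code.

def gradient_values(anchors, num_layers):
    if num_layers == 1:
        return [anchors[0]]
    out = []
    for layer in range(num_layers):
        # Position of this layer along the anchor list, in [0, len(anchors)-1].
        pos = layer * (len(anchors) - 1) / (num_layers - 1)
        lo = int(pos)
        hi = min(lo + 1, len(anchors) - 1)
        frac = pos - lo
        out.append(anchors[lo] * (1 - frac) + anchors[hi] * frac)
    return out

# First and last layers hit the endpoints of the gradient exactly.
vals = gradient_values([0.256, 0.128, 0.256, 0.128], num_layers=7)
print(vals[0], vals[-1])  # 0.256 0.128
```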