LeroyDyer committed on
Commit fca6229
1 Parent(s): 5b5a381

Update README.md

Files changed (1)
  1. README.md +2 -68
README.md CHANGED
@@ -14,73 +14,7 @@ license: apache-2.0
  language:
  - en
  ---
- # merge
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
-
- Merging the main modules of past models is an important step in consolidating the internal predictive behaviour of the network.
- Each model has experienced a different set of fine-tuning runs and adjustments to its weight network, so keeping the models at the same size is also important: the tensors do not change in size, they only change in numerical value, so they take the same space in memory and on disk.
- This model uses the Mistral transformer network, specifically the instruct base.
- The merged models (commercial Orca and Dolphin, as well as Nous, Starling, etc.) are somewhat contaminated: you may find some questions refused because refusals are already present in their datasets, and some biases remain, so despite responding well these models lean towards their makers' personal psychometric understanding of the world. In fact, the fine-tuning process simply pushes the LLM towards the shape of the new types of questions or tasks being set.
- Noticeably, mis-training can leave erroneous text in your output.
-
- Using this model as a base, the next round of tuning will be task-specific, as I believe the main bulk of common models has now been merged into this model.
- Other models may be more stable; we shall see. Any comments are welcome.
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method, with [LeroyDyer/Mixtral_AI_Cyber_3.m2](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_3.m2) as the base.
-
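For intuition, TIES merging works in three steps: trim each fine-tuned model's delta from the base down to its largest-magnitude entries, elect a sign per parameter, and merge only the deltas that agree with that sign. The toy sketch below illustrates the idea on plain tensors; it is not mergekit's implementation, and the function and parameter names are hypothetical.

```python
# Toy illustration of a TIES-style merge on plain tensors (not mergekit's implementation).
import torch

def ties_merge(base, finetuned, density=0.382, weights=None):
    """Toy TIES-style merge: trim, elect a sign, and average agreeing deltas."""
    weights = weights or [1.0] * len(finetuned)
    trimmed = []
    for ft, w in zip(finetuned, weights):
        delta = (ft - base) * w
        # Trim: keep only the `density` fraction of entries with largest magnitude.
        k = max(1, int(density * delta.numel()))
        threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        trimmed.append(torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta)))
    stacked = torch.stack(trimmed)
    # Elect: pick the dominant sign for each parameter across models.
    sign = torch.sign(stacked.sum(dim=0))
    sign[sign == 0] = 1.0
    # Merge: average only the trimmed deltas whose sign agrees with the elected sign.
    agree = (torch.sign(stacked) == sign) & (stacked != 0)
    merged_delta = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
    return base + merged_delta

# Example: merge three noisy fine-tunes of a single 4x4 weight matrix.
base = torch.randn(4, 4)
tuned = [base + 0.1 * torch.randn(4, 4) for _ in range(3)]
merged = ties_merge(base, tuned, density=0.382, weights=[0.4, 0.3, 0.3])
```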
- ### Models Merged
-
- The following models were included in the merge:
- * [LeroyDyer/Mixtral_AI_Cyber_Orca](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_Orca)
- * [LeroyDyer/Mixtral_AI_Cyber_4.0](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_4.0)
- * [LeroyDyer/Mixtral_AI_Cyber_4.0_m1](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_4.0_m1)
- * [LeroyDyer/Mixtral_AI_Cyber_Dolphin](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_Dolphin)
- * [LeroyDyer/Mixtral_AI_Cyber_4_m1_SFT](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_4_m1_SFT)
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
-
- models:
-   - model: LeroyDyer/Mixtral_AI_Cyber_3.m2
-     parameters:
-       density: [0.256, 0.512, 0.128] # density gradient
-       weight: 0.382
-   - model: LeroyDyer/Mixtral_AI_Cyber_Orca
-     parameters:
-       density: 0.382
-       weight: [0.256, 0.128, 0.256, 0.128] # weight gradient
-   - model: LeroyDyer/Mixtral_AI_Cyber_Dolphin
-     parameters:
-       density: 0.382
-       weight: [0.128, 0.512, 0.128, 0.128] # weight gradient
-   - model: LeroyDyer/Mixtral_AI_Cyber_4.0_m1
-     parameters:
-       density: 0.382
-       weight: [0.256, 0.256, 0.512, 0.128] # weight gradient
-   - model: LeroyDyer/Mixtral_AI_Cyber_4_m1_SFT
-     parameters:
-       density: 0.382
-       weight: [0.128, 0.512, 0.128, 0.128] # weight gradient
-   - model: LeroyDyer/Mixtral_AI_Cyber_4.0
-     parameters:
-       density: 0.382
-       weight:
-         - filter: mlp
-           value: 0.5
-         - value: 0
- merge_method: ties
- base_model: LeroyDyer/Mixtral_AI_Cyber_3.m2
- parameters:
-   normalize: true
-   int8_mask: true
- dtype: float16
-
- ```
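For context on how a configuration like the one above is normally consumed, here is a minimal sketch based on mergekit's documented Python entry points (`MergeConfiguration`, `run_merge`, `MergeOptions`); the file paths are placeholders, and the `mergekit-yaml <config> <output-dir>` CLI can be used instead. This is illustrative only, not the exact invocation used to produce this model.

```python
# Illustrative sketch only: running a TIES merge from a saved copy of the YAML above.
# Assumes mergekit is installed (pip install mergekit); paths are placeholders.
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_PATH = "ties_merge_config.yaml"     # the YAML configuration shown above, saved to disk
OUTPUT_PATH = "./Mixtral_AI_Cyber_merged"  # placeholder output directory for the merged weights

with open(CONFIG_PATH, "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path=OUTPUT_PATH,
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # merge on GPU when available
        copy_tokenizer=True,             # copy the base model's tokenizer into the output
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```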
 
 
+
+ This summary describes the latest language model (LLM), a merge of pre-trained language models created using MergeKit.
+
+ Merging these models is crucial for consolidating the internal predictive behaviour of the network. Each model has undergone different fine-tuning and adjustments to its weights, so keeping a consistent size across models is essential. The merge uses the Mistral transformer network (the instruct base), but it is worth noting that the merged models (commercial Orca, Dolphin, Nous, Starling, etc.) may exhibit some contamination: certain questions may already be present in their datasets, along with biases towards the creators' personal psychometric understanding of the world. Fine-tuning aims to adapt the LLM to new types of questions or tasks, but misalignment during this process can leave erroneous text in the output.
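The card does not include a usage snippet, so the following is a minimal, assumed example of loading the merged model with the Hugging Face `transformers` library; the repository id is a placeholder to be replaced with this model's actual id.

```python
# Minimal usage sketch with Hugging Face transformers; the repo id is a placeholder.
# device_map="auto" additionally requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "LeroyDyer/<this-model-repo>"  # replace with the id of this model repository

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

prompt = "Summarise what a TIES merge of language models does."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```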