---
base_model:
- LeroyDyer/Mixtral_AI_Cyber_MegaMind
- LeroyDyer/Mixtral_AI_Cyber_MegaMind_2_0
license: apache-2.0
language:
- en
library_name: transformers
tags:
- chemistry
- biology
- code
- medical
- not-for-all-audiences
- Cyber-Series
- Mixture-Of-Experts
metrics:
- accuracy
- bertscore
- bleu
- brier_score
- code_eval
- chrf
- charcut_mt
- character
- cer
---

# MEGA_MIND 24b CyberSeries: Revolutionizing Language Models

### Models Merged

The following models were included in the merge:
* [LeroyDyer/Mixtral_AI_Cyber_MegaMind](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_MegaMind) (24B MoE)
* [LeroyDyer/Mixtral_AI_Cyber_MegaMind_2_0](https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_MegaMind_2_0) (24B MoE)


## CyberSeries


The MEGA_MIND 24b CyberSeries represents a groundbreaking leap in the realm of language models, integrating a diverse array of expert models into a unified framework. At its core lies the Mistral-7B-Instruct-v0.2, a refined instructional model designed for versatility and efficiency.

Enhanced with an expanded context window and advanced routing mechanisms, Mistral-7B-Instruct-v0.2 exemplifies the power of Mixture of Experts, allowing seamless integration of specialized sub-models. This architecture delivers strong performance and scalability, enabling the CyberSeries to tackle a wide range of tasks with speed and accuracy.

Among its sub-models, OpenOrca-Mistral-7B-8k shines as a testament to fine-tuning excellence, boasting top-ranking performance in its class, while Hermes 2 Pro introduces cutting-edge capabilities such as Function Calling and JSON Mode, catering to diverse application needs.

Driven by Reinforcement Learning from AI Feedback, the Starling-LM-7B-beta demonstrates remarkable adaptability and optimization, while the Phi-1.5 Transformer model stands as a beacon of excellence across various domains, from common sense reasoning to medical inference.

With models like BioMistral tailored specifically for medical applications and Nous-Yarn-Mistral-7b-128k excelling in handling long-context data, the MEGA_MIND 24b CyberSeries emerges as a transformative force in the landscape of language understanding and artificial intelligence.

Experience the future of language models with the MEGA_MIND 24b CyberSeries, where innovation meets performance, and possibilities are limitless.

## Objective: Creating a Blend of Experts!
The concept of Mixture of Experts is central to this endeavor. With a Mixture of Experts, models can be pre-trained using far fewer computational resources, which means the model or dataset size can be increased dramatically without requiring additional compute compared to a traditional dense model. A Mixture of Experts model is expected to reach quality comparable to its dense counterpart, but much faster during pretraining. A key component of this approach is the gate network, or router, which determines which tokens are assigned to which expert. This routing mechanism consists of learned parameters and is trained jointly with the rest of the network, as sketched below.
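
To make the routing idea concrete, here is a minimal, hypothetical sketch of such a gate network in PyTorch. It is not the actual router used in this merge; the class name `SimpleRouter` and all hyperparameters are illustrative only.

```python
# Minimal sketch of a Mixture-of-Experts gate network (router).
# Illustrative only: `SimpleRouter`, `num_experts`, and `top_k` are assumptions,
# not components taken from this repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleRouter(nn.Module):
    """Learned gate that assigns each token to its top-k experts."""

    def __init__(self, hidden_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # The router is a learned linear layer, trained jointly with the experts.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_size)
        logits = self.gate(hidden_states)                    # (batch, seq, num_experts)
        probs = F.softmax(logits, dim=-1)
        # Keep only the k highest-scoring experts per token.
        top_probs, top_experts = probs.topk(self.top_k, dim=-1)
        # Renormalize so the selected experts' weights sum to 1.
        top_probs = top_probs / top_probs.sum(dim=-1, keepdim=True)
        return top_probs, top_experts

# Example: route 4 tokens among 8 experts, keeping the top 2 per token.
router = SimpleRouter(hidden_size=4096, num_experts=8, top_k=2)
weights, experts = router(torch.randn(1, 4, 4096))
print(weights.shape, experts.shape)  # torch.Size([1, 4, 2]) torch.Size([1, 4, 2])
```

In a full MoE layer, each token's hidden state would then be dispatched to its selected experts and their outputs combined using these routing weights.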

## Core Model: Mistral-7B-Instruct-v0.2
The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is a refined version of the Mistral-7B-v0.2, tailored for instructional purposes.

Compared to Mistral-7B-v0.1, Mistral-7B-v0.2 features the following enhancements:

* Expanded context window to 32k (from 8k in v0.1)
* Rope-theta set to 1e6
* Elimination of sliding-window attention
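
These settings can be inspected directly from the published model configuration. A small sketch using the `transformers` library follows; the commented values reflect the description above and should be verified against the actual config.

```python
from transformers import AutoConfig

# Inspect the base instruct model's configuration.
cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
print(cfg.max_position_embeddings)  # expected: 32768 (32k context window)
print(cfg.rope_theta)               # expected: 1000000.0 (rope-theta = 1e6)
print(cfg.sliding_window)           # expected: None (sliding-window attention disabled)
```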
## Sub Models - Popular Ones

OpenOrca-Mistral-7B-8k was fine-tuned with the OpenOrca dataset on top of Mistral 7B, achieving top-ranking performance among models smaller than 30B upon release.

Hermes 2 Pro is an upgraded version of Nous Hermes 2, leveraging an updated dataset and introducing new capabilities such as Function Calling and JSON Mode.

Starling-LM-7B-beta, trained through Reinforcement Learning from AI Feedback, demonstrates improved performance by incorporating enhanced reward models and policy optimization methods.

Phi-1.5, a Transformer model with 1.3 billion parameters, exhibits exceptional performance across various benchmarks, including common sense, language understanding, and logical reasoning tasks.

BioMistral is a suite of Mistral-based models tailored specifically for medical domains, pre-trained on textual data from PubMed Central Open Access.

Nous-Yarn-Mistral-7b-128k, an extension of Mistral-7B-v0.1, supports a 128k token context window and excels in handling long-context data.

Core Model:
* Mistral-7B-Instruct-v0.2
  * Large Language Model tailored for instructional purposes.
  * Enhanced features include an expanded context window to 32k and the elimination of sliding-window attention.
  * Utilizes the Mixture of Experts approach for efficient pre-training.

Sub Models:
* OpenOrca - Mistral-7B-8k
  * Fine-tuned using the OpenOrca dataset on top of Mistral 7B.
  * Achieved top-ranking performance among models smaller than 30B upon release.
* Hermes 2 Pro
  * Upgraded version of Nous Hermes 2.
  * Introduces new capabilities such as Function Calling and JSON Mode.
* Starling-LM-7B-beta
  * Trained through Reinforcement Learning from AI Feedback.
  * Demonstrates improved performance through enhanced reward models and policy optimization.
* Phi-1.5
  * Transformer model with 1.3 billion parameters.
  * Exceptional performance across various benchmarks, including common sense, language understanding, and logical reasoning tasks.
* BioMistral
  * Suite of Mistral-based models tailored for medical domains.
  * Pre-trained using textual data from PubMed Central Open Access.
* Nous-Yarn-Mistral-7b-128k
  * Extension of Mistral-7B-v0.1 supporting a 128k token context window.
  * Excels in handling long-context data.

## Extended Capabilities
  * mistralai/Mistral-7B-Instruct-v0.1 - Prime-Base

  * ChaoticNeutrals/Eris-LelantaclesV2-7b - role play
 
  * ChaoticNeutrals/Eris_PrimeV3-Vision-7B - vision

  * rvv-karma/BASH-Coder-Mistral-7B - coding

  * Locutusque/Hercules-3.1-Mistral-7B - Unhinging

  * KoboldAI/Mistral-7B-Erebus-v3 - NSFW

  * Locutusque/Hyperion-2.1-Mistral-7B - CHAT

  * Severian/Nexus-IKM-Mistral-7B-Pytorch - Thinking

  * NousResearch/Hermes-2-Pro-Mistral-7B - Generalizing
 
  * mistralai/Mistral-7B-Instruct-v0.2 - BASE

  * Nitral-AI/ProdigyXBioMistral_7B - medical

  * Nitral-AI/Infinite-Mika-7b - 128k - Context Expansion enforcement

  * Nous-Yarn-Mistral-7b-128k - 128k - Context Expansion
 
  * yanismiraoui/Yarn-Mistral-7b-128k-sharded

  * ChaoticNeutrals/Eris_Prime-V2-7B - Roleplay


### Configuration

The following YAML configuration was used to produce this model:

```yaml

models:
  - model: LeroyDyer/Mixtral_AI_Cyber_MegaMind
    parameters:
      weight: 0.512
  - model: LeroyDyer/Mixtral_AI_Cyber_MegaMind_2_0
    parameters:
      weight: 0.512
merge_method: linear
dtype: float16

```
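
With `merge_method: linear`, the two checkpoints are averaged element-wise using the given weights; since both weights are equal here, the two MegaMind models contribute equally to the blend. Once the merged weights are published as a standard `transformers` checkpoint, the model can be loaded like any other causal LM. A minimal usage sketch follows; the repository id is a placeholder and the prompt format assumes the Mistral instruct template.

```python
# Minimal usage sketch, assuming the merged weights are published as a
# standard transformers checkpoint. The repository id below is a placeholder;
# substitute the id of this model card's repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "LeroyDyer/<this-repository>"  # placeholder, not a real repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",   # load in the checkpoint's native dtype (float16 here)
    device_map="auto",    # requires `accelerate`; remove to load on CPU
)

# Mistral-instruct style prompt.
prompt = "[INST] Explain what a Mixture of Experts router does. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```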