Update README.md
README.md
@@ -9,9 +9,24 @@ tags:
# Aegolius Acadicus 30B

![img](./aegolius-acadicus.png)

I like to call this model "The little professor". It is simply an MoE merge of LoRA-merged models across Llama 2 and Mistral. I am using it as a test case for moving to larger models and getting my gate discrimination set correctly. This model is best suited for knowledge-related use cases; I did not give it a specific workload target as I did with some of the other models in the "Owl Series".

In this particular run I am starting to collapse data sets and model count to see whether that helps or hurts.

This model is merged from the following sources:

* [Fine Tuned Mistral of Mine](https://huggingface.co/ibivibiv/temp_tuned_mistral)
* [WestLake-7B-v2-laser](https://huggingface.co/cognitivecomputations/WestLake-7B-v2-laser)
* [openchat-nectar-0.5](https://huggingface.co/andysalerno/openchat-nectar-0.5)
* [WestSeverus-7B-DPO](https://huggingface.co/PetroGPT/WestSeverus-7B-DPO)

Unless those source models are "contaminated", this one is not. This is a proof-of-concept version of the series; you can find others where I am tuning my own models and using mergekit's MoE merge to combine them into MoE models that I can run on lower-tier hardware with better results.

The goal here is to create specialized models that can collaborate and run as one model, as sketched below.
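For intuition, here is a minimal sketch of the Mixtral-style top-2 routing that an MoE merge of Mistral-based experts produces: a small gate scores each token against the experts and blends the two best matches. This is illustrative only; the layer sizes and the expert count of four are assumptions for the sketch, not this model's actual configuration.

```python
# Toy sketch of top-2 MoE routing (Mixtral-style). Sizes and expert count
# are illustrative assumptions, not taken from this model's config.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopTwoMoELayer(nn.Module):
    def __init__(self, hidden_size: int = 512, ffn_size: int = 1024, num_experts: int = 4):
        super().__init__()
        # One feed-forward "expert" per merged source model.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.SiLU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        )
        # The gate scores every token against every expert.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size)
        logits = self.gate(hidden_states)             # (batch, seq, num_experts)
        weights, chosen = logits.topk(2, dim=-1)      # best two experts per token
        weights = F.softmax(weights, dim=-1)          # normalize the two scores
        output = torch.zeros_like(hidden_states)
        for slot in range(2):
            for idx, expert in enumerate(self.experts):
                mask = chosen[..., slot] == idx       # tokens routed to this expert
                if mask.any():
                    scale = weights[..., slot][mask].unsqueeze(-1)
                    output[mask] += scale * expert(hidden_states[mask])
        return output


# Route a dummy batch of 8 token embeddings through the layer.
layer = TopTwoMoELayer()
tokens = torch.randn(1, 8, 512)
print(layer(tokens).shape)  # torch.Size([1, 8, 512])
```

In the merged model the experts come from the source models listed above and the gate is initialized by the merge tooling, so how well that gate discriminates between experts is roughly the "gate discrimination" referred to above.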
# Prompting
@@ -48,12 +63,11 @@ print(text)
* **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
* **Model type:** **aegolius-acadicus-30b** is an auto-regressive mixture-of-experts (MoE) language model built from Llama 2 and Mistral transformer architecture models.
* **Language(s)**: English
* **Purpose**: This model is an attempt at an MoE model that covers multiple disciplines, using fine-tuned Llama 2 and Mistral models as the base models.
# Benchmark Scores
Coming soon.
## Citations