ibivibiv committed on
Commit
58b3924
1 Parent(s): 18ac4d1

Create README.md

Files changed (1)
  1. README.md +181 -0
README.md ADDED
@@ -0,0 +1,181 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ library_name: transformers
6
+ ---
7
+
8
+ # Aegolius Acadicus 34b v3
9
+
10
+ ![img](./aegolius-acadicus.png)
11
+
12
+ I like to call this model "The little professor". It is a MoE (mixture-of-experts) merge of LoRA-merged models across Llama 2 and Mistral. I am using it as a test case for moving to larger models and getting my gate discrimination set correctly. This model is best suited to knowledge-related use cases; I did not give it a specific workload target as I did with some of the other models in the "Owl Series".
13
+
14
+ In this particular run I am expanding the data sets and the model count to see whether that helps or hurts. I am also moving to more of my own fine-tuned Mistral models.
15
+
16
+ I pay for these fine-tunes on RunPod myself and then merge the results into larger models so that they load as a single model. Soon I hope to use only models that I have fine-tuned myself.
17
+
18
+ This model is merged from the following sources:
19
+
20
+ * [Fine Tuned Mistral of Mine](https://huggingface.co/ibivibiv/temp_tuned_mistral2)
21
+ * [Fine Tuned Mistral of Mine](https://huggingface.co/ibivibiv/temp_tuned_mistral3)
22
+ * [WestLake-7B-v2-laser-truthy-dpo](https://huggingface.co/macadeliccc/WestLake-7B-v2-laser-truthy-dpo)
23
+ * [flux-7b-v0.1](https://huggingface.co/chanwit/flux-7b-v0.1)
24
+ * [senseable/WestLake-7B-v2](https://huggingface.co/senseable/WestLake-7B-v2)
25
+ * [WestSeverus-7B-DPO](https://huggingface.co/PetroGPT/WestSeverus-7B-DPO)
26
+
27
+ Unless those source models are "contaminated", this one is not. This is a proof-of-concept version in this series; you can find others where I tune my own models and combine them with the mergekit MoE tooling into MoE models that I can run on lower-tier hardware with better results.
28
+
29
+ The goal here is to create specialized models that can collaborate and run as one model.
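
To make the idea of experts running as one model concrete, here is a minimal sketch of top-k gating in plain Python. It assumes Mixtral-style routing, where a gate scores every expert and the outputs of the best `k` are blended. The card does not state which routing this merge uses, and `route_top_k` with its toy experts is a hypothetical illustration, not code from mergekit or this model.

```python
import math
from typing import Callable, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(
    gate_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    x: float,
    k: int = 2,
) -> float:
    """Blend the outputs of the k highest-scoring experts.

    The gate weights are renormalized over the selected experts only,
    so the unselected experts contribute nothing.
    """
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))


# Hypothetical toy experts standing in for the fine-tuned models.
experts = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: -x]
gate_logits = [0.1, 2.0, 2.0]  # the gate prefers experts 1 and 2 equally

y = route_top_k(gate_logits, experts, 3.0)
print(y)
```

In a real MoE layer the gate logits come from a learned (or prompt-initialized) linear layer applied per token, and each expert is a feed-forward block rather than a scalar function.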
30
+
31
+ # Prompting
32
+
33
+ ## Prompt Template for Alpaca Style
34
+
35
+ ```
36
+ ### Instruction:
37
+
38
+ <prompt> (without the <>)
39
+
40
+ ### Response:
41
+ ```
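
The template above can be filled in programmatically. This is a minimal sketch; `build_alpaca_prompt` is a hypothetical helper name, and the blank lines around the instruction are an assumption that mirrors the template as written.

```python
def build_alpaca_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Alpaca-style template from this card."""
    return f"### Instruction:\n\n{instruction.strip()}\n\n### Response:\n"


prompt = build_alpaca_prompt("  Name three owl species. ")
print(prompt)
```

The returned string can be passed straight to the tokenizer; generation then continues after the `### Response:` line.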
42
+
43
+ ## Sample Code
44
+
45
+ ```python
46
+ import torch
47
+ from transformers import AutoModelForCausalLM, AutoTokenizer
48
+
49
+ torch.set_default_device("cuda")
50
+
51
+ model = AutoModelForCausalLM.from_pretrained("ibivibiv/aegolius-acadicus-34b-v3", torch_dtype="auto", device_map="auto")
52
+ tokenizer = AutoTokenizer.from_pretrained("ibivibiv/aegolius-acadicus-34b-v3")
53
+
54
+ inputs = tokenizer("### Instruction: Who would win in an arm wrestling match between Abraham Lincoln and Chuck Norris?\n### Response:\n", return_tensors="pt", return_attention_mask=False)
55
+
56
+ outputs = model.generate(**inputs, max_length=200)
57
+ text = tokenizer.batch_decode(outputs)[0]
58
+ print(text)
59
+ ```
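
The decoded text from `generate` includes the prompt as well as the answer. The sketch below shows one way to keep only the answer. `extract_response` is a hypothetical helper, and the stop markers (a repeated `### Instruction:` header and the `</s>` end-of-sequence string used by Llama and Mistral tokenizers) are assumptions about what the output may contain.

```python
from typing import Sequence


def extract_response(
    decoded: str,
    response_marker: str = "### Response:",
    stop_markers: Sequence[str] = ("### Instruction:", "</s>"),
) -> str:
    """Return only the text generated after the response marker.

    If the marker is missing, the whole decoded string is returned stripped.
    Anything after the first stop marker is discarded.
    """
    _, found, tail = decoded.partition(response_marker)
    if not found:
        return decoded.strip()
    for stop in stop_markers:
        tail = tail.split(stop, 1)[0]
    return tail.strip()


sample = "<s> ### Instruction: Who wins?\n### Response:\nChuck Norris.</s>"
answer = extract_response(sample)
print(answer)
```

An alternative is to slice the prompt tokens off `outputs` before decoding, which avoids string matching altogether.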
60
+
61
+ # Model Details
62
+ * **Trained by**: [ibivibiv](https://huggingface.co/ibivibiv)
63
+ * **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
64
+ * **Model type**: **aegolius-acadicus-34b-v3** is an auto-regressive MoE language model built from Llama 2 transformer architecture models and Mistral models.
65
+ * **Language(s)**: English
66
+ * **Purpose**: This model is an attempt at a MoE model that covers multiple disciplines, using fine-tuned Llama 2 and Mistral models as base models.
67
+
68
+ # Benchmark Scores
69
+
70
+ Coming soon.
71
+
72
+ ## Citations
73
+
74
+ ```
75
+ @misc{open-llm-leaderboard,
76
+ author = {Edward Beeching and Clémentine Fourrier and Nathan Habib and Sheon Han and Nathan Lambert and Nazneen Rajani and Omar Sanseviero and Lewis Tunstall and Thomas Wolf},
77
+ title = {Open LLM Leaderboard},
78
+ year = {2023},
79
+ publisher = {Hugging Face},
80
+ howpublished = "\url{https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard}"
81
+ }
82
+ ```
83
+ ```
84
+ @software{eval-harness,
85
+ author = {Gao, Leo and
86
+ Tow, Jonathan and
87
+ Biderman, Stella and
88
+ Black, Sid and
89
+ DiPofi, Anthony and
90
+ Foster, Charles and
91
+ Golding, Laurence and
92
+ Hsu, Jeffrey and
93
+ McDonell, Kyle and
94
+ Muennighoff, Niklas and
95
+ Phang, Jason and
96
+ Reynolds, Laria and
97
+ Tang, Eric and
98
+ Thite, Anish and
99
+ Wang, Ben and
100
+ Wang, Kevin and
101
+ Zou, Andy},
102
+ title = {A framework for few-shot language model evaluation},
103
+ month = sep,
104
+ year = 2021,
105
+ publisher = {Zenodo},
106
+ version = {v0.0.1},
107
+ doi = {10.5281/zenodo.5371628},
108
+ url = {https://doi.org/10.5281/zenodo.5371628}
109
+ }
110
+ ```
111
+ ```
112
+ @misc{clark2018think,
113
+ title={Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge},
114
+ author={Peter Clark and Isaac Cowhey and Oren Etzioni and Tushar Khot and Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord},
115
+ year={2018},
116
+ eprint={1803.05457},
117
+ archivePrefix={arXiv},
118
+ primaryClass={cs.AI}
119
+ }
120
+ ```
121
+ ```
122
+ @misc{zellers2019hellaswag,
123
+ title={HellaSwag: Can a Machine Really Finish Your Sentence?},
124
+ author={Rowan Zellers and Ari Holtzman and Yonatan Bisk and Ali Farhadi and Yejin Choi},
125
+ year={2019},
126
+ eprint={1905.07830},
127
+ archivePrefix={arXiv},
128
+ primaryClass={cs.CL}
129
+ }
130
+ ```
131
+ ```
132
+ @misc{hendrycks2021measuring,
133
+ title={Measuring Massive Multitask Language Understanding},
134
+ author={Dan Hendrycks and Collin Burns and Steven Basart and Andy Zou and Mantas Mazeika and Dawn Song and Jacob Steinhardt},
135
+ year={2021},
136
+ eprint={2009.03300},
137
+ archivePrefix={arXiv},
138
+ primaryClass={cs.CY}
139
+ }
140
+ ```
141
+ ```
142
+ @misc{lin2022truthfulqa,
143
+ title={TruthfulQA: Measuring How Models Mimic Human Falsehoods},
144
+ author={Stephanie Lin and Jacob Hilton and Owain Evans},
145
+ year={2022},
146
+ eprint={2109.07958},
147
+ archivePrefix={arXiv},
148
+ primaryClass={cs.CL}
149
+ }
150
+ ```
151
+ ```
152
+ @misc{DBLP:journals/corr/abs-1907-10641,
153
+ title={{WINOGRANDE:} An Adversarial Winograd Schema Challenge at Scale},
154
+ author={Keisuke Sakaguchi and Ronan Le Bras and Chandra Bhagavatula and Yejin Choi},
155
+ year={2019},
156
+ eprint={1907.10641},
157
+ archivePrefix={arXiv},
158
+ primaryClass={cs.CL}
159
+ }
160
+ ```
161
+ ```
162
+ @misc{DBLP:journals/corr/abs-2110-14168,
163
+ title={Training Verifiers to Solve Math Word Problems},
164
+ author={Karl Cobbe and
165
+ Vineet Kosaraju and
166
+ Mohammad Bavarian and
167
+ Mark Chen and
168
+ Heewoo Jun and
169
+ Lukasz Kaiser and
170
+ Matthias Plappert and
171
+ Jerry Tworek and
172
+ Jacob Hilton and
173
+ Reiichiro Nakano and
174
+ Christopher Hesse and
175
+ John Schulman},
176
+ year={2021},
177
+ eprint={2110.14168},
178
+ archivePrefix={arXiv},
179
+ primaryClass={cs.CL}
180
+ }
181
+ ```