ibivibiv committed f84f293 (1 parent: 58570f9)

Update README.md

Files changed (1): README.md (+160 −1)
library_name: transformers
tags:
- moe
---

# Aegolius Acadicus 24B V2

![img](./aegolius-acadicus.png)

I like to call this model line "the little professor". These models are MoE merges of fine-tuned 7B models that cover general knowledge use cases.

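As an illustration only: MoE merges like this are typically produced with mergekit's `mergekit-moe` tool from a config along these lines. The base and expert model names and the routing prompts below are placeholders, not the actual recipe for this model:

```yaml
# Hypothetical mergekit-moe sketch; model names and prompts are placeholders.
base_model: mistralai/Mistral-7B-v0.1
gate_mode: hidden          # route tokens using hidden-state representations of the prompts
dtype: bfloat16
experts:
  - source_model: example/general-knowledge-7b   # placeholder expert
    positive_prompts:
      - "Answer this general knowledge question"
  - source_model: example/reasoning-7b           # placeholder expert
    positive_prompts:
      - "Reason through this problem step by step"
```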
# Prompting

## Prompt Template (Alpaca style)

```
### Instruction:

<prompt> (without the <>)

### Response:
```

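For programmatic use, the template above can be wrapped in a small helper (a hypothetical convenience function, not part of the model repo):

```python
# Build an Alpaca-style prompt string matching the template above.
def alpaca_prompt(instruction: str) -> str:
    return f"### Instruction:\n\n{instruction}\n\n### Response:\n"

print(alpaca_prompt("Name the smallest owl species in North America."))
```

The returned string is what you pass to the tokenizer when generating.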
## Sample Code

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained("ibivibiv/aegolius-acadicus-24b-v2", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("ibivibiv/aegolius-acadicus-24b-v2")

inputs = tokenizer("### Instruction: Who would win in an arm wrestling match between Abraham Lincoln and Chuck Norris?\n### Response:\n", return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```

# Model Details

* **Trained by**: [ibivibiv](https://huggingface.co/ibivibiv)
* **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
* **Model type:** **aegolius-acadicus-24b-v2** is an auto-regressive mixture-of-experts (MoE) language model built from Llama 2 and Mistral architecture transformer models.
* **Language(s)**: English
* **Purpose**: This model iterates on the original Aegolius Acadicus MoE model, reducing model size while maintaining its capabilities.

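As a rough sanity check on the "24B from 7B experts" sizing: a Mixtral-style MoE duplicates only the MLP blocks per expert, while attention and embeddings stay shared. A back-of-envelope count, assuming Mistral-7B-like dimensions (hypothetical values, not read from the repo config):

```python
# Back-of-envelope parameter count for a Mixtral-style MoE built from
# Mistral-7B-like experts. All dimensions are assumptions, not repo values.
hidden, layers, inter, vocab, kv_dim = 4096, 32, 14336, 32000, 1024

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q/o projections + GQA k/v
mlp = 3 * hidden * inter                          # gate/up/down projections
emb = 2 * vocab * hidden                          # input embeddings + lm_head

def total_params(num_experts: int) -> int:
    router = hidden * num_experts if num_experts > 1 else 0
    return layers * (attn + num_experts * mlp + router) + emb

print(f"dense 7B-style model: {total_params(1) / 1e9:.2f}B")  # ~7.2B
print(f"4-expert MoE merge:   {total_params(4) / 1e9:.2f}B")  # ~24B
```

The shared attention and embeddings are why a 4-expert merge lands near 24B rather than 4 × 7B = 28B.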
# Benchmark Scores

Pending.

## Citations

```
@misc{open-llm-leaderboard,
  author = {Edward Beeching and Clémentine Fourrier and Nathan Habib and Sheon Han and Nathan Lambert and Nazneen Rajani and Omar Sanseviero and Lewis Tunstall and Thomas Wolf},
  title = {Open LLM Leaderboard},
  year = {2023},
  publisher = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard}"
}
```
```
@software{eval-harness,
  author = {Gao, Leo and Tow, Jonathan and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and McDonell, Kyle and Muennighoff, Niklas and Phang, Jason and Reynolds, Laria and Tang, Eric and Thite, Anish and Wang, Ben and Wang, Kevin and Zou, Andy},
  title = {A framework for few-shot language model evaluation},
  month = sep,
  year = 2021,
  publisher = {Zenodo},
  version = {v0.0.1},
  doi = {10.5281/zenodo.5371628},
  url = {https://doi.org/10.5281/zenodo.5371628}
}
```
```
@misc{clark2018think,
  title = {Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge},
  author = {Peter Clark and Isaac Cowhey and Oren Etzioni and Tushar Khot and Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord},
  year = {2018},
  eprint = {1803.05457},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI}
}
```
```
@misc{zellers2019hellaswag,
  title = {HellaSwag: Can a Machine Really Finish Your Sentence?},
  author = {Rowan Zellers and Ari Holtzman and Yonatan Bisk and Ali Farhadi and Yejin Choi},
  year = {2019},
  eprint = {1905.07830},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL}
}
```
```
@misc{hendrycks2021measuring,
  title = {Measuring Massive Multitask Language Understanding},
  author = {Dan Hendrycks and Collin Burns and Steven Basart and Andy Zou and Mantas Mazeika and Dawn Song and Jacob Steinhardt},
  year = {2021},
  eprint = {2009.03300},
  archivePrefix = {arXiv},
  primaryClass = {cs.CY}
}
```
```
@misc{lin2022truthfulqa,
  title = {TruthfulQA: Measuring How Models Mimic Human Falsehoods},
  author = {Stephanie Lin and Jacob Hilton and Owain Evans},
  year = {2022},
  eprint = {2109.07958},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL}
}
```
```
@misc{DBLP:journals/corr/abs-1907-10641,
  title = {{WINOGRANDE:} An Adversarial Winograd Schema Challenge at Scale},
  author = {Keisuke Sakaguchi and Ronan Le Bras and Chandra Bhagavatula and Yejin Choi},
  year = {2019},
  eprint = {1907.10641},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL}
}
```
```
@misc{DBLP:journals/corr/abs-2110-14168,
  title = {Training Verifiers to Solve Math Word Problems},
  author = {Karl Cobbe and Vineet Kosaraju and Mohammad Bavarian and Mark Chen and Heewoo Jun and Lukasz Kaiser and Matthias Plappert and Jerry Tworek and Jacob Hilton and Reiichiro Nakano and Christopher Hesse and John Schulman},
  year = {2021},
  eprint = {2110.14168},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL}
}
```