Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

phi-2-orange-v2 - bnb 8bits
- Model creator: https://huggingface.co/rhysjones/
- Original model: https://huggingface.co/rhysjones/phi-2-orange-v2/
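"bnb 8bits" means the weights are stored in 8-bit precision via the bitsandbytes library, roughly halving the memory footprint of an fp16 checkpoint. A back-of-envelope sketch of the savings for a ~2.7-billion-parameter model like Phi-2 (the parameter count and helper function below are illustrative, not part of this release):

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

PHI2_PARAMS = 2.7e9  # approximate parameter count of Phi-2

fp16_gb = weight_memory_gb(PHI2_PARAMS, 16)  # half-precision baseline, ~5.4 GB
int8_gb = weight_memory_gb(PHI2_PARAMS, 8)   # this 8-bit quantization, ~2.7 GB

print(f"fp16: ~{fp16_gb:.1f} GB, int8: ~{int8_gb:.1f} GB")
```

Actual memory use at inference time is higher (activations, KV cache, quantization overhead), so treat these as lower bounds on what you need.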

Original model description:
---
license: mit
datasets:
- Open-Orca/SlimOrca-Dedup
- migtissera/Synthia-v1.3
- LDJnr/Verified-Camel
- LDJnr/Pure-Dove
- LDJnr/Capybara
- meta-math/MetaMathQA
- Intel/orca_dpo_pairs
- argilla/ultrafeedback-binarized-preferences-cleaned
widget:
- example_title: "Example interaction"
  text: "Why is the sky blue?"
inference:
  parameters:
    do_sample: True
    temperature: 0.1
model-index:
- name: phi-2-orange-v2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 61.86
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rhysjones/phi-2-orange-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 76.32
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rhysjones/phi-2-orange-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 55.72
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rhysjones/phi-2-orange-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 54.84
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rhysjones/phi-2-orange-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 75.69
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rhysjones/phi-2-orange-v2
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 57.62
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=rhysjones/phi-2-orange-v2
      name: Open LLM Leaderboard
---
![Phi-2 Orange](https://huggingface.co/rhysjones/phi-2-orange-v2/resolve/main/phi-2-orange.jpg)

# Phi-2 Orange Version 2

A two-step finetune of Phi-2, with a bit more zest.

This is an improved version of the original [Phi-2-Orange](https://huggingface.co/rhysjones/phi-2-orange) that uses an updated training process on the same datasets.

It also uses the latest updated model from Microsoft's [Phi-2](https://huggingface.co/microsoft/phi-2), making it directly usable within Hugging Face's Transformers library (without the need for `trust_remote_code`).

# Prompt Format

Phi-2 Orange v2 uses ChatML as the prompt format.
(Update 12th March 2024: fixed eos_token issue)

It's recommended to always prompt with a system instruction (use whatever system prompt you like):

```
<|im_start|>system
You are a helpful assistant for Python which outputs in Markdown format.<|im_end|>
<|im_start|>user
Write a function to calculate the Fibonacci sequence<|im_end|>
<|im_start|>assistant

```
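When building prompts in code, the same ChatML layout can be assembled with a small helper. A minimal sketch (the `chatml_prompt` function is a hypothetical convenience, not part of the model's tooling; recent Transformers tokenizers can also produce this layout via `apply_chat_template` when a chat template ships with the tokenizer):

```python
def chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML prompt that ends with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = chatml_prompt(
    "You are a helpful assistant for Python which outputs in Markdown format.",
    "Write a function to calculate the Fibonacci sequence",
)
print(prompt)
```

Leaving the final `<|im_start|>assistant` turn open is what cues the model to generate its reply there.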

For example, if you find the model's output to be overly verbose, instruct it to be short and concise:

```
<|im_start|>system
You are a helpful assistant. Be short and direct in your answers.<|im_end|>
<|im_start|>user
Was Tom Hanks in the movie Forrest Gump? If so, who did he play and give details of the plot.<|im_end|>
<|im_start|>assistant
```

# Evaluations

[Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_rhysjones__phi-2-orange-v2)

| Metric                            | Value |
|-----------------------------------|------:|
| Average                           | 63.67 |
| AI2 Reasoning Challenge (25-Shot) | 61.86 |
| HellaSwag (10-Shot)               | 76.32 |
| MMLU (5-Shot)                     | 55.72 |
| TruthfulQA (0-shot)               | 54.84 |
| Winogrande (5-shot)               | 75.69 |
| GSM8k (5-shot)                    | 57.62 |
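The reported leaderboard average is simply the arithmetic mean of the six per-benchmark scores, which is easy to sanity-check:

```python
# Per-benchmark scores from the Open LLM Leaderboard table above.
scores = {
    "ARC (25-shot)": 61.86,
    "HellaSwag (10-shot)": 76.32,
    "MMLU (5-shot)": 55.72,
    "TruthfulQA (0-shot)": 54.84,
    "Winogrande (5-shot)": 75.69,
    "GSM8k (5-shot)": 57.62,
}

average = sum(scores.values()) / len(scores)
print(f"average = {average:.2f}")  # agrees with the table's 63.67 to rounding
```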

[YALL - Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard)
Evaluation from [mlabonne](https://huggingface.co/mlabonne)'s alternative LLM leaderboard:

| Metric     | Value |
|------------|------:|
| Average    | 49.64 |
| AGIEval    | 34.55 |
| GPT4All    | 70.96 |
| TruthfulQA | 54.87 |
| Bigbench   | 38.17 |

# Limitations

This model shares the same limitations as the underlying Phi-2 model, details of which are found [here](https://huggingface.co/microsoft/phi-2#limitations-of-phi-2).