bartowski fblgit committed on
Commit ff7e5bc
1 Parent(s): 1000795

Update README.md (#1)


- Update README.md (bf54899a81e9b238292887279ced88a59f36431e)


Co-authored-by: FBL <fblgit@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +215 -0
README.md CHANGED
@@ -10,8 +10,131 @@ datasets:
 - mlabonne/orpo-dpo-mix-40k
 quantized_by: bartowski
 pipeline_tag: text-generation
+ model-index:
+ - name: UNA-ThePitbull-21.4B-v2
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 77.73
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 91.79
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 68.25
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 78.24
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 87.37
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 63.53
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2
+       name: Open LLM Leaderboard
 ---

+ # UNA-ThePitbull 21.4B v2
+
+ Introducing the best LLM in the industry: nearly as good as a 70B at just 21.4B parameters, based on saltlux/luxia-21.4b-alignment-v1.0.
+ ![UNA - ThePitbull 21.4B v2](https://huggingface.co/fblgit/UNA-ThePitbull-21.4B-v2/resolve/main/DE-UNA-ThePitbull-21.4B-v2.png)
+
+ This model has not been poisoned to score high and be useless. We release it because it's the real deal of EQ & IQ together in one crazy powerful, smart, and conversational model.
+
+ ## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_fblgit__UNA-ThePitbull-21.4B-v2)
+
+ | Metric                          |Value|
+ |---------------------------------|----:|
+ |Avg.                             |77.82|
+ |AI2 Reasoning Challenge (25-Shot)|77.73|
+ |HellaSwag (10-Shot)              |91.79|
+ |MMLU (5-Shot)                    |68.25|
+ |TruthfulQA (0-shot)              |78.24|
+ |Winogrande (5-shot)              |87.37|
+ |GSM8k (5-shot)                   |63.53|
+
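+ As a quick sanity check, the Avg. row is simply the arithmetic mean of the six benchmark scores; a two-line illustration (not part of the leaderboard tooling):
+
+ ```python
+ # The leaderboard "Avg." is the mean of the six scores in the table above.
+ scores = [77.73, 91.79, 68.25, 78.24, 87.37, 63.53]
+ print(round(sum(scores) / len(scores), 2))  # -> 77.82
+ ```
+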
 ## Llamacpp imatrix Quantizations of UNA-ThePitbull-21.4B-v2

 Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3001">b3001</a> for quantization.
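+ Any llama.cpp-based runtime can load these GGUF files. A minimal sketch using the llama-cpp-python bindings (the repo id and quant filename below are assumptions for illustration; check the repo's actual file list):
+
+ ```python
+ # Minimal sketch: download one quant and run a prompt with llama-cpp-python.
+ # Requires `pip install llama-cpp-python huggingface_hub`.
+ from huggingface_hub import hf_hub_download
+ from llama_cpp import Llama
+
+ model_path = hf_hub_download(
+     repo_id="bartowski/UNA-ThePitbull-21.4B-v2-GGUF",  # assumed repo id
+     filename="UNA-ThePitbull-21.4B-v2-Q4_K_M.gguf",    # assumed filename
+ )
+
+ llm = Llama(model_path=model_path, n_ctx=2048, n_gpu_layers=-1)
+ out = llm("User: Hello, who are you?\nAssistant:", max_tokens=64)
+ print(out["choices"][0]["text"])
+ ```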
@@ -105,3 +228,95 @@ These I-quants can also be used on CPU and Apple Metal, but will be slower than
 The I-quants are *not* compatible with Vulkan, which also supports AMD cards, so if you have an AMD card double-check whether you're using the rocBLAS build or the Vulkan build. At the time of writing, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.

 Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
+
+ ## Difference V1 vs V2
+
+ On V2 we implemented a different UNA strategy, partially covering the MLPs and attention layers.
+ We also performed further SFT and further DPO over V1, and we'll release some of those models soon as well.
+
+ ### Changes
+
+ 1. SFT over V1 with `Replete-AI/code_bagel_hermes-2.5` at a learning rate of 1.0e-4 decaying to 5.0e-5
+ 2. DPO at 1.0e-4 with min_lr 5.0e-5 (see the scheduler sketch below) over:
+    * `mlabonne/orpo-dpo-mix-40k`
+    * `jondurbin/py-dpo-v0.1`
+
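+ As a rough illustration of that schedule, here is a minimal sketch using the `cosine_with_min_lr` scheduler from `transformers` (>= 4.38); this is an assumption for illustration, not the exact V2 training code:
+
+ ```python
+ # Minimal sketch: a 1.0e-4 learning rate decaying to a 5.0e-5 floor,
+ # matching the SFT/DPO schedule described above (illustrative only).
+ from transformers import TrainingArguments
+
+ args = TrainingArguments(
+     output_dir="una-thepitbull-dpo",         # hypothetical output path
+     learning_rate=1.0e-4,                    # starting learning rate
+     lr_scheduler_type="cosine_with_min_lr",  # cosine decay with a floor
+     lr_scheduler_kwargs={"min_lr": 5.0e-5},  # stop decaying at 5.0e-5
+     num_train_epochs=1,
+     per_device_train_batch_size=1,
+ )
+ ```
+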
+ # Evaluations
+
+ These results can only be fairly compared against its non-UNA base model, the original luxia-21.4b, and against ThePitbull v1.
+
+ ## UNA v2 (VLLM) Evaluations:
+
+ A trailing `+` or `-` marks rows where V2 scores above or below the original base model.
+
+ ```
+ vllm (pretrained=/data/tools/mergekit/una-thepitbull-v5,dtype=bfloat16,gpu_memory_utilization=0.8,max_model_len=2048,data_parallel_size=2,tensor_parallel_size=4), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8
+ |    Tasks     |Version|     Filter     |n-shot|  Metric   |Value |   |Stderr|
+ |--------------|------:|----------------|-----:|-----------|-----:|---|-----:|
+ |gsm8k         |      3|strict-match    |     5|exact_match|0.7695|±  |0.0116|+
+ |              |       |flexible-extract|     5|exact_match|0.7695|±  |0.0116|+
+ |hellaswag     |      1|none            |    10|acc        |0.8110|±  |0.0039|
+ |              |       |none            |    10|acc_norm   |0.9169|±  |0.0028|+
+ |winogrande    |      1|none            |     5|acc        |0.8777|±  |0.0092|+
+ |mmlu          |    N/A|none            |     0|acc        |0.6427|±  |0.0038|-
+ |arc_challenge |      1|none            |    25|acc        |0.7713|±  |0.0123|
+ |              |       |none            |    25|acc_norm   |0.7875|±  |0.0120|+
+ |truthfulqa_mc2|      2|none            |     0|acc        |0.7824|±  |0.0135|-
+ |mathqa        |      1|none            |     0|acc        |0.4037|±  | 0.009|
+ |              |       |none            |     0|acc_norm   |0.4034|±  | 0.009|+
+ |pubmedqa      |      1|none            |     0|acc        |0.7260|±  | 0.020|+
+ |boolq         |      2|none            |     0|acc        |0.8602|±  |0.0061|+
+ ```
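+
+ A run like the above can be approximated with the lm-evaluation-harness Python API (a sketch; the model id, engine settings, and task list are illustrative):
+
+ ```python
+ # Minimal sketch: vLLM-backed evaluation via lm-evaluation-harness.
+ # Requires `pip install lm-eval vllm`; few-shot counts follow each
+ # task's defaults unless num_fewshot is passed explicitly.
+ import lm_eval
+
+ results = lm_eval.simple_evaluate(
+     model="vllm",
+     model_args=(
+         "pretrained=fblgit/UNA-ThePitbull-21.4B-v2,"  # illustrative model id
+         "dtype=bfloat16,gpu_memory_utilization=0.8,max_model_len=2048"
+     ),
+     tasks=["gsm8k", "hellaswag", "winogrande", "arc_challenge"],
+     batch_size=8,
+ )
+ print(results["results"])
+ ```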
+
+ ## UNA v1 (VLLM) Evaluations
+ ```
+ |    Tasks     |Version|     Filter     |n-shot|  Metric   |Value |   |Stderr|
+ |--------------|------:|----------------|-----:|-----------|-----:|---|-----:|
+ |gsm8k         |      3|strict-match    |     5|exact_match|0.7566|±  |0.0118|
+ |              |       |flexible-extract|     5|exact_match|0.7582|±  |0.0118|
+ |hellaswag     |      1|none            |    10|acc        |0.8168|±  |0.0039|
+ |              |       |none            |    10|acc_norm   |0.9188|±  |0.0027|
+ |winogrande    |      1|none            |     5|acc        |0.8635|±  |0.0097|
+ |mmlu          |    N/A|none            |     0|acc        |0.6444|±  |0.0038|
+ |arc_challenge |      1|none            |    25|acc        |0.7747|±  |0.0122|
+ |              |       |none            |    25|acc_norm   |0.7850|±  |0.0120|
+ |truthfulqa_mc2|      2|none            |     0|acc        |0.7902|±  |0.0134|
+ |mathqa        |      1|none            |     0|acc        |0.4030|±  | 0.009|
+ |              |       |none            |     0|acc_norm   |0.4034|±  | 0.009|
+ |pubmedqa      |      1|none            |     0|acc        |0.6860|±  |0.0208|
+ |boolq         |      2|none            |     0|acc        |0.8401|±  |0.0064|
+ ```
+
+ ## Original (VLLM) Evaluations
+ ```
+ |    Tasks     |Version|     Filter     |n-shot|  Metric   |Value |   |Stderr|
+ |--------------|------:|----------------|-----:|-----------|-----:|---|-----:|
+ |gsm8k         |      3|strict-match    |     5|exact_match|0.7528|±  |0.0119|
+ |              |       |flexible-extract|     5|exact_match|0.7521|±  |0.0119|
+ |hellaswag     |      1|none            |    10|acc        |0.8117|±  |0.0039|
+ |              |       |none            |    10|acc_norm   |0.9167|±  |0.0028|
+ |winogrande    |      1|none            |     5|acc        |0.8682|±  |0.0095|
+ |mmlu          |    N/A|none            |     0|acc        |0.6448|±  |0.0038|
+ |arc_challenge |      1|none            |    25|acc        |0.7688|±  |0.0123|
+ |              |       |none            |    25|acc_norm   |0.7730|±  |0.0122|
+ |truthfulqa_mc2|      2|none            |     0|acc        |0.7895|±  |0.0133|
+ |mathqa        |      1|none            |     0|acc        |0.4000|±  | 0.009|
+ |              |       |none            |     0|acc_norm   |0.4003|±  | 0.009|
+ |pubmedqa      |      1|none            |     0|acc        |0.6680|±  |0.0211|
+ |boolq         |      2|none            |     0|acc        |0.8346|±  |0.0065|
+ ```
+
+ ## Citations
+ * mlabonne
+ * jondurbin & Replete-AI
+ * bartowski
+ * saltlux
+
+ If you use UNA models, don't forget to cite:
+ ```
+ @misc{unathepitbull21b,
+   title={ThePitbull: Uniform Neural Alignment},
+   author={Xavier Murias},
+   year={2024},
+   publisher={Juanako.AI},
+   journal={HuggingFace repository},
+   howpublished={\url{https://huggingface.co/fblgit/UNA-ThePitbull-21.4-v1}},
+ }
+ ```