ibivibiv committed on
Commit 6e762a5
1 Parent(s): ecb80c1

Update README.md

Files changed (1)
  1. README.md +71 -2
README.md CHANGED
@@ -6,11 +6,10 @@ tags:
  - logic
  - planning
  ---
+ # Strix Rufipes 70B

  ![img](./strix_rufipes.png)

- # Strix Rufipes 70B
-
  # Model Details
  * **Trained by**: [ibivibiv](https://huggingface.co/ibivibiv)
  * **Library**: [HuggingFace Transformers](https://github.com/huggingface/transformers)
@@ -18,6 +17,76 @@ tags:
  * **Language(s)**: English
  * **Purpose**: Has specific training for logic enforcement, will do well in ARC or other logic testing as well as critical thinking tasks. This model is targeted towards planning exercises.

+ # Benchmark Scores
+
+ | Test Name | Accuracy |
+ |-------------------------------------------------------|----------------------|
+ | average of all | 0.6910894247381432 |
+ | arc:challenge | 0.674061433447099 |
+ | hellaswag | 0.6898028281218881 |
+ | hendrycksTest-abstract_algebra | 0.36 |
+ | hendrycksTest-anatomy | 0.6370370370370371 |
+ | hendrycksTest-astronomy | 0.7960526315789473 |
+ | hendrycksTest-business_ethics | 0.73 |
+ | hendrycksTest-clinical_knowledge | 0.7169811320754716 |
+ | hendrycksTest-college_biology | 0.8125 |
+ | hendrycksTest-college_chemistry | 0.47 |
+ | hendrycksTest-college_computer_science | 0.56 |
+ | hendrycksTest-college_mathematics | 0.36 |
+ | hendrycksTest-college_medicine | 0.6820809248554913 |
+ | hendrycksTest-college_physics | 0.43137254901960786 |
+ | hendrycksTest-computer_security | 0.75 |
+ | hendrycksTest-conceptual_physics | 0.6851063829787234 |
+ | hendrycksTest-econometrics | 0.4824561403508772 |
+ | hendrycksTest-electrical_engineering | 0.5793103448275863 |
+ | hendrycksTest-elementary_mathematics | 0.41534391534391535 |
+ | hendrycksTest-formal_logic | 0.48412698412698413 |
+ | hendrycksTest-global_facts | 0.5 |
+ | hendrycksTest-high_school_biology | 0.8064516129032258 |
+ | hendrycksTest-high_school_chemistry | 0.5073891625615764 |
+ | hendrycksTest-high_school_computer_science | 0.71 |
+ | hendrycksTest-high_school_european_history | 0.8424242424242424 |
+ | hendrycksTest-high_school_geography | 0.8787878787878788 |
+ | hendrycksTest-high_school_government_and_politics | 0.9326424870466321 |
+ | hendrycksTest-high_school_macroeconomics | 0.717948717948718 |
+ | hendrycksTest-high_school_mathematics | 0.2962962962962963 |
+ | hendrycksTest-high_school_microeconomics | 0.7521008403361344 |
+ | hendrycksTest-high_school_physics | 0.48344370860927155 |
+ | hendrycksTest-high_school_psychology | 0.8788990825688073 |
+ | hendrycksTest-high_school_statistics | 0.5277777777777778 |
+ | hendrycksTest-high_school_us_history | 0.9019607843137255 |
+ | hendrycksTest-high_school_world_history | 0.8776371308016878 |
+ | hendrycksTest-human_aging | 0.7802690582959642 |
+ | hendrycksTest-human_sexuality | 0.8244274809160306 |
+ | hendrycksTest-international_law | 0.8677685950413223 |
+ | hendrycksTest-jurisprudence | 0.8148148148148148 |
+ | hendrycksTest-logical_fallacies | 0.7914110429447853 |
+ | hendrycksTest-machine_learning | 0.5357142857142857 |
+ | hendrycksTest-management | 0.8543689320388349 |
+ | hendrycksTest-marketing | 0.8974358974358975 |
+ | hendrycksTest-medical_genetics | 0.73 |
+ | hendrycksTest-miscellaneous | 0.8569604086845466 |
+ | hendrycksTest-moral_disputes | 0.7687861271676301 |
+ | hendrycksTest-moral_scenarios | 0.5184357541899441 |
+ | hendrycksTest-nutrition | 0.7679738562091504 |
+ | hendrycksTest-philosophy | 0.7620578778135049 |
+ | hendrycksTest-prehistory | 0.8271604938271605 |
+ | hendrycksTest-professional_accounting | 0.5390070921985816 |
+ | hendrycksTest-professional_law | 0.5743155149934811 |
+ | hendrycksTest-professional_medicine | 0.6911764705882353 |
+ | hendrycksTest-professional_psychology | 0.7565359477124183 |
+ | hendrycksTest-public_relations | 0.7272727272727273 |
+ | hendrycksTest-security_studies | 0.8 |
+ | hendrycksTest-sociology | 0.8507462686567164 |
+ | hendrycksTest-us_foreign_policy | 0.89 |
+ | hendrycksTest-virology | 0.5542168674698795 |
+ | hendrycksTest-world_religions | 0.8596491228070176 |
+ | truthfulqa | 0.4712300987333333 |
+ | winogrande | 0.8476716653512234 |
+ | gsm8k | 0.5382865807429871 |
+
+
+
  # Prompting

  ## Prompt Template for alpaca style