macadeliccc committed on
Commit
d1aab33
1 Parent(s): f6e7dac

Update README.md

Files changed (1)
  1. README.md +81 -2
README.md CHANGED
@@ -67,10 +67,89 @@ print(generate_response(prompt), "\n")
 
  ## Eval
 
- <script src="https://gist.github.com/tdolan21/57404d06a9c102904848b795fdaabef3.js"></script>
-
  evaluation [colab](https://colab.research.google.com/drive/1FpwgsGzCR4tORTxAwUxpN3PcP22En2xk?usp=sharing)
 
+ | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
+ |---------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
+ |[laser-dolphin-mixtral-2x7b-dpo](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo)| 41.31| 73.67| 61.69| 42.79| 54.87|
+
+ ### AGIEval
+ | Task |Version| Metric |Value| |Stderr|
+ |------------------------------|------:|--------|----:|---|-----:|
+ |agieval_aqua_rat | 0|acc |22.44|± | 2.62|
+ | | |acc_norm|21.26|± | 2.57|
+ |agieval_logiqa_en | 0|acc |34.87|± | 1.87|
+ | | |acc_norm|35.79|± | 1.88|
+ |agieval_lsat_ar | 0|acc |22.17|± | 2.75|
+ | | |acc_norm|23.04|± | 2.78|
+ |agieval_lsat_lr | 0|acc |43.14|± | 2.20|
+ | | |acc_norm|45.10|± | 2.21|
+ |agieval_lsat_rc | 0|acc |57.25|± | 3.02|
+ | | |acc_norm|55.76|± | 3.03|
+ |agieval_sat_en | 0|acc |71.84|± | 3.14|
+ | | |acc_norm|71.84|± | 3.14|
+ |agieval_sat_en_without_passage| 0|acc |44.17|± | 3.47|
+ | | |acc_norm|41.75|± | 3.44|
+ |agieval_sat_math | 0|acc |40.91|± | 3.32|
+ | | |acc_norm|35.91|± | 3.24|
+
+ Average: 41.31%
+
+ ### GPT4All
+ | Task |Version| Metric |Value| |Stderr|
+ |-------------|------:|--------|----:|---|-----:|
+ |arc_challenge| 0|acc |58.02|± | 1.44|
+ | | |acc_norm|60.58|± | 1.43|
+ |arc_easy | 0|acc |85.48|± | 0.72|
+ | | |acc_norm|82.62|± | 0.78|
+ |boolq | 1|acc |87.16|± | 0.59|
+ |hellaswag | 0|acc |65.04|± | 0.48|
+ | | |acc_norm|83.63|± | 0.37|
+ |openbookqa | 0|acc |35.60|± | 2.14|
+ | | |acc_norm|45.00|± | 2.23|
+ |piqa | 0|acc |81.99|± | 0.90|
+ | | |acc_norm|83.51|± | 0.87|
+ |winogrande | 0|acc |73.16|± | 1.25|
+
+ Average: 73.67%
+
+ ### TruthfulQA
+ | Task |Version|Metric|Value| |Stderr|
+ |-------------|------:|------|----:|---|-----:|
+ |truthfulqa_mc| 1|mc1 |44.31|± | 1.74|
+ | | |mc2 |61.69|± | 1.50|
+
+ Average: 61.69%
+
+ ### Bigbench
+ | Task |Version| Metric |Value| |Stderr|
+ |------------------------------------------------|------:|---------------------|----:|---|-----:|
+ |bigbench_causal_judgement | 0|multiple_choice_grade|59.47|± | 3.57|
+ |bigbench_date_understanding | 0|multiple_choice_grade|66.67|± | 2.46|
+ |bigbench_disambiguation_qa | 0|multiple_choice_grade|36.05|± | 3.00|
+ |bigbench_geometric_shapes | 0|multiple_choice_grade|20.33|± | 2.13|
+ | | |exact_str_match | 7.52|± | 1.39|
+ |bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|27.80|± | 2.01|
+ |bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|19.86|± | 1.51|
+ |bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|48.67|± | 2.89|
+ |bigbench_movie_recommendation | 0|multiple_choice_grade|49.60|± | 2.24|
+ |bigbench_navigate | 0|multiple_choice_grade|53.20|± | 1.58|
+ |bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|68.50|± | 1.04|
+ |bigbench_ruin_names | 0|multiple_choice_grade|41.74|± | 2.33|
+ |bigbench_salient_translation_error_detection | 0|multiple_choice_grade|16.23|± | 1.17|
+ |bigbench_snarks | 0|multiple_choice_grade|64.09|± | 3.58|
+ |bigbench_sports_understanding | 0|multiple_choice_grade|70.69|± | 1.45|
+ |bigbench_temporal_sequences | 0|multiple_choice_grade|37.70|± | 1.53|
+ |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|23.44|± | 1.20|
+ |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|17.60|± | 0.91|
+ |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|48.67|± | 2.89|
+
+ Average: 42.79%
+
+ Average score: 54.87%
+
+ Elapsed time: 02:53:28
+
  ## Citations
 
  Fernando Fernandes Neto and Eric Hartford. "Optimizing Large Language Models Using Layer-Selective Rank Reduction and Random Matrix Theory." 2024.
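The diff reports an "Average" for each suite without stating which metric feeds it. The per-task numbers are consistent with a simple rule: use `acc_norm` where a task reports it and `acc` otherwise (AGIEval, GPT4All), `mc2` for TruthfulQA, and `multiple_choice_grade` for Bigbench, then take the unweighted mean of the four suite averages. The Python sketch below recomputes the averages under that assumed rule. The rule is inferred from the tables, not taken from the evaluation notebook, and the helper names `preferred` and `suite_averages` are hypothetical. Because the table values are already rounded to two decimals, the recomputed figures match the reported ones only to within about 0.01 points.

```python
from statistics import mean

# Per-task scores (percent) copied from the tables in this commit.
AGIEVAL = {
    "agieval_aqua_rat": {"acc": 22.44, "acc_norm": 21.26},
    "agieval_logiqa_en": {"acc": 34.87, "acc_norm": 35.79},
    "agieval_lsat_ar": {"acc": 22.17, "acc_norm": 23.04},
    "agieval_lsat_lr": {"acc": 43.14, "acc_norm": 45.10},
    "agieval_lsat_rc": {"acc": 57.25, "acc_norm": 55.76},
    "agieval_sat_en": {"acc": 71.84, "acc_norm": 71.84},
    "agieval_sat_en_without_passage": {"acc": 44.17, "acc_norm": 41.75},
    "agieval_sat_math": {"acc": 40.91, "acc_norm": 35.91},
}
GPT4ALL = {
    "arc_challenge": {"acc": 58.02, "acc_norm": 60.58},
    "arc_easy": {"acc": 85.48, "acc_norm": 82.62},
    "boolq": {"acc": 87.16},  # only acc is reported
    "hellaswag": {"acc": 65.04, "acc_norm": 83.63},
    "openbookqa": {"acc": 35.60, "acc_norm": 45.00},
    "piqa": {"acc": 81.99, "acc_norm": 83.51},
    "winogrande": {"acc": 73.16},  # only acc is reported
}
TRUTHFULQA = {"truthfulqa_mc": {"mc1": 44.31, "mc2": 61.69}}
# multiple_choice_grade only; the extra exact_str_match row for
# bigbench_geometric_shapes (7.52) is left out of the average.
BIGBENCH = {
    "bigbench_causal_judgement": 59.47,
    "bigbench_date_understanding": 66.67,
    "bigbench_disambiguation_qa": 36.05,
    "bigbench_geometric_shapes": 20.33,
    "bigbench_logical_deduction_five_objects": 27.80,
    "bigbench_logical_deduction_seven_objects": 19.86,
    "bigbench_logical_deduction_three_objects": 48.67,
    "bigbench_movie_recommendation": 49.60,
    "bigbench_navigate": 53.20,
    "bigbench_reasoning_about_colored_objects": 68.50,
    "bigbench_ruin_names": 41.74,
    "bigbench_salient_translation_error_detection": 16.23,
    "bigbench_snarks": 64.09,
    "bigbench_sports_understanding": 70.69,
    "bigbench_temporal_sequences": 37.70,
    "bigbench_tracking_shuffled_objects_five_objects": 23.44,
    "bigbench_tracking_shuffled_objects_seven_objects": 17.60,
    "bigbench_tracking_shuffled_objects_three_objects": 48.67,
}


def preferred(metrics: dict[str, float]) -> float:
    """Hypothetical rule: use acc_norm when a task reports it, otherwise acc."""
    return metrics.get("acc_norm", metrics["acc"])


def suite_averages() -> dict[str, float]:
    """Recompute each suite's average from the rounded per-task scores."""
    return {
        "AGIEval": mean(preferred(m) for m in AGIEVAL.values()),
        "GPT4All": mean(preferred(m) for m in GPT4ALL.values()),
        "TruthfulQA": mean(m["mc2"] for m in TRUTHFULQA.values()),
        "Bigbench": mean(BIGBENCH.values()),
    }


if __name__ == "__main__":
    averages = suite_averages()
    for suite, value in averages.items():
        print(f"{suite}: {value:.2f}%")
    # Overall score: unweighted mean of the four suite averages.
    print(f"Average score: {mean(averages.values()):.2f}%")
```

Under this rule the recomputed suite averages and overall score land within 0.01 points of the reported 41.31, 73.67, 61.69, 42.79 and 54.87. The small residual comes from averaging values that were rounded before publication.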