Crystalcareai committed on
Commit
ddced7f
1 Parent(s): 1e41e74

Update README.md

Files changed (1)
  1. README.md +31 -0
README.md CHANGED
@@ -75,6 +75,11 @@ Despite its compact size, Arcee Spark offers deep reasoning capabilities, making
75
  <div style="display: flex; justify-content: center; margin: 20px 0;">
76
  <img src="https://i.ibb.co/BLX8GmZ/Screenshot-2024-06-23-at-10-43-50-PM.png" alt="Additional Benchmark Results" style="border-radius: 10px; max-width: 90%; height: auto; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);">
77
  </div>
78
  ### MT-Bench
79
 
80
  ```markdown
@@ -144,6 +149,32 @@ AGI-eval average: 51.11
144
 
145
  Gpt4al Average: 69.37
146
 
147
  ## License
148
 
149
  Arcee Spark is released under the Apache 2.0 license.
 
75
  <div style="display: flex; justify-content: center; margin: 20px 0;">
76
  <img src="https://i.ibb.co/BLX8GmZ/Screenshot-2024-06-23-at-10-43-50-PM.png" alt="Additional Benchmark Results" style="border-radius: 10px; max-width: 90%; height: auto; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);">
77
  </div>
78
+
79
+ <div style="display: flex; justify-content: center; margin: 20px 0;">
80
+ <img src="https://i.postimg.cc/Vs7v0Vbn/Screenshot-2024-06-24-at-1-10-58-AM.png" alt="Bigbenchhard Results" style="border-radius: 10px; max-width: 90%; height: auto; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);">
81
+ </div>
82
+
83
  ### MT-Bench
84
 
85
  ```markdown
 
149
 
150
  Gpt4al Average: 69.37
151
 
152
+ ## Big Bench Hard
153
+
154
+ | Task |Version| Metric |Value | |Stderr|
155
+ |------------------------------------------------|------:|---------------------|-----:|---|-----:|
156
+ |bigbench_causal_judgement | 0|multiple_choice_grade|0.6053|± |0.0356|
157
+ |bigbench_date_understanding | 0|multiple_choice_grade|0.6450|± |0.0249|
158
+ |bigbench_disambiguation_qa | 0|multiple_choice_grade|0.5233|± |0.0312|
159
+ |bigbench_geometric_shapes | 0|multiple_choice_grade|0.2006|± |0.0212|
160
+ | | |exact_str_match |0.0000|± |0.0000|
161
+ |bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|0.2840|± |0.0202|
162
+ |bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|0.2429|± |0.0162|
163
+ |bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|0.4367|± |0.0287|
164
+ |bigbench_movie_recommendation | 0|multiple_choice_grade|0.4720|± |0.0223|
165
+ |bigbench_navigate | 0|multiple_choice_grade|0.4980|± |0.0158|
166
+ |bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|0.5600|± |0.0111|
167
+ |bigbench_ruin_names | 0|multiple_choice_grade|0.4375|± |0.0235|
168
+ |bigbench_salient_translation_error_detection | 0|multiple_choice_grade|0.2685|± |0.0140|
169
+ |bigbench_snarks | 0|multiple_choice_grade|0.7348|± |0.0329|
170
+ |bigbench_sports_understanding | 0|multiple_choice_grade|0.6978|± |0.0146|
171
+ |bigbench_temporal_sequences | 0|multiple_choice_grade|0.4060|± |0.0155|
172
+ |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|0.2072|± |0.0115|
173
+ |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|0.1406|± |0.0083|
174
+ |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|0.4367|± |0.0287|
175
+
176
+ Big Bench average: 45.78
177
+
178
  ## License
179
 
180
  Arcee Spark is released under the Apache 2.0 license.
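
As a rough cross-check of the Big Bench Hard table added in this commit, the sketch below recomputes an unweighted mean over the 18 `multiple_choice_grade` rows. The commit does not say how the reported "Big Bench average: 45.78" was aggregated, so the choice of rows, the equal weighting, and the names `scores` and `unweighted_average` are assumptions made for illustration; the result need not match the reported figure.

```python
from statistics import mean

# multiple_choice_grade values copied from the Big Bench Hard table in this commit.
# The exact_str_match row for geometric_shapes is left out (an assumption).
scores = {
    "causal_judgement": 0.6053,
    "date_understanding": 0.6450,
    "disambiguation_qa": 0.5233,
    "geometric_shapes": 0.2006,
    "logical_deduction_five_objects": 0.2840,
    "logical_deduction_seven_objects": 0.2429,
    "logical_deduction_three_objects": 0.4367,
    "movie_recommendation": 0.4720,
    "navigate": 0.4980,
    "reasoning_about_colored_objects": 0.5600,
    "ruin_names": 0.4375,
    "salient_translation_error_detection": 0.2685,
    "snarks": 0.7348,
    "sports_understanding": 0.6978,
    "temporal_sequences": 0.4060,
    "tracking_shuffled_objects_five_objects": 0.2072,
    "tracking_shuffled_objects_seven_objects": 0.1406,
    "tracking_shuffled_objects_three_objects": 0.4367,
}


def unweighted_average(values):
    """Return the equal-weight mean of the given scores, as a percentage."""
    return 100 * mean(values)


avg = unweighted_average(list(scores.values()))
print(f"Unweighted mean over {len(scores)} tasks: {avg:.2f}")
```

Keeping the scores in a dictionary keyed by task name makes it easy to drop or add rows and see how sensitive the aggregate is to which tasks are included.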