Tijmen2 committed
Commit 41b412d
Parent: 91e8a08

Update README.md

Files changed (1): README.md (+18 -0)
README.md CHANGED
@@ -106,6 +106,24 @@ When using one of the quantized versions, make sure to pass the quantization con
  }
  ```
 
+ ## Standard evaluations
+
+ cosmosage can be compared to OpenHermes-2.5-Mistral-7B using standard evaluation metrics.
+
+ | Test Category  | cosmosage_v2 | OpenHermes-2.5-Mistral-7B |
+ |----------------|--------------|---------------------------|
+ | Overall        | 0.595        | 0.632                     |
+ | ARC Challenge  | 0.565        | 0.613                     |
+ | Hellaswag      | 0.619        | 0.652                     |
+ | TruthfulQA:mc1 | 0.348        | 0.361                     |
+ | TruthfulQA:mc2 | 0.510        | 0.522                     |
+ | Winogrande     | 0.759        | 0.781                     |
+ | GSM8k          | 0.368        | 0.261                     |
+
+ cosmosage scores only slightly below OpenHermes-2.5-Mistral-7B on most metrics, indicating that its heavy
+ specialization in cosmology has left its general-purpose abilities nearly unchanged. The exception is GSM8k,
+ a collection of grade-school math problems, on which cosmosage performs noticeably better.
+
  ## Example output
 
  **User:**
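
Scores like those in the table above are typically produced with a harness such as EleutherAI's lm-evaluation-harness; the commit itself does not say which tool was used. As a rough, non-authoritative illustration of what the multiple-choice benchmarks (ARC Challenge, Hellaswag, and similar) measure, here is a minimal sketch of log-likelihood choice scoring using the transformers/torch stack. The model id `Tijmen2/cosmosage_v2` and the toy question are assumptions for illustration only.

```python
# Sketch of log-likelihood scoring for a multiple-choice benchmark item.
# Assumptions: the model id below is hypothetical, and a real evaluation
# would use the actual benchmark datasets and matched few-shot settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tijmen2/cosmosage_v2"  # assumed repository id, not confirmed by this commit
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

def continuation_logprob(context: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `context`.

    Assumes the tokenization of `context` is a prefix of the tokenization of
    `context + continuation`, which holds for typical prompts.
    """
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids.to(model.device)
    cont_len = full_ids.shape[1] - ctx_ids.shape[1]
    with torch.no_grad():
        logits = model(full_ids).logits  # shape: (1, seq_len, vocab)
    # Positions -cont_len-1 .. -2 predict the continuation tokens.
    log_probs = torch.log_softmax(logits[0, -cont_len - 1:-1], dim=-1)
    cont_ids = full_ids[0, -cont_len:]
    return log_probs.gather(1, cont_ids.unsqueeze(1)).sum().item()

# Toy ARC-style item: the choice with the highest log-likelihood is the model's answer.
question = "Question: What force keeps the planets in orbit around the Sun?\nAnswer:"
choices = [" Gravity.", " Magnetism.", " Friction.", " Surface tension."]
scores = [continuation_logprob(question, c) for c in choices]
print("Predicted:", choices[scores.index(max(scores))].strip())
```

This only shows the mechanism behind the accuracy-style numbers in the table; a full comparison would run both models through the same harness, tasks, and few-shot configuration.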