runningSnail committed on
Commit 3796393
1 Parent(s): 969d794

add MMLU results

Files changed (1): README.md (+38, −0)
@@ -95,6 +95,44 @@ print(f'Elapsed time: {end - start:.2f}s')
  This model was trained on commercially viable data. For use of our model, refer to the [license information](https://www.nexa4ai.com/licenses).

+ ## Performance
+ ### Model Selection
+ We leverage the latest large language models across a variety of domains. Below is a summary of the model chosen for each category. Where no specialized model exists for a subject, we fall back to a generic model such as Llama3-8b.
+
+ | **Model** | **Category** | **Subjects** |
+ |---|---|---|
+ | `jondurbin/bagel-8b-v1.0` | Biology | `college_biology`, `high_school_biology` |
+ | `Weyaxi/Einstein-v6.1-Llama3-8B` | Physics | `astronomy`, `college_physics`, `conceptual_physics`, `high_school_physics` |
+ | `meta-llama/Meta-Llama-3-8B-Instruct` | Business | `business_ethics`, `management`, `marketing` |
+ | `meta-llama/Meta-Llama-3-8B-Instruct` | Chemistry | `college_chemistry`, `high_school_chemistry` |
+ | `abacusai/Llama-3-Smaug-8B` | Computer Science | `college_computer_science`, `computer_security`, `high_school_computer_science`, `machine_learning` |
+ | `Open-Orca/Mistral-7B-OpenOrca` | Math | `abstract_algebra`, `college_mathematics`, `elementary_mathematics`, `high_school_mathematics`, `high_school_statistics` |
+ | `meta-llama/Meta-Llama-3-8B-Instruct` | Economics | `econometrics`, `high_school_macroeconomics`, `high_school_microeconomics` |
+ | `AdaptLLM/medicine-chat` | Health | `anatomy`, `clinical_knowledge`, `college_medicine`, `human_aging`, `medical_genetics`, `nutrition`, `professional_medicine`, `virology` |
+ | `STEM-AI-mtl/phi-2-electrical-engineering` | Engineering | `electrical_engineering` |
+ | `meta-llama/Meta-Llama-3-8B-Instruct` | Philosophy | `formal_logic`, `logical_fallacies`, `moral_disputes`, `moral_scenarios`, `philosophy`, `world_religions` |
+ | `microsoft/Phi-3-mini-128k-instruct` | Other | `global_facts`, `miscellaneous`, `professional_accounting` |
+ | `meta-llama/Meta-Llama-3-8B-Instruct` | History | `high_school_european_history`, `high_school_us_history`, `high_school_world_history`, `prehistory` |
+ | `meta-llama/Meta-Llama-3-8B-Instruct` | Culture | `human_sexuality`, `sociology` |
+ | `AdaptLLM/law-chat` | Law | `international_law`, `jurisprudence`, `professional_law` |
+ | `meta-llama/Meta-Llama-3-8B-Instruct` | Psychology | `high_school_psychology`, `professional_psychology` |
+
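The routing in the table above amounts to a subject-to-checkpoint lookup with a generic fallback. A minimal sketch, assuming a plain dictionary mapping (the `route` helper and `FALLBACK_MODEL` constant are illustrative, not part of the release; only a few subjects are shown, the rest follow the table):

```python
# Sketch of the subject-to-model routing described in the table above.
# Checkpoint names come from the table; the route() helper and the
# fallback behavior are illustrative assumptions.

SUBJECT_TO_MODEL = {
    "college_biology": "jondurbin/bagel-8b-v1.0",
    "high_school_biology": "jondurbin/bagel-8b-v1.0",
    "astronomy": "Weyaxi/Einstein-v6.1-Llama3-8B",
    "college_physics": "Weyaxi/Einstein-v6.1-Llama3-8B",
    "electrical_engineering": "STEM-AI-mtl/phi-2-electrical-engineering",
    "anatomy": "AdaptLLM/medicine-chat",
    "international_law": "AdaptLLM/law-chat",
    # ... remaining subjects follow the table above
}

# Generic model used when no specialized checkpoint covers the subject.
FALLBACK_MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"

def route(subject: str) -> str:
    """Return the checkpoint name used for a given MMLU subject."""
    return SUBJECT_TO_MODEL.get(subject, FALLBACK_MODEL)
```

For example, `route("anatomy")` resolves to the medicine checkpoint, while an uncovered subject such as `marketing` falls through to the generic model.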
+ ### MMLU Benchmark Results (5-shot learning)
+ Comparative MMLU scores for various models tested under a 5-shot learning setup:
+
+ | **Model** | **MMLU Score** |
+ |---|---|
+ | Octopus-V4 | **74.6%** |
+ | GPT-3.5 | 70.0% |
+ | Llama3-8b-instruct | 68.4% |
+ | Phi-3-mini-128k-instruct | 68.1% |
+ | Gemma-7b | 64.3% |
+ | Gemma-2b | 42.3% |
+ | OpenELM-3B | 26.7% |
+
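A 5-shot MMLU prompt is conventionally assembled from five solved examples of a subject's dev split followed by the unanswered test question. A minimal sketch of that prompt construction, assuming the common "A./B./C./D. ... Answer:" formatting (this is the standard convention, not necessarily the exact harness behind the scores above):

```python
# Sketch of 5-shot MMLU prompt assembly: five answered dev examples,
# then the test question with the answer left blank for the model to
# complete. The exact template is an assumption.

CHOICES = "ABCD"

def format_example(question, options, answer=None):
    """Render one multiple-choice item; omit the answer for the test item."""
    body = [question] + [f"{CHOICES[i]}. {opt}" for i, opt in enumerate(options)]
    body.append("Answer:" + (f" {answer}" if answer else ""))
    return "\n".join(body)

def build_5shot_prompt(dev_examples, test_question, test_options):
    """dev_examples: list of five (question, options, answer) tuples."""
    shots = [format_example(q, opts, ans) for q, opts, ans in dev_examples[:5]]
    shots.append(format_example(test_question, test_options))
    return "\n\n".join(shots)
```

The model's next-token continuation after the final `Answer:` (or the highest-likelihood choice letter) is then scored against the gold answer.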
+
  ## References
  We thank the Microsoft team for their amazing model!
  ```