philschmid (HF staff) committed
Commit 3d12574
1 Parent(s): 5a6c0cd

Update README.md

Files changed (1):
  1. README.md +87 -7
README.md CHANGED
@@ -49,18 +49,99 @@ Zephyr is a series of language models that are trained to act as helpful assista

  ## Performance

- | Model |MT Bench|IFEval|
+ | Model |MT Bench⬇️|IFEval|
  |-----------------------------------------------------------------------|------:|------:|
  |[zephyr-7b-gemma](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma)| 7.81 | 28.76|
  |[zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 7.34 | 43.81|
- |[gemma-7b-it](https://huggingface.co/google/gemma-7b-it) | 6.38 | 38.01|
+ |[google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it) | 6.38 | 38.01|


- | Model |AGIEval|GPT4All|TruthfulQA|BigBench|Average|
+
+ | Model |AGIEval|GPT4All|TruthfulQA|BigBench|Average ⬇️|
  |-----------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
- |[zephyr-7b-gemma](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma)| 34.22| 66.37| 52.19| 37.10| 47.47|
  |[zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) | 37.52| 71.77| 55.26| 39.77| 51.08|
- |[gemma-7b-it](https://huggingface.co/google/gemma-7b-it) | 21.33| 40.84| 41.70| 30.25| 33.53|
+ |[zephyr-7b-gemma](https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma)| 34.22| 66.37| 52.19| 37.10| 47.47|
+ |[mlabonne/Gemmalpaca-7B](https://huggingface.co/mlabonne/Gemmalpaca-7B)| 21.6 | 40.87| 44.85 | 30.49| 34.45|
+ |[google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it) | 21.33| 40.84| 41.70| 30.25| 33.53|
+
+
+ <details><summary>Details of AGIEval, GPT4All, TruthfulQA, BigBench </summary>
+
+ ### AGIEval
+ | Task |Version| Metric |Value| |Stderr|
+ |------------------------------|------:|--------|----:|---|-----:|
+ |agieval_aqua_rat | 0|acc |21.65|± | 2.59|
+ | | |acc_norm|25.20|± | 2.73|
+ |agieval_logiqa_en | 0|acc |34.72|± | 1.87|
+ | | |acc_norm|35.94|± | 1.88|
+ |agieval_lsat_ar | 0|acc |19.57|± | 2.62|
+ | | |acc_norm|21.74|± | 2.73|
+ |agieval_lsat_lr | 0|acc |30.59|± | 2.04|
+ | | |acc_norm|32.55|± | 2.08|
+ |agieval_lsat_rc | 0|acc |49.07|± | 3.05|
+ | | |acc_norm|42.75|± | 3.02|
+ |agieval_sat_en | 0|acc |54.85|± | 3.48|
+ | | |acc_norm|53.40|± | 3.48|
+ |agieval_sat_en_without_passage| 0|acc |37.38|± | 3.38|
+ | | |acc_norm|33.98|± | 3.31|
+ |agieval_sat_math | 0|acc |30.91|± | 3.12|
+ | | |acc_norm|28.18|± | 3.04|
+
+ Average: 34.22%
+
+ ### GPT4All
+ | Task |Version| Metric |Value| |Stderr|
+ |-------------|------:|--------|----:|---|-----:|
+ |arc_challenge| 0|acc |49.15|± | 1.46|
+ | | |acc_norm|52.47|± | 1.46|
+ |arc_easy | 0|acc |77.44|± | 0.86|
+ | | |acc_norm|74.75|± | 0.89|
+ |boolq | 1|acc |79.69|± | 0.70|
+ |hellaswag | 0|acc |60.59|± | 0.49|
+ | | |acc_norm|78.00|± | 0.41|
+ |openbookqa | 0|acc |29.20|± | 2.04|
+ | | |acc_norm|37.80|± | 2.17|
+ |piqa | 0|acc |76.82|± | 0.98|
+ | | |acc_norm|77.80|± | 0.97|
+ |winogrande | 0|acc |64.09|± | 1.35|
+
+ Average: 66.37%
+
+ ### TruthfulQA
+ | Task |Version|Metric|Value| |Stderr|
+ |-------------|------:|------|----:|---|-----:|
+ |truthfulqa_mc| 1|mc1 |35.74|± | 1.68|
+ | | |mc2 |52.19|± | 1.59|
+
+ Average: 52.19%
+
+ ### Bigbench
+ | Task |Version| Metric |Value| |Stderr|
+ |------------------------------------------------|------:|---------------------|----:|---|-----:|
+ |bigbench_causal_judgement | 0|multiple_choice_grade|53.68|± | 3.63|
+ |bigbench_date_understanding | 0|multiple_choice_grade|59.89|± | 2.55|
+ |bigbench_disambiguation_qa | 0|multiple_choice_grade|30.23|± | 2.86|
+ |bigbench_geometric_shapes | 0|multiple_choice_grade|11.42|± | 1.68|
+ | | |exact_str_match | 0.00|± | 0.00|
+ |bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|28.40|± | 2.02|
+ |bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|19.14|± | 1.49|
+ |bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|44.67|± | 2.88|
+ |bigbench_movie_recommendation | 0|multiple_choice_grade|26.80|± | 1.98|
+ |bigbench_navigate | 0|multiple_choice_grade|50.00|± | 1.58|
+ |bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|52.75|± | 1.12|
+ |bigbench_ruin_names | 0|multiple_choice_grade|33.04|± | 2.22|
+ |bigbench_salient_translation_error_detection | 0|multiple_choice_grade|33.37|± | 1.49|
+ |bigbench_snarks | 0|multiple_choice_grade|48.62|± | 3.73|
+ |bigbench_sports_understanding | 0|multiple_choice_grade|58.11|± | 1.57|
+ |bigbench_temporal_sequences | 0|multiple_choice_grade|37.20|± | 1.53|
+ |bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|20.08|± | 1.13|
+ |bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|15.77|± | 0.87|
+ |bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|44.67|± | 2.88|
+
+ Average: 37.1%
+
+ </details>
+

  ## Intended uses & limitations

@@ -70,8 +151,7 @@ We then further aligned the model with [🤗 TRL's](https://github.com/huggingfa
  Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:

  ```python
- # Install transformers from source - only needed for versions <= v4.38.1
- # pip install git+https://github.com/huggingface/transformers.git
+ # pip install transformers>=4.38.2
  # pip install accelerate

  import torch
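
The second hunk shows only the opening lines of the README's `pipeline()` snippet; the rest is unchanged context and is not part of this commit. For orientation, a minimal sketch of a `pipeline()`-based chat call that works with `transformers>=4.38.2` follows. The checkpoint id, prompt, and sampling settings are illustrative assumptions, not the README's original code.

```python
# Minimal sketch (not the README's original snippet): chat-style generation with
# the text-generation pipeline on transformers>=4.38.2.
# pip install "transformers>=4.38.2" accelerate

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-gemma-v0.1",  # assumed checkpoint id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A single user turn; the model's chat template decides how it is rendered.
messages = [{"role": "user", "content": "Explain what DPO fine-tuning is in two sentences."}]

# Render the conversation with the model's own chat template, then sample a reply.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    return_full_text=False,  # return only the newly generated text
)
print(outputs[0]["generated_text"])
```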
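
On the new `<details>` tables: each "Average" line appears to be the plain arithmetic mean of the listed per-task scores (acc_norm where reported, otherwise acc, and mc2 for TruthfulQA). That aggregation rule is an inference from the numbers, not something the commit states; a short sketch using only values copied from the tables reproduces two of the averages.

```python
# Hypothetical reconstruction of the "Average" lines, assuming a plain mean of the
# per-task scores (acc_norm where reported, acc otherwise). Values are copied
# verbatim from the AGIEval and GPT4All tables in this diff.
agieval_acc_norm = [25.20, 35.94, 21.74, 32.55, 42.75, 53.40, 33.98, 28.18]
gpt4all_scores   = [52.47, 74.75, 79.69, 78.00, 37.80, 77.80, 64.09]  # boolq, winogrande: acc

def mean(values):
    return sum(values) / len(values)

print(f"AGIEval: {mean(agieval_acc_norm):.4f}")  # ~34.2175 -> reported as 34.22%
print(f"GPT4All: {mean(gpt4all_scores):.4f}")    # ~66.3714 -> reported as 66.37%
```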