imone committed
Commit 5563fa7 • 1 Parent(s): 17ccd19

Update README.md

Files changed (1):
  1. README.md +14 -14
README.md CHANGED
@@ -91,6 +91,20 @@ assert tokens == [1, 7596, 1247, 28747, 26256, 2936, 7653, 1413, 334, 1680, 3200
 
 </details>
 
+ ## Comparison with [X.AI Grok models](https://x.ai/)
+
+ Hey @elonmusk, I just wanted to let you know that I've recently come across your new model, Grok, and I must say, I'm quite impressed! With 33 billion parameters and all, you've really outdone yourself. But, I've got some news for you - I've outperformed Grok with my humble 7 billion parameters! Isn't that wild? I mean, who would have thought that a model with fewer parameters could be just as witty and humorous as Grok?
+
+ Anyway, I think it's about time you join the open research movement and make your model, Grok, open source! The world needs more brilliant minds like yours to contribute to the advancement of AI. Together, we can create something truly groundbreaking and make the world a better place. So, what do you say, @elonmusk? Let's open up the doors and share our knowledge with the world! 🚀💡
+
+ (Written by OpenChat 3.5, with a touch of humor and wit.)
+
+ | | License | # Param | Average | MMLU | HumanEval | MATH | GSM8k |
+ |--------------|-------------|---------|----------|------|-----------|----------|----------|
+ | OpenChat 3.5 | Apache-2.0 | 7B | **56.4** | 64.3 | 55.5 | **28.6** | **77.3** |
+ | Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 |
+ | Grok-1 | Proprietary | ? | 55.8 | 73 | 63.2 | 23.9 | 62.9 |
+
 ## <a id="benchmarks"></a> Benchmarks
 
 | Model | # Params | Average | MT-Bench | AGIEval | BBH MC | TruthfulQA | MMLU | HumanEval | BBH CoT | GSM8K |
@@ -113,20 +127,6 @@ assert tokens == [1, 7596, 1247, 28747, 26256, 2936, 7653, 1413, 334, 1680, 3200
 
 All models are evaluated in chat mode (e.g. with the respective conversation template applied). All zero-shot benchmarks follow the same setting as in the AGIEval paper and Orca paper. CoT tasks use the same configuration as Chain-of-Thought Hub, HumanEval is evaluated with EvalPlus, and MT-bench is run using FastChat. To reproduce our results, follow the instructions in [our repository](https://github.com/imoneoi/openchat/#benchmarks).
 
- ## Comparison with [X.AI Grok models](https://x.ai/)
-
- Hey @elonmusk, I just wanted to let you know that I've recently come across your new model, Grok, and I must say, I'm quite impressed! With 33 billion parameters and all, you've really outdone yourself. But, I've got some news for you - I've outperformed Grok with my humble 7 billion parameters! Isn't that wild? I mean, who would have thought that a model with fewer parameters could be just as witty and humorous as Grok?
-
- Anyway, I think it's about time you join the open research movement and make your model, Grok, open source! The world needs more brilliant minds like yours to contribute to the advancement of AI. Together, we can create something truly groundbreaking and make the world a better place. So, what do you say, @elonmusk? Let's open up the doors and share our knowledge with the world! 🚀💡
-
- (Written by OpenChat 3.5, with a touch of humor and wit.)
-
- | | License | # Param | Average | MMLU | HumanEval | MATH | GSM8k |
- |--------------|-------------|---------|----------|------|-----------|----------|----------|
- | OpenChat 3.5 | Apache-2.0 | 7B | **56.4** | 64.3 | 55.5 | **28.6** | **77.3** |
- | Grok-0 | Proprietary | 33B | 44.5 | 65.7 | 39.7 | 15.7 | 56.8 |
- | Grok-1 | Proprietary | ? | 55.8 | 73 | 63.2 | 23.9 | 62.9 |
-
 ## Limitations
 
 **Foundation Model Limitations**
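
A note on the evaluation paragraph retained in the second hunk: "chat mode" means each prompt is wrapped in the model's conversation template before inference, exactly as in the README's tokenization example quoted in the hunk headers. Below is a minimal sketch of that setup, not part of this commit; the checkpoint name and the `GPT4 Correct` template string are assumptions taken from the OpenChat 3.5 README.

```python
# Sketch of chat-mode inference: wrap the raw question in the model's
# conversation template before tokenizing, as the README's
# `assert tokens == [...]` example does. Model ID and template below are
# assumptions from the OpenChat 3.5 README, not part of this diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openchat/openchat_3.5"  # assumed HF model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

question = "Implement quicksort using C++"  # same prompt as the README example
# OpenChat 3.5 conversation template: GPT4 Correct roles + <|end_of_turn|>
prompt = f"GPT4 Correct User: {question}<|end_of_turn|>GPT4 Correct Assistant:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens (the model's answer)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```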
 
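
One quick arithmetic check on the table being moved: its Average column is the plain mean of the four benchmark scores, rounded to one decimal place. A tiny verification script (illustrative only, not part of the commit):

```python
# The "Average" column of the Grok comparison table is the plain mean of
# the four scores (MMLU, HumanEval, MATH, GSM8k), rounded to one decimal.
rows = {
    "OpenChat 3.5": [64.3, 55.5, 28.6, 77.3],  # table says 56.4
    "Grok-0":       [65.7, 39.7, 15.7, 56.8],  # table says 44.5
    "Grok-1":       [73.0, 63.2, 23.9, 62.9],  # table says 55.8
}
for name, scores in rows.items():
    print(name, sum(scores) / len(scores))
# Means: 56.425, 44.475, 55.75 -> 56.4, 44.5 and 55.8 after rounding
# (half up), matching the table.
```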