BerenMillidge committed · Commit bf23724
Parent(s): 5622826
Update README.md

README.md CHANGED
```diff
@@ -55,34 +55,12 @@ Zamba2-2.7B-Instruct punches dramatically above its weight, achieving extremely
 | Model | Size | MT-Bench | IFEval |
 |-------------|----|----|----|
 | **Zamba2-2.6B-Instruct** | 2.6B | **72.40** | **53.96** |
-| Mistral-7B-Instruct | 7B |
+| Mistral-7B-Instruct | 7B | 66.4 | 45.3 |
 | Gemma2-2B-Instruct | 2.7B | 51.69 | 48.8 |
 | H2O-Danube-4B-Chat | 4B | 52.57 | 45.44 |
 | StableLM-Zephyr-3B | 3B | 66.43 | 36.83 |
 
 
-
-| Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
-|--------|-----|----|---------------|--------------|
-| **StableLM Zephyr 3B** 🪁 | 3B | DPO | 6.64 | 76.00 |
-| StableLM Zephyr (SFT only) | 3B | SFT | 6.04 | 71.15 |
-| Capybara v1.9 | 3B | dSFT | 5.94 | - |
-| MPT-Chat | 7B | dSFT | 5.42 | - |
-| Xwin-LM v0.1 | 7B | dPPO | 6.19 | 87.83 |
-| Mistral-Instruct v0.1 | 7B | - | 6.84 | - |
-| Zephyr-7b-α | 7B | dDPO | 6.88 | - |
-| Zephyr-7b-β | 7B | dDPO | 7.34 | 90.60 |
-| Falcon-Instruct | 40B | dSFT | 5.17 | 45.71 |
-| Guanaco | 65B | SFT | 6.41 | 71.80 |
-| Llama2-Chat | 70B | RLHF | 6.86 | 92.66 |
-| Vicuna v1.3 | 33B | dSFT | 7.12 | 88.99 |
-| WizardLM v1.0 | 70B | dSFT | 7.71 | - |
-| Xwin-LM v0.1 | 70B | dPPO | - | 95.57 |
-| GPT-3.5-turbo | - | RLHF | 7.94 | 89.37 |
-| Claude 2 | - | RLHF | 8.06 | 91.36 |
-| GPT-4 | - | RLHF | 8.99 | 95.28 |
-
-
 Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer based models.
 
 Time to First Token (TTFT) | Output Generation
```
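The Time-to-First-Token and output-generation figures referenced in the README can be measured with a simple timing harness around any streaming generate call. A minimal sketch, where `fake_stream` is a hypothetical stand-in for a real token stream (e.g. transformers' `TextIteratorStreamer` wrapped around the model's `generate`):

```python
import time
from typing import Iterator, Tuple


def measure_ttft_and_throughput(stream: Iterator[str]) -> Tuple[float, float]:
    """Consume a token stream; return (time to first token in s, tokens/s)."""
    start = time.perf_counter()
    ttft = float("nan")
    count = 0
    for _ in stream:
        count += 1
        if count == 1:
            # Latency until the first token arrives (TTFT).
            ttft = time.perf_counter() - start
    total = time.perf_counter() - start
    # Throughput over the whole generation.
    tps = count / total if total > 0 else 0.0
    return ttft, tps


def fake_stream(n_tokens: int = 8) -> Iterator[str]:
    """Hypothetical stub standing in for a model's streaming output."""
    for i in range(n_tokens):
        yield f"tok{i}"


ttft, tps = measure_ttft_and_throughput(fake_stream())
```

Swapping the stub for a real streamer gives the two quantities the benchmark plots compare across models.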