InferenceIllusionist committed
Commit cf48ee0 · Parent(s): e468ef4
Added t/s benchmark, updated sizes, fixed tables

README.md CHANGED
@@ -39,9 +39,15 @@ Please note importance matrix quantizations are a work in progress, IQ3 and abov
 | Quant | Size (GB) | Comments |
 |:-----|--------:|:------|
-| [
-| [
-| [
+| [IQ2_S](https://huggingface.co/Quant-Cartel/Cerebrum-1.0-8x7b-iMat-GGUF/blob/main/Cerebrum-1.0-8x7b-iMat-IQ2_S.gguf?download=true) | 14.1 | Can be fully offloaded on 16GB VRAM for up to 38.94 t/s [(source)](https://huggingface.co/Quant-Cartel/Cerebrum-1.0-8x7b-iMat-GGUF/discussions/1#660897c68695a785ed7363b3) |
+| [IQ2_M](https://huggingface.co/Quant-Cartel/Cerebrum-1.0-8x7b-iMat-GGUF/blob/main/Cerebrum-1.0-8x7b-iMat-IQ2_M.gguf?download=true) | 15.5 | |
+| [IQ3_XXS](https://huggingface.co/Quant-Cartel/Cerebrum-1.0-8x7b-iMat-GGUF/blob/main/Cerebrum-1.0-8x7b-iMat-IQ3_XXS.gguf?download=true) | 18.2 | |
+| [IQ3_XS](https://huggingface.co/Quant-Cartel/Cerebrum-1.0-8x7b-iMat-GGUF/blob/main/Cerebrum-1.0-8x7b-iMat-IQ3_XS.gguf?download=true) | 19.3 | |
+| [IQ3_S](https://huggingface.co/Quant-Cartel/Cerebrum-1.0-8x7b-iMat-GGUF/blob/main/Cerebrum-1.0-8x7b-iMat-IQ3_S.gguf?download=true) | 20.4 | |
+| [IQ3_M](https://huggingface.co/Quant-Cartel/Cerebrum-1.0-8x7b-iMat-GGUF/resolve/main/Cerebrum-1.0-8x7b-iMat-IQ3_M.gguf?download=true) | 21.4 | |
+| [IQ4_XS](https://huggingface.co/Quant-Cartel/Cerebrum-1.0-8x7b-iMat-GGUF/blob/main/Cerebrum-1.0-8x7b-iMat-IQ4_XS.gguf?download=true) | 25.1 | Better quality than Q3_K_L and below |
+| [Q4_K_M](https://huggingface.co/Quant-Cartel/Cerebrum-1.0-8x7b-iMat-GGUF/blob/main/Cerebrum-1.0-8x7b-iMat-Q4_K_M.gguf?download=true) | 28.4 | |
+| [Q5_K_M](https://huggingface.co/Quant-Cartel/Cerebrum-1.0-8x7b-iMat-GGUF/blob/main/Cerebrum-1.0-8x7b-iMat-Q5_K_M.gguf?download=true) | 33.2 | |
 
 Original model card can be found [here](https://huggingface.co/AetherResearch/Cerebrum-1.0-8x7b) and below.

@@ -63,8 +69,8 @@ Cerebrum 8x7b offers competitive performance to Gemini 1.0 Pro and GPT-3.5 Turbo
 ## Benchmarking
 An overview of Cerebrum 8x7b performance compared to Gemini 1.0 Pro, GPT-3.5 and Mixtral 8x7b on selected benchmarks:
-<img src="benchmarking.png" alt="benchmarking_chart" width="750"/>
-<img src="benchmarking_table.png" alt="benchmarking_table" width="750"/>
+<img src="https://huggingface.co/AetherResearch/Cerebrum-1.0-8x7b/resolve/main/benchmarking.png?download=true" alt="benchmarking_chart" width="750"/>
+<img src="https://huggingface.co/AetherResearch/Cerebrum-1.0-8x7b/resolve/main/benchmarking_table.png?download=true" alt="benchmarking_table" width="750"/>
 
 Evaluation details:
 1) ARC-C: all models evaluated zero-shot. Gemini 1.0 Pro and GPT-3.5 (gpt-3.5-turbo-0125) evaluated via API, reported numbers taken for Mixtral 8x7b.
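As a rough sanity check on the Size column in the quant table, file size can be converted into an effective bits-per-weight figure. The sketch below is an approximation under stated assumptions: a Mixtral-8x7b-class total of roughly 46.7B parameters and decimal gigabytes in the table; neither figure comes from this repo, so treat the results as ballpark only.

```python
# Rough bits-per-weight estimate from GGUF file size.
# Assumptions (not from the repo): ~46.7e9 total parameters for a
# Mixtral-8x7b-class model, and decimal GB in the size column.
TOTAL_PARAMS = 46.7e9

def bits_per_weight(size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Convert a file size in GB to effective bits per weight."""
    return size_gb * 1e9 * 8 / params

for quant, gb in [("IQ2_S", 14.1), ("IQ4_XS", 25.1), ("Q5_K_M", 33.2)]:
    print(f"{quant}: {gb} GB ~ {bits_per_weight(gb):.2f} bits/weight")
```

Under these assumptions IQ2_S works out to roughly 2.4 bits/weight and Q5_K_M to roughly 5.7, which lines up with the naming of the quant types.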
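Every download link in the quant table follows the same Hugging Face direct-download pattern (`/resolve/main/<file>?download=true`). For illustration, a tiny helper can build those URLs from a quant name; `gguf_url` is a hypothetical function written for this sketch, not something shipped with the repo.

```python
# Build a direct-download URL for a quant in this repo, following the
# https://huggingface.co/<repo>/resolve/main/<file>?download=true pattern
# used by the links in the table above. gguf_url is an illustrative
# helper, not part of the repo.
REPO_ID = "Quant-Cartel/Cerebrum-1.0-8x7b-iMat-GGUF"

def gguf_url(quant: str, repo_id: str = REPO_ID) -> str:
    filename = f"Cerebrum-1.0-8x7b-iMat-{quant}.gguf"
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}?download=true"

print(gguf_url("IQ3_M"))
# → https://huggingface.co/Quant-Cartel/Cerebrum-1.0-8x7b-iMat-GGUF/resolve/main/Cerebrum-1.0-8x7b-iMat-IQ3_M.gguf?download=true
```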