Davidqian123 committed
Commit ca2c162
1 Parent(s): 5c45428

Update README.md

Files changed (1):
  1. README.md +28 -2

README.md CHANGED
@@ -14,7 +14,7 @@ tags:
 This repo includes **GGUF** quantized models for our Octo-planner model at [NexaAIDev/octopus-planning](https://huggingface.co/NexaAIDev/octopus-planning)
 
 
-# GGUF Qauntization
+# GGUF Quantization
 
 To run the models, please download them to your local machine using either git clone or the [Hugging Face Hub](https://huggingface.co/docs/huggingface_hub/en/guides/download)
 ```
@@ -80,4 +80,30 @@ ollama ls
 7. Run the model
 ```bash
 ollama run octopus-planning-Q4_K_M "<|user|>Find my presentation for tomorrow's meeting, connect to the conference room projector via Bluetooth, increase the screen brightness, take a screenshot of the final summary slide, and email it to all participants<|end|><|assistant|>"
-```
+```
+
+
+# Quantized GGUF Models Benchmark
+
+| Name                         | Quant method | Bits | Size    | Use Cases                           |
+| ---------------------------- | ------------ | ---- | ------- | ----------------------------------- |
+| octopus-planning-Q2_K.gguf   | Q2_K         | 2    | 1.42 GB | fast but high loss, not recommended |
+| octopus-planning-Q3_K.gguf   | Q3_K         | 3    | 1.96 GB | very high loss, not recommended     |
+| octopus-planning-Q3_K_S.gguf | Q3_K_S       | 3    | 1.68 GB | very high loss, not recommended     |
+| octopus-planning-Q3_K_M.gguf | Q3_K_M       | 3    | 1.96 GB | moderate loss, not very recommended |
+| octopus-planning-Q3_K_L.gguf | Q3_K_L       | 3    | 2.09 GB | not very recommended                |
+| octopus-planning-Q4_0.gguf   | Q4_0         | 4    | 2.18 GB | moderate speed, recommended         |
+| octopus-planning-Q4_1.gguf   | Q4_1         | 4    | 2.41 GB | moderate speed, recommended         |
+| octopus-planning-Q4_K.gguf   | Q4_K         | 4    | 2.39 GB | moderate speed, recommended         |
+| octopus-planning-Q4_K_S.gguf | Q4_K_S       | 4    | 2.19 GB | fast and accurate, highly recommended |
+| octopus-planning-Q4_K_M.gguf | Q4_K_M       | 4    | 2.39 GB | fast, recommended                   |
+| octopus-planning-Q5_0.gguf   | Q5_0         | 5    | 2.64 GB | fast, recommended                   |
+| octopus-planning-Q5_1.gguf   | Q5_1         | 5    | 2.87 GB | very big, prefer Q4                 |
+| octopus-planning-Q5_K.gguf   | Q5_K         | 5    | 2.82 GB | big, recommended                    |
+| octopus-planning-Q5_K_S.gguf | Q5_K_S       | 5    | 2.64 GB | big, recommended                    |
+| octopus-planning-Q5_K_M.gguf | Q5_K_M       | 5    | 2.82 GB | big, recommended                    |
+| octopus-planning-Q6_K.gguf   | Q6_K         | 6    | 3.14 GB | very big, not very recommended      |
+| octopus-planning-Q8_0.gguf   | Q8_0         | 8    | 4.06 GB | very big, not very recommended      |
+| octopus-planning-F16.gguf    | F16          | 16   | 7.64 GB | full precision, extremely big       |
+
+_Quantized with llama.cpp_
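Step 7's `ollama run` assumes the GGUF file was already registered with Ollama in the elided earlier steps; a minimal sketch of that registration, where the local path and the model name `octopus-planning-Q4_K_M` are assumptions on our part:

```
# Modelfile: point Ollama at the downloaded GGUF (hypothetical local path)
FROM ./octopus-planning-Q4_K_M.gguf
```

Running `ollama create octopus-planning-Q4_K_M -f Modelfile` should then make the model appear under `ollama ls`, ready for the `ollama run` command above.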