Davidqian123 committed
Commit ca2c162
1 Parent(s): 5c45428

Update README.md

Files changed (1):
  1. README.md +28 -2

README.md CHANGED
@@ -14,7 +14,7 @@ tags:
 This repo includes **GGUF** quantized models for our Octo-planner model at [NexaAIDev/octopus-planning](https://huggingface.co/NexaAIDev/octopus-planning)
 
 
-# GGUF Qauntization
+# GGUF Quantization
 
 To run the models, please download them to your local machine using either git clone or the [Hugging Face Hub](https://huggingface.co/docs/huggingface_hub/en/guides/download)
 ```
@@ -80,4 +80,30 @@ ollama ls
 7. Run the model
 ```bash
 ollama run octopus-planning-Q4_K_M "<|user|>Find my presentation for tomorrow's meeting, connect to the conference room projector via Bluetooth, increase the screen brightness, take a screenshot of the final summary slide, and email it to all participants<|end|><|assistant|>"
-```
+```
+
+
+# Quantized GGUF Models Benchmark
+
+| Name                         | Quant method | Bits | Size    | Use Cases                           |
+| ---------------------------- | ------------ | ---- | ------- | ----------------------------------- |
+| octopus-planning-Q2_K.gguf   | Q2_K         | 2    | 1.42 GB | fast but high loss, not recommended |
+| octopus-planning-Q3_K.gguf   | Q3_K         | 3    | 1.96 GB | very high loss, not recommended     |
+| octopus-planning-Q3_K_S.gguf | Q3_K_S       | 3    | 1.68 GB | very high loss, not recommended     |
+| octopus-planning-Q3_K_M.gguf | Q3_K_M       | 3    | 1.96 GB | moderate loss, not very recommended |
+| octopus-planning-Q3_K_L.gguf | Q3_K_L       | 3    | 2.09 GB | not very recommended                |
+| octopus-planning-Q4_0.gguf   | Q4_0         | 4    | 2.18 GB | moderate speed, recommended         |
+| octopus-planning-Q4_1.gguf   | Q4_1         | 4    | 2.41 GB | moderate speed, recommended         |
+| octopus-planning-Q4_K.gguf   | Q4_K         | 4    | 2.39 GB | moderate speed, recommended         |
+| octopus-planning-Q4_K_S.gguf | Q4_K_S       | 4    | 2.19 GB | fast and accurate, highly recommended |
+| octopus-planning-Q4_K_M.gguf | Q4_K_M       | 4    | 2.39 GB | fast, recommended                   |
+| octopus-planning-Q5_0.gguf   | Q5_0         | 5    | 2.64 GB | fast, recommended                   |
+| octopus-planning-Q5_1.gguf   | Q5_1         | 5    | 2.87 GB | very big, prefer Q4                 |
+| octopus-planning-Q5_K.gguf   | Q5_K         | 5    | 2.82 GB | big, recommended                    |
+| octopus-planning-Q5_K_S.gguf | Q5_K_S       | 5    | 2.64 GB | big, recommended                    |
+| octopus-planning-Q5_K_M.gguf | Q5_K_M       | 5    | 2.82 GB | big, recommended                    |
+| octopus-planning-Q6_K.gguf   | Q6_K         | 6    | 3.14 GB | very big, not very recommended      |
+| octopus-planning-Q8_0.gguf   | Q8_0         | 8    | 4.06 GB | very big, not very recommended      |
+| octopus-planning-F16.gguf    | F16          | 16   | 7.64 GB | full precision, extremely big       |
+
+_Quantized with llama.cpp_
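Step 7's `ollama run` assumes the GGUF file was already registered with Ollama in the elided earlier steps; a minimal sketch of that registration, where the local path and the model name `octopus-planning-Q4_K_M` are assumptions on our part:

```
# Modelfile: point Ollama at the downloaded GGUF (hypothetical local path)
FROM ./octopus-planning-Q4_K_M.gguf
```

Running `ollama create octopus-planning-Q4_K_M -f Modelfile` should then make the model appear under `ollama ls`, ready for the `ollama run` command above.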