
Zack Zhiyuan Li


	---
	language:
	- en
	license: apache-2.0
	model_name: Octopus-V4-GGUF
	base_model: NexaAIDev/Octopus-v4
	inference: false
	model_creator: NexaAIDev
	quantized_by: Nexa AI, Inc.
	tags:
	- function calling
	- on-device language model
	- gguf
	- llama cpp
	---

	# Octopus V4-GGUF: Graph of language models


	<p align="center">
	- <a href="https://huggingface.co/NexaAIDev/Octopus-v4" target="_blank">Original Model</a>
	- <a href="https://www.nexa4ai.com/" target="_blank">Nexa AI Website</a>
	- <a href="https://github.com/NexaAI/octopus-v4" target="_blank">Octopus-v4 Github</a>
	- <a href="https://arxiv.org/abs/2404.19296" target="_blank">ArXiv</a>
	- <a href="https://huggingface.co/spaces/NexaAIDev/domain_llm_leaderboard" target="_blank">Domain LLM Leaderboard</a>
	</p>

	<p align="center" width="100%">
	<a><img src="octopus-v4-logo.png" alt="nexa-octopus" style="width: 40%; min-width: 300px; display: block; margin: auto;"></a>
	</p>

	Acknowledgement:
	We sincerely thank our community members, [ThunderBeee](https://huggingface.co/ThunderBeee) and [ZY6](https://huggingface.co/ZY6), for their extraordinary contributions to this quantization effort. Please explore [Octopus-v4](https://huggingface.co/NexaAIDev/Octopus-v4) for our original Hugging Face model.


	## Run with [Ollama](https://github.com/ollama/ollama)

	```bash
	ollama run NexaAIDev/octopus-v4-q4_k_m
	```
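
For scripted use, the same model can also be queried through Ollama's local REST API instead of the interactive prompt. The following is a minimal sketch, assuming an Ollama server is running on its default port (`11434`) and that the model tag from the command above has already been pulled. The helper names `build_payload` and `ask` are hypothetical, and whether a prompt template is applied depends on the model's Modelfile.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Model tag taken from the `ollama run` command above.
MODEL = "NexaAIDev/octopus-v4-q4_k_m"


def build_payload(query: str, model: str = MODEL) -> dict:
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": query, "stream": False}


def ask(query: str, url: str = OLLAMA_URL) -> str:
    """Send the query to Ollama and return the generated text."""
    body = json.dumps(build_payload(query)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the generated text.
        return json.loads(reply.read())["response"]
```

Calling `ask("Tell me the result of derivative of x^3 when x is 2?")` should then return the model's raw output as a string.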

	Example query and response:

	```text
	Query: Tell me the result of derivative of x^3 when x is 2?

	Response: <nexa_4> ('Determine the derivative of the function f(x) = x^3 at the point where x equals 2, and interpret the result within the context of rate of change and tangent slope.')<nexa_end>

	```
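	Note that `<nexa_4>` is the token that represents the math GPT. The text in parentheses is the reformulated query, and `<nexa_end>` marks the end of the response.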
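
To act on such a response programmatically, the token and the reformulated query need to be separated. The sketch below does this with a regular expression. The pattern is inferred only from the single example above, and the names `RoutedQuery` and `parse_response` are hypothetical, so treat it as a starting point, not the official parsing logic.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the example above: <nexa_N> ('...')<nexa_end>
RESPONSE_PATTERN = re.compile(
    r"<nexa_(?P<index>\d+)>\s*\(\s*(?P<quote>['\"])"
    r"(?P<query>.*?)(?P=quote)\s*\)\s*<nexa_end>",
    re.DOTALL,
)


class RoutedQuery(NamedTuple):
    token: str  # the routing token, e.g. "<nexa_4>"
    query: str  # the reformulated query for the selected model


def parse_response(text: str) -> Optional[RoutedQuery]:
    """Return the routing token and query, or None if the text does not match."""
    match = RESPONSE_PATTERN.search(text)
    if match is None:
        return None
    return RoutedQuery(f"<nexa_{match.group('index')}>", match.group("query"))
```

Returning `None` on a mismatch lets the caller decide how to handle output that does not follow the expected shape, which may matter more for the lower-bit quantizations.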

	### Dataset and Benchmark

	* Used questions from [MMLU](https://github.com/hendrycks/test) to evaluate performance.
	* Evaluated with the Ollama [llm-benchmark](https://github.com/MinhNgyuen/llm-benchmark) method.
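
As a rough illustration of how a tokens-per-second figure can be derived, Ollama's generation responses report `eval_count` (number of generated tokens) and `eval_duration` (generation time in nanoseconds). The sketch below computes throughput from those two fields. It is an assumption that the benchmark tool computes its numbers this way, and the function names are hypothetical.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation throughput: tokens produced divided by seconds spent."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    # Ollama reports durations in nanoseconds; convert to seconds first.
    return eval_count / (eval_duration_ns / 1e9)


def throughput_from_response(response: dict) -> float:
    """Read the two metrics from a parsed, non-streaming Ollama response."""
    return tokens_per_second(response["eval_count"], response["eval_duration"])
```

For example, 100 tokens generated in 2 seconds (2,000,000,000 ns) corresponds to 50 tokens per second.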


	## Quantized GGUF Models

	| Name                   | Quant method | Bits | Size    | Response (tokens/second) | Use Cases                                 |
	| ---------------------- | ------------ | ---- | ------- | ------------------------ | ----------------------------------------- |
	| Octopus-v4.gguf        |              |      | 7.20 GB | 27.64                    | extremely large                           |
	| Octopus-v4-Q2_K.gguf   | Q2_K         | 2    | 1.32 GB | 54.20                    | extremely not recommended, high loss      |
	| Octopus-v4-Q3_K.gguf   | Q3_K         | 3    | 1.82 GB | 51.22                    | not recommended                           |
	| Octopus-v4-Q3_K_S.gguf | Q3_K_S       | 3    | 1.57 GB | 51.78                    | not very recommended                      |
	| Octopus-v4-Q3_K_M.gguf | Q3_K_M       | 3    | 1.82 GB | 50.86                    | not very recommended                      |
	| Octopus-v4-Q3_K_L.gguf | Q3_K_L       | 3    | 1.94 GB | 50.05                    | not very recommended                      |
	| Octopus-v4-Q4_0.gguf   | Q4_0         | 4    | 2.03 GB | 65.76                    | good quality, recommended                 |
	| Octopus-v4-Q4_1.gguf   | Q4_1         | 4    | 2.24 GB | 69.01                    | slow, good quality, recommended           |
	| Octopus-v4-Q4_K.gguf   | Q4_K         | 4    | 2.23 GB | 55.76                    | slow, good quality, recommended           |
	| Octopus-v4-Q4_K_S.gguf | Q4_K_S       | 4    | 2.04 GB | 53.98                    | high quality, recommended                 |
	| Octopus-v4-Q4_K_M.gguf | Q4_K_M       | 4    | 1.51 GB | 58.39                    | some functions loss, not very recommended |
	| Octopus-v4-Q5_0.gguf   | Q5_0         | 5    | 2.45 GB | 61.98                    | slow, good quality                        |
	| Octopus-v4-Q5_1.gguf   | Q5_1         | 5    | 2.67 GB | 63.44                    | slow, good quality                        |
	| Octopus-v4-Q5_K.gguf   | Q5_K         | 5    | 2.58 GB | 58.28                    | moderate speed, recommended               |
	| Octopus-v4-Q5_K_S.gguf | Q5_K_S       | 5    | 2.45 GB | 59.95                    | moderate speed, recommended               |
	| Octopus-v4-Q5_K_M.gguf | Q5_K_M       | 5    | 2.62 GB | 53.31                    | fast, good quality, recommended           |
	| Octopus-v4-Q6_K.gguf   | Q6_K         | 6    | 2.91 GB | 52.15                    | large, not very recommended               |
	| Octopus-v4-Q8_0.gguf   | Q8_0         | 8    | 3.78 GB | 50.10                    | very large, good quality                  |
	| Octopus-v4-f16.gguf    | f16          | 16   | 7.20 GB | 30.61                    | extremely large                           |

	_Quantized with llama.cpp_