
Quantized Octo-planner: On-device Language Model for Planner-Action Agents Framework

This repo includes GGUF quantized models for our Octo-planner model at NexaAIDev/octopus-planning (3.82B parameters, phi3 architecture).

GGUF Quantization

To run the models, please download them to your local machine using either git clone or the Hugging Face Hub:

git clone https://huggingface.co/NexaAIDev/octo-planner-gguf
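Alternatively, you can fetch a single quantized file with the huggingface-cli tool from the huggingface_hub package, which avoids cloning every quantization level. A minimal sketch, assuming you want the Q4_K_M file listed in the benchmark table below:

# Download one GGUF file into the current directory (requires: pip install huggingface_hub)
huggingface-cli download NexaAIDev/octo-planner-gguf octopus-planning-Q4_K_M.gguf --local-dir .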

Run with llama.cpp (Recommended)

  1. Clone and compile:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Compile the source code:
make
  2. Execute the model:

Run the following command in the terminal:

./llama-cli -m ./path/to/octopus-planning-Q4_K_M.gguf -p "<|user|>Find my presentation for tomorrow's meeting, connect to the conference room projector via Bluetooth, increase the screen brightness, take a screenshot of the final summary slide, and email it to all participants<|end|><|assistant|>"
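The prompt wraps the query in the model's chat tokens (<|user|> ... <|end|><|assistant|>). To control generation, llama-cli accepts standard sampling flags; a sketch, with flag names assumed from recent llama.cpp builds:

# -n: max tokens to generate; --temp: sampling temperature (0.0 = greedy,
# suits deterministic planning output); -ngl: layers to offload to GPU, if available
./llama-cli -m ./path/to/octopus-planning-Q4_K_M.gguf -n 256 --temp 0.0 -ngl 99 -p "<|user|>Your planning query here<|end|><|assistant|>"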

Run with Ollama

Since our models have not been uploaded to the Ollama server, please download the models and manually import them into Ollama by following these steps:

  1. Install Ollama on your local machine. You can also follow the guide from the Ollama GitHub repository:
git clone https://github.com/ollama/ollama.git ollama
  2. Locate the local Ollama directory:
cd ollama
  3. Create a Modelfile in your directory:
touch Modelfile
  4. In the Modelfile, include a FROM statement with the path to your local model and the default parameters:
FROM ./path/to/octopus-planning-Q4_K_M.gguf
  5. Use the following command to add the model to Ollama:
ollama create octopus-planning-Q4_K_M -f Modelfile
  6. Verify that the model has been successfully imported:
ollama ls
  7. Run the model:
ollama run octopus-planning-Q4_K_M "<|user|>Find my presentation for tomorrow's meeting, connect to the conference room projector via Bluetooth, increase the screen brightness, take a screenshot of the final summary slide, and email it to all participants<|end|><|assistant|>"
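Once imported, the model can also be queried through Ollama's local REST API instead of the CLI. A minimal sketch, assuming the default server address and the model name created above:

# Query the Ollama server (listens on localhost:11434 by default)
curl http://localhost:11434/api/generate -d '{
  "model": "octopus-planning-Q4_K_M",
  "prompt": "<|user|>Your planning query here<|end|><|assistant|>",
  "stream": false
}'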

Quantized GGUF Models Benchmark

| Name | Quant method | Bits | Size | Use Cases |
|------|--------------|------|------|-----------|
| octopus-planning-Q2_K.gguf | Q2_K | 2 | 1.42 GB | fast but high loss, not recommended |
| octopus-planning-Q3_K.gguf | Q3_K | 3 | 1.96 GB | strongly not recommended |
| octopus-planning-Q3_K_S.gguf | Q3_K_S | 3 | 1.68 GB | strongly not recommended |
| octopus-planning-Q3_K_M.gguf | Q3_K_M | 3 | 1.96 GB | moderate loss, not recommended |
| octopus-planning-Q3_K_L.gguf | Q3_K_L | 3 | 2.09 GB | not recommended |
| octopus-planning-Q4_0.gguf | Q4_0 | 4 | 2.18 GB | moderate speed, recommended |
| octopus-planning-Q4_1.gguf | Q4_1 | 4 | 2.41 GB | moderate speed, recommended |
| octopus-planning-Q4_K.gguf | Q4_K | 4 | 2.39 GB | moderate speed, recommended |
| octopus-planning-Q4_K_S.gguf | Q4_K_S | 4 | 2.19 GB | fast and accurate, highly recommended |
| octopus-planning-Q4_K_M.gguf | Q4_K_M | 4 | 2.39 GB | fast, recommended |
| octopus-planning-Q5_0.gguf | Q5_0 | 5 | 2.64 GB | fast, recommended |
| octopus-planning-Q5_1.gguf | Q5_1 | 5 | 2.87 GB | very large, prefer Q4 |
| octopus-planning-Q5_K.gguf | Q5_K | 5 | 2.82 GB | large, recommended |
| octopus-planning-Q5_K_S.gguf | Q5_K_S | 5 | 2.64 GB | large, recommended |
| octopus-planning-Q5_K_M.gguf | Q5_K_M | 5 | 2.82 GB | large, recommended |
| octopus-planning-Q6_K.gguf | Q6_K | 6 | 3.14 GB | very large, not recommended |
| octopus-planning-Q8_0.gguf | Q8_0 | 8 | 4.06 GB | very large, not recommended |
| octopus-planning-F16.gguf | F16 | 16 | 7.64 GB | extremely large |

Quantized with llama.cpp
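For reference, a sketch of how files like these are typically produced with llama.cpp's quantization tool (the tool name and invocation are assumed from recent llama.cpp builds; the exact steps used for this repo are not documented here):

# Quantize the full-precision GGUF down to a smaller type, e.g. Q4_K_M
./llama-quantize octopus-planning-F16.gguf octopus-planning-Q4_K_M.gguf Q4_K_M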
