Instructions to use RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf",
	filename="EdgeRunner-Command-7B.IQ3_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{"role": "user", "content": "What is the capital of France?"}
	]
)

Local Apps

llama.cpp

How to use RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M

Use Docker

docker model run hf.co/RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M
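
Once a server started with one of the `llama serve` / `llama-server` commands above is running, it can be queried over its OpenAI-compatible HTTP API. The following is a minimal sketch using only the Python standard library. It assumes the server listens on `http://localhost:8080/v1` (the address used in the agent configurations on this page) and that the response follows the usual chat-completions shape; `build_chat_request` and `chat` are illustrative helper names, not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: the local llama.cpp server listens on this address.
BASE_URL = "http://localhost:8080/v1"
MODEL = "RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M"


def build_chat_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a POST request for the chat-completions endpoint (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to the running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    # Assumes the OpenAI-style response layout: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello!"))` should print the model's reply. If the server was started on a different host or port, adjust `BASE_URL` accordingly.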

Ollama
How to use RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf with Ollama:
```
ollama run hf.co/RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M
```

Unsloth Studio

How to use RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf to start chatting

Pi

How to use RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf with Docker Model Runner:
```
docker model run hf.co/RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M
```

Lemonade

How to use RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull RichardErkhov/edgerunner-ai_-_EdgeRunner-Command-7B-gguf:Q4_K_M

Run and chat with the model

lemonade run user.edgerunner-ai_-_EdgeRunner-Command-7B-gguf-Q4_K_M

List all available models

lemonade list

Quantization made by Richard Erkhov.

Github

Discord

Request more models

EdgeRunner-Command-7B - GGUF

Model creator: https://huggingface.co/edgerunner-ai/
Original model: https://huggingface.co/edgerunner-ai/EdgeRunner-Command-7B/

Name	Quant method	Size
EdgeRunner-Command-7B.Q2_K.gguf	Q2_K	2.81GB
EdgeRunner-Command-7B.IQ3_XS.gguf	IQ3_XS	3.12GB
EdgeRunner-Command-7B.IQ3_S.gguf	IQ3_S	3.26GB
EdgeRunner-Command-7B.Q3_K_S.gguf	Q3_K_S	3.25GB
EdgeRunner-Command-7B.IQ3_M.gguf	IQ3_M	3.33GB
EdgeRunner-Command-7B.Q3_K.gguf	Q3_K	3.55GB
EdgeRunner-Command-7B.Q3_K_M.gguf	Q3_K_M	3.55GB
EdgeRunner-Command-7B.Q3_K_L.gguf	Q3_K_L	3.81GB
EdgeRunner-Command-7B.IQ4_XS.gguf	IQ4_XS	3.96GB
EdgeRunner-Command-7B.Q4_0.gguf	Q4_0	4.13GB
EdgeRunner-Command-7B.IQ4_NL.gguf	IQ4_NL	4.16GB
EdgeRunner-Command-7B.Q4_K_S.gguf	Q4_K_S	4.15GB
EdgeRunner-Command-7B.Q4_K.gguf	Q4_K	4.36GB
EdgeRunner-Command-7B.Q4_K_M.gguf	Q4_K_M	4.36GB
EdgeRunner-Command-7B.Q4_1.gguf	Q4_1	4.54GB
EdgeRunner-Command-7B.Q5_0.gguf	Q5_0	4.95GB
EdgeRunner-Command-7B.Q5_K_S.gguf	Q5_K_S	4.95GB
EdgeRunner-Command-7B.Q5_K.gguf	Q5_K	5.07GB
EdgeRunner-Command-7B.Q5_K_M.gguf	Q5_K_M	5.07GB
EdgeRunner-Command-7B.Q5_1.gguf	Q5_1	5.36GB
EdgeRunner-Command-7B.Q6_K.gguf	Q6_K	5.82GB
EdgeRunner-Command-7B.Q8_0.gguf	Q8_0	7.54GB
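
As a rough aid for choosing a file from the table above, the sketch below picks the largest quantization whose file size fits a given budget. The sizes are copied from the table; `largest_quant_within` is a hypothetical helper. Note that file size is only a lower bound on memory use, since the runtime also needs room for the context cache and buffers, so leave headroom in the budget.

```python
from typing import Optional

# File sizes in GB, copied from the table above.
QUANT_SIZES_GB = {
    "Q2_K": 2.81, "IQ3_XS": 3.12, "IQ3_S": 3.26, "Q3_K_S": 3.25,
    "IQ3_M": 3.33, "Q3_K": 3.55, "Q3_K_M": 3.55, "Q3_K_L": 3.81,
    "IQ4_XS": 3.96, "Q4_0": 4.13, "IQ4_NL": 4.16, "Q4_K_S": 4.15,
    "Q4_K": 4.36, "Q4_K_M": 4.36, "Q4_1": 4.54, "Q5_0": 4.95,
    "Q5_K_S": 4.95, "Q5_K": 5.07, "Q5_K_M": 5.07, "Q5_1": 5.36,
    "Q6_K": 5.82, "Q8_0": 7.54,
}


def largest_quant_within(budget_gb: float) -> Optional[str]:
    """Return the quant method with the largest file that fits the budget, or None."""
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget_gb}
    if not fitting:
        return None
    # Larger files generally mean less aggressive quantization.
    return max(fitting, key=fitting.get)


def filename_for(quant: str) -> str:
    """Build the GGUF filename following the naming pattern used in the table."""
    return f"EdgeRunner-Command-7B.{quant}.gguf"
```

The resulting filename can be passed as the `filename` argument of `Llama.from_pretrained` in the llama-cpp-python snippet at the top of this page.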

Original model description:

library_name: transformers
license: apache-2.0
language:
- en
base_model: edgerunner-ai/EdgeRunner-Tactical-7B

EdgeRunner-Command-7B

We’re excited to announce the release of EdgeRunner Command, a cutting-edge 7B-parameter language model designed specifically for function calling and mission tasks. Initialized from our EdgeRunner-Tactical-7B, EdgeRunner Command offers performance comparable to much larger models while maintaining efficiency and speed at the edge.

The model uses the ChatML prompt format and specializes in function calling. It can be used with the transformers library.

Prompt Format for Function Calling

Our model was trained on specific system prompts and structures for function calling.

Use the system role with the message below, followed by the function signatures as JSON, as in this example:

<|im_start|>system
You are a helpful assistant with access to the following functions. Use them if required:
[AVAILABLE_TOOLS] [{"name": "search", "description": "Searches the web for the given text and returns the top 5 results.", "parameters": {"type": "object", "properties": {"text": {"type": "string", "description": "The text to search for."}}, "required": ["text"]}}][/AVAILABLE_TOOLS]<|im_end|>
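
The system turn above can be assembled programmatically from a list of function signatures. The sketch below mirrors the layout of the example; `build_system_turn` is a hypothetical helper, and the exact whitespace the model was trained on is an assumption. When using transformers, prefer the tokenizer's chat template, which handles this formatting itself.

```python
import json

# Preamble copied from the example system prompt above.
SYSTEM_PREAMBLE = (
    "You are a helpful assistant with access to the following functions. "
    "Use them if required:"
)


def build_system_turn(tools: list) -> str:
    """Render the ChatML system turn with the tool signatures embedded as JSON."""
    tools_json = json.dumps(tools)
    return (
        "<|im_start|>system\n"
        f"{SYSTEM_PREAMBLE}\n"
        f"[AVAILABLE_TOOLS] {tools_json}[/AVAILABLE_TOOLS]<|im_end|>\n"
    )


# The search tool from the example above.
search_tool = {
    "name": "search",
    "description": "Searches the web for the given text and returns the top 5 results.",
    "parameters": {
        "type": "object",
        "properties": {
            "text": {"type": "string", "description": "The text to search for."}
        },
        "required": ["text"],
    },
}

system_turn = build_system_turn([search_tool])
```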

To complete the function call, create a user prompt that follows the above system prompt, like so:

<|im_start|>user
How to train a dragon?<|im_end|>

The model will then generate a tool call, which your inference code must parse and dispatch to the matching function:

<|im_start|>assistant
[TOOL_CALLS] [{ "name": "search", "arguments": {"text": "how to train a dragon"}}]<|im_end|>
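
A minimal sketch of the parsing step, assuming the generated text contains a `[TOOL_CALLS]` marker followed by a JSON list, as in the assistant turn above. `parse_tool_calls` is a hypothetical helper. It tolerates a missing `<|im_end|>`, since decoding with special tokens skipped may drop it.

```python
import json

TOOL_CALLS_TAG = "[TOOL_CALLS]"
END_TAG = "<|im_end|>"


def parse_tool_calls(generated: str) -> list:
    """Extract the list of tool calls from generated text; return [] if none is found."""
    start = generated.find(TOOL_CALLS_TAG)
    if start == -1:
        return []  # The model answered in plain text instead of calling a tool.
    payload = generated[start + len(TOOL_CALLS_TAG):]
    end = payload.find(END_TAG)
    if end != -1:
        payload = payload[:end]
    try:
        calls = json.loads(payload.strip())
    except json.JSONDecodeError:
        return []  # Malformed JSON: treat as no usable tool call.
    # Normalize a single call object into a one-element list.
    return calls if isinstance(calls, list) else [calls]


sample = '[TOOL_CALLS] [{ "name": "search", "arguments": {"text": "how to train a dragon"}}]<|im_end|>'
calls = parse_tool_calls(sample)
```

Here `calls[0]["name"]` is `"search"` and `calls[0]["arguments"]` holds the keyword arguments to pass to the function.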

Once you have parsed the tool call, call the function, collect its return value, and pass it back under a new role, tool, like so:

<|im_start|>tool
[TOOL_RESULTS] [{"name": "search", "content": "..."}][/TOOL_RESULTS]<|im_end|>
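
The tool turn above can be rendered from a function's return value as follows. This is a sketch that copies the layout of the example; `build_tool_turn` is a hypothetical helper, and the trailing newline is an assumption.

```python
import json


def build_tool_turn(name: str, content: str) -> str:
    """Render the ChatML tool turn carrying one function result as JSON."""
    results = [{"name": name, "content": content}]
    return (
        "<|im_start|>tool\n"
        f"[TOOL_RESULTS] {json.dumps(results)}[/TOOL_RESULTS]<|im_end|>\n"
    )


tool_turn = build_tool_turn("search", "...")
```

Using `json.dumps` keeps quotes and newlines inside the result properly escaped, which matters when the returned content is free-form text.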

The assistant then reads the function's result and generates a natural-language response:

<|im_start|>assistant
According to my search, training a dragon is not something ....<|im_end|>

Usage

To use this example, you'll need transformers version 4.42.0 or higher. Please see the function calling guide in the transformers docs for more information.

Example Code

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "edgerunner-ai/EdgeRunner-Command-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

def get_current_weather(location: str, format: str):
    """
    Get the current weather.

    Args:
        location: The city and state, e.g. San Francisco, CA
        format: The temperature unit to use. Infer this from the user's location. (choices: ["celsius", "fahrenheit"])
    """
    pass

conversation = [{"role": "user", "content": "What's the weather like in Paris?"}]
tools = [get_current_weather]

# Render the tool use prompt as a string:
tool_use_prompt = tokenizer.apply_chat_template(
    conversation,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Move the tokenized prompt to the same device as the model before generating.
inputs = tokenizer(tool_use_prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Note that this example does not show a complete cycle of calling a tool and adding the tool call and tool results to the chat history so that the model can use them in its next generation. For a full tool calling example, please see the function calling guide in the transformers docs.
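
One of the missing steps, executing a parsed tool call, can be sketched as below. Everything here is illustrative: `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, the weather function is a stub with a made-up return string, and the shape of the returned message (`role`, `name`, `content`) is an assumption about what the chat template accepts.

```python
def get_current_weather(location: str, format: str) -> str:
    """Stub standing in for a real weather lookup (placeholder return value)."""
    return f"weather for {location} in {format}"


# Map tool names, as the model emits them, to Python callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_call(call: dict) -> dict:
    """Execute one parsed tool call and wrap the result as a tool-role message."""
    name = call["name"]
    func = TOOL_REGISTRY.get(name)
    if func is None:
        content = f"unknown tool: {name}"
    else:
        try:
            content = str(func(**call.get("arguments", {})))
        except TypeError as exc:
            # The model supplied missing or unexpected arguments.
            content = f"bad arguments: {exc}"
    return {"role": "tool", "name": name, "content": content}


message = run_tool_call(
    {"name": "get_current_weather",
     "arguments": {"location": "Paris, France", "format": "celsius"}}
)
```

The resulting message would then be appended to `conversation` before rendering the prompt again, so the model can read the result in its next generation.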

Benchmarks

Berkeley Function Calling Benchmark Results

Test Name	Accuracy
multiple_function	0.94
parallel_multiple_function	0.83
parallel_function	0.77
simple	0.91

Other Benchmarks:

Benchmark	Score
Arena Hard	31.99
MMLU-Redux	67.82
GSM	80.89
MT-Bench	8.32

GGUF

Model size: 8B params
Architecture: qwen2
Hardware compatibility: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
