Wind Edge 1.6 — Geode (0.4B)
A 0.4B parameter causal language model built for edge deployment. Fast, small, and honest about what it can do.
North ML · Wind Arc 1.5 Preview
Overview
Wind Edge 1.6 (Geode) is a compact LLM trained for real-time, on-device inference. At 0.4B parameters it sits in the ultra-small tier — expect strong common-sense and classification performance, limited hard reasoning.
Best use cases:
- Instruction-following dialogue (short to medium turns)
- Text classification and sentiment
- Light code completion
- Summarization of short passages
Not recommended for: multi-step math, complex logical chains, long-context tasks.
Changes vs 1.5
- Improved instruction adherence on structured output formats
- More stable multi-sentence generation (fewer mid-sequence repetitions)
- Reduced hallucination rate on short factual queries (internal held-out eval)
Honest Benchmark Estimates
Realistic ranges for a well-trained 0.4B model — not cherry-picked numbers.
| Task | Expected Range (score, 0–1) | Notes |
|---|---|---|
| Common Sense (0-shot) | 0.60 – 0.68 | Reliable strength |
| Sentiment Analysis | 0.70 – 0.80 | Reliable strength |
| Text Classification | 0.68 – 0.78 | Reliable strength |
| Reading Comprehension | 0.52 – 0.63 | Context-dependent |
| Summarization | 0.58 – 0.68 | Short docs only |
| Code Generation | 0.45 – 0.58 | Simple tasks only |
| Math Reasoning | 0.15 – 0.28 | Known weak point at this scale |
| Logical Reasoning | 0.18 – 0.28 | Known weak point at this scale |
A 0.4B model cannot compete with 7B+ on reasoning — Geode doesn't pretend to.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the Transformers checkpoint and its tokenizer.
model = AutoModelForCausalLM.from_pretrained("north-ml1/wind-edge-1.6")
tokenizer = AutoTokenizer.from_pretrained("north-ml1/wind-edge-1.6")
# Append the user's message after "User: " before tokenizing.
inputs = tokenizer("You are Wind Edge, a helpful AI assistant.\nUser: ", return_tensors="pt")
# do_sample=True is required for temperature and top_p to take effect.
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
Recommended Settings
| Parameter | Value |
|---|---|
| temperature | 0.0 |
| top_p | 0.95 |
| min_p | 0.05 |
| max_new_tokens | 256–512 |
| repetition_penalty | 1.1 |
| context_limit | 1024–4096 |
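The settings above can be mapped onto llama.cpp command-line flags. The following is a minimal sketch, assuming the flag names `--temp`, `--top-p`, `--min-p`, `--repeat-penalty`, `-n`, and `-c` used by current llama.cpp builds. `RECOMMENDED`, `FLAG_NAMES`, and `to_llama_cli_args` are hypothetical names introduced for illustration, and picking 512 and 4096 from the table's ranges is an assumption.

```python
# Hypothetical helper: turns the recommended settings into llama-cli arguments.
RECOMMENDED = {
    "temperature": 0.0,
    "top_p": 0.95,
    "min_p": 0.05,
    "max_new_tokens": 512,       # upper end of the 256–512 range (assumption)
    "repetition_penalty": 1.1,
    "context_limit": 4096,       # upper end of the 1024–4096 range (assumption)
}

# Setting name -> llama-cli flag (assumed to match your llama.cpp build).
FLAG_NAMES = {
    "temperature": "--temp",
    "top_p": "--top-p",
    "min_p": "--min-p",
    "max_new_tokens": "-n",
    "repetition_penalty": "--repeat-penalty",
    "context_limit": "-c",
}


def to_llama_cli_args(settings):
    """Return a flat list of flags and values for the settings that are present."""
    args = []
    for key, flag in FLAG_NAMES.items():
        if key in settings:
            args += [flag, str(settings[key])]
    return args


print(" ".join(to_llama_cli_args(RECOMMENDED)))
```

The resulting list can be appended to a `llama-cli -m <file>.gguf` invocation or passed to `subprocess.run`.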
GGUF Quantizations
GGUF quants converted from arthu1/Wind-Edge-1.6-Instruct using a Qwen3-compatible tensor layout. The Transformers repo remains canonical — use these for llama.cpp, LM Studio, Ollama-style runtimes, and any other GGUF-compatible inference stack.
Files
| File | bpw | Use |
|---|---|---|
| Wind-Edge-1.6-TQ1_0.gguf | ~1.7 bpw | Experimental 1-bit/ternary. Lowest quality, smallest size. |
| Wind-Edge-1.6-TQ2_0.gguf | ~2.1 bpw | Very small 2-bit/ternary option. |
| Wind-Edge-1.6-IQ3_M.gguf | ~3.7 bpw | Good balance for tiny devices. |
| Wind-Edge-1.6-Q4_K_M.gguf | ~4.6 bpw | Recommended default. |
| Wind-Edge-1.6-Q6_K.gguf | ~6.1 bpw | Higher quality, still compact. |
| Wind-Edge-1.6-Q8_0.gguf | ~8.5 bpw | Near-lossless practical quant. |
| Wind-Edge-1.6-F16.gguf | 16 bpw | Unquantized 16-bit (half-precision) GGUF export. |
Q4_K_M, Q6_K, and Q8_0 are the recommended daily drivers. TQ1_0 and TQ2_0 are included for constrained edge hardware, but expect a measurable loss in reasoning and factual accuracy.
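As a rough guide to choosing a file, the bpw column can be converted into an approximate download size with size ≈ parameters × bpw / 8 bytes. The sketch below assumes exactly 0.4e9 parameters (the card only says "~0.4B") and ignores GGUF metadata and any tensors stored at a different precision, so actual file sizes will differ. `estimate_size_mb` is a hypothetical helper.

```python
def estimate_size_mb(n_params: float, bpw: float) -> float:
    """Approximate file size in megabytes (10**6 bytes) at a given bits-per-weight."""
    return n_params * bpw / 8 / 1e6


N_PARAMS = 0.4e9  # assumption: "~0.4B" taken as exactly 400 million parameters

# Approximate bits-per-weight values from the Files table.
BPW = {
    "TQ1_0": 1.7,
    "TQ2_0": 2.1,
    "IQ3_M": 3.7,
    "Q4_K_M": 4.6,
    "Q6_K": 6.1,
    "Q8_0": 8.5,
    "F16": 16.0,
}

for name, bpw in BPW.items():
    print(f"{name:7s} ~{estimate_size_mb(N_PARAMS, bpw):.0f} MB")
```

Under these assumptions the recommended Q4_K_M file comes out at roughly 230 MB and the F16 export at roughly 800 MB.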
llama.cpp
llama-cli \
-m Wind-Edge-1.6-Q4_K_M.gguf \
-cnv \
--temp 0.6 \
--top-p 0.9 \
--repeat-penalty 1.06 \
-n 512
For deterministic output, use --temp 0 and keep prompts short.
Chat Template
The GGUF metadata includes the chat template. If your runtime doesn't apply it automatically, format the prompt manually as follows:
<|im_start|>system
You are Wind-Edge-1.6, a compact AI assistant model. You are not a human.<|im_end|>
<|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
<think>
</think>
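For runtimes that take a raw prompt string, the layout above can be assembled in code. This is a minimal sketch that mirrors the example as printed here. `build_prompt` is a hypothetical helper, and the exact whitespace around the empty `<think>` block is an assumption, so the template embedded in the GGUF metadata should be treated as authoritative.

```python
def build_prompt(messages, add_empty_think=True):
    """Render a list of {"role", "content"} dicts in the <|im_start|>/<|im_end|> layout."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Open the assistant turn so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    if add_empty_think:
        # Empty think block, as shown in the example above.
        parts.append("<think>\n</think>\n")
    return "".join(parts)


messages = [
    {
        "role": "system",
        "content": "You are Wind-Edge-1.6, a compact AI assistant model. You are not a human.",
    },
    {"role": "user", "content": "Who are you?"},
]
print(build_prompt(messages))
```

Stopping generation on `<|im_end|>` keeps the model from running on into a new turn.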
Model Details
| Property | Value |
|---|---|
| Parameters | ~0.4B |
| Architecture | Causal LM (decoder-only) |
| Context Length | 8192 tokens |
| Quantization | 1- to 16-bit (GGUF) |
| Org | north-ml1 |
License
MIT
Model tree for North-ML1/Wind-Edge-1.6-GGUF
Base model: North-ML1/Wind-Edge-1.6-Base
Evaluation results
| Metric (CodeBench-30, self-reported) | Value |
|---|---|
| Overall Accuracy | 6.250 |
| Easy Tier Accuracy | 17.140 |
| Medium Tier Accuracy | 0.000 |
| Hard Tier Accuracy | 0.000 |