💻 Qwopus3.5-4B-Coder-Fable5-v1 GGUF
GGUF builds for llama.cpp, LM Studio, and local inference
Fable-5 traces · agentic coding · tool use · debugging
Overview
Qwopus3.5-4B-Coder-Fable5-v1 is a Fable-5 trace continuation of Jackrong/Qwopus3.5-4B-Coder.
The base model, Qwopus3.5-4B-Coder, is a compact Qwen3.5-based coding model trained for reasoning, tool use, function calling, coding workflows, and agent-style behavior.
This release continues that model on Glint-Research/Fable-5-traces, a dataset of Claude Fable 5 local coding-agent traces. The dataset is heavily oriented around tool-use trajectories, repository work, local command context, code editing, debugging loops, and <think>-style reasoning completions.
The result is a small local coding-agent model intended for:
| Area | Description |
|---|---|
| Tool-use workflows | Bash, Read, Write, Edit, repo inspection, and action traces. |
| Debugging | Failing tests, stack traces, root-cause analysis, and patch planning. |
| Trace-style reasoning | Long-form planning and <think>-style reasoning traces. |
| Local agents | Hermes-style, Claude-Code-style, OpenCode-style, and LM Studio workflows. |
Files
Typical GGUF files:
- Qwopus3.5-4B-Coder-Fable5-v1-Q4_K_M.gguf
- Qwopus3.5-4B-Coder-Fable5-v1-Q5_K_M.gguf
- Qwopus3.5-4B-Coder-Fable5-v1-mmproj-BF16.gguf
Which file should I use?
| File | Use case |
|---|---|
| Q4_K_M | Best default. Small, fast, good quality. |
| Q5_K_M | Better quality while still compact. |
| Q8_0 | Higher quality, larger memory use, if included. |
| mmproj-BF16 | Multimodal projector for compatible runtimes. |
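The file names listed under Files share one naming pattern, so a quantization tag can be turned into a file name mechanically. The following is a minimal sketch, assuming the pattern holds for every file in the repo (Q8_0 may not be published). `gguf_filename` and `load_model` are hypothetical helper names, not part of any library; `load_model` relies on `Llama.from_pretrained` from llama-cpp-python, which has to be installed separately.

```python
REPO_ID = "shuhulx/Qwopus3.5-4B-Coder-Fable5-v1-GGUF"
MODEL_NAME = "Qwopus3.5-4B-Coder-Fable5-v1"


def gguf_filename(tag: str) -> str:
    """Map a tag such as "Q4_K_M" to the matching GGUF file name.

    Assumption: every file in the repo follows <model>-<tag>.gguf.
    """
    return f"{MODEL_NAME}-{tag}.gguf"


def load_model(tag: str = "Q4_K_M", n_ctx: int = 8192):
    """Download (if needed) and load one GGUF file with llama-cpp-python."""
    # Imported lazily so gguf_filename works without the optional dependency.
    from llama_cpp import Llama

    return Llama.from_pretrained(
        repo_id=REPO_ID,
        filename=gguf_filename(tag),
        n_ctx=n_ctx,  # forwarded to the Llama constructor
    )
```

Calling `load_model("Q5_K_M")` would fetch the Q5_K_M file from the Hub on first use; the download fails if the requested tag is not present in the repo.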
llama.cpp
llama-cli \
  -m Qwopus3.5-4B-Coder-Fable5-v1-Q5_K_M.gguf \
  -p "Write a Bash/Read/Edit style plan for debugging a failing Python repo." \
  -n 768 \
  --temp 0.7 \
  --top-p 0.95
llama.cpp Server
llama-server \
  -m Qwopus3.5-4B-Coder-Fable5-v1-Q5_K_M.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --ctx-size 8192
Then call it with an OpenAI-compatible client:
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Qwopus3.5-4B-Coder-Fable5-v1-Q5_K_M.gguf",
    "messages": [
      {"role": "user", "content": "Write a tool-use plan for debugging a Python repo."}
    ],
    "temperature": 0.7,
    "top_p": 0.95
  }'
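The same request can be sent from Python with only the standard library. This is a sketch rather than a reference client: it assumes the server from the previous step is listening on port 8080, and `build_chat_request` and `chat` are hypothetical helper names introduced here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumes llama-server on --port 8080
MODEL = "Qwopus3.5-4B-Coder-Fable5-v1-Q5_K_M.gguf"


def build_chat_request(
    prompt: str, temperature: float = 0.7, top_p: float = 0.95
) -> urllib.request.Request:
    """Build the POST request for /v1/chat/completions without sending it."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the first choice's message text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Write a tool-use plan for debugging a Python repo."))` prints the model's reply. Keeping request construction separate from sending makes the payload easy to inspect or log before any network call happens.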
About the Fable-5 Traces
Glint-Research/Fable-5-traces contains Claude Fable 5 coding traces.
The dataset includes fields such as:
uid
source_file
session
model
context
cot
output_type
output
completion
origin
The examples are not simple chat pairs. They are multi-step agent trajectories with local development context, reasoning traces, and tool-use outputs.
Common patterns in the dataset include:
- user coding requests
- local-command caveats
- repository inspection
- Bash command usage
- file reads
- file writes
- edits
- debugging passes
- playtesting / validation loops
- <think>...</think> reasoning traces
- tool-use completions
A large portion of the dataset is tool_use style data, which makes it especially relevant for local coding agents and developer automation.
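Because the training data contains `<think>...</think>` reasoning traces, model output may include such a block ahead of the final answer. The sketch below separates the two. It assumes the tags appear literally in the returned text; some runtimes strip or relocate reasoning before returning it, in which case this step is unnecessary. `split_reasoning` is a hypothetical helper name.

```python
import re
from typing import Tuple

# Non-greedy match so several <think> blocks are captured separately;
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer).

    reasoning joins the contents of all <think> blocks; it is an empty
    string when the text contains none. answer is the text with those
    blocks removed.
    """
    reasoning = "\n".join(block.strip() for block in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer
```

An agent wrapper would typically log the reasoning part and pass only the answer part on to tool parsing or to the user.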
Capabilities
Agentic coding
Designed for coding-agent loops where the model must inspect a repo, plan work, call tools, edit files, and validate changes.
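The loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not the harness used to produce the training traces: `run_agent`, the action dictionaries (`{"tool": ..., "args": ...}` or `{"final": ...}`), and the message roles are all hypothetical conventions chosen for this example, and the model is any callable that maps the history to the next action.

```python
from typing import Callable, Dict, List


def run_agent(
    model: Callable[[List[dict]], dict],
    tools: Dict[str, Callable[..., str]],
    task: str,
    max_steps: int = 8,
) -> str:
    """Ask the model for actions until it returns a final answer."""
    history: List[dict] = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model(history)
        if "final" in action:
            # The model decided the task is done.
            return action["final"]
        name = action["tool"]
        tool = tools.get(name)
        if tool is None:
            # Report the problem back so the model can recover.
            result = f"error: unknown tool {name!r}"
        else:
            result = tool(**action.get("args", {}))
        # Record the call and its result for the next model turn.
        history.append({"role": "assistant", "content": f"call {name}"})
        history.append({"role": "tool", "content": result})
    return "stopped: step limit reached"
```

In practice `model` would wrap a call to a local server and parse the tool call out of the reply, while `tools` would map names such as `Read` or `Bash` to real implementations. The step limit guards against a model that never produces a final answer.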
Tool-use style outputs
Works well with prompts that expose structured tools such as:
Bash
Read
Write
Edit
Search
Grep
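One common way to expose such tools is the OpenAI-style `tools` field of a chat-completions request. The sketch below builds that structure for four of the tools listed above. The parameter names (`command`, `path`, `old_text`, `new_text`, `pattern`) and descriptions are assumptions made for illustration, and whether a given server honors `tools` depends on the runtime and its chat-template settings.

```python
def make_tool(name: str, description: str, properties: dict, required: list) -> dict:
    """Wrap one tool in the OpenAI-style function-tool schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


STRING = {"type": "string"}

# Hypothetical argument names; adapt them to the agent harness in use.
TOOLS = [
    make_tool("Bash", "Run a shell command in the repo.", {"command": STRING}, ["command"]),
    make_tool("Read", "Read a file.", {"path": STRING}, ["path"]),
    make_tool(
        "Edit",
        "Replace old_text with new_text in a file.",
        {"path": STRING, "old_text": STRING, "new_text": STRING},
        ["path", "old_text", "new_text"],
    ),
    make_tool("Grep", "Search files for a pattern.", {"pattern": STRING, "path": STRING}, ["pattern"]),
]


def build_tool_payload(prompt: str) -> dict:
    """Chat-completions payload that advertises TOOLS to the model."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "tools": TOOLS,
    }
```

The remaining tools (Write, Search) would follow the same shape. The payload can be serialized with `json.dumps` and posted to an OpenAI-compatible endpoint.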
Debugging and repair
Useful for:
- finding likely failing files
- explaining stack traces
- planning test commands
- proposing minimal patches
- iterating after errors
Local-first deployment
The release includes Transformers, GGUF, MLX, and MLX 4-bit formats so it can run in Python, llama.cpp, LM Studio, and Apple Silicon workflows.
Available Releases
| Release | Repo | Best for |
|---|---|---|
| Transformers / Safetensors | shuhulx/Qwopus3.5-4B-Coder-Fable5-v1 | Python, Transformers, custom inference. |
| GGUF | shuhulx/Qwopus3.5-4B-Coder-Fable5-v1-GGUF | llama.cpp, LM Studio, local CPU/GPU inference. |
| MLX | shuhulx/Qwopus3.5-4B-Coder-Fable5-v1-MLX | Apple Silicon full MLX inference. |
| MLX 4-bit | shuhulx/Qwopus3.5-4B-Coder-Fable5-v1-MLX-4bit | Apple Silicon low-memory inference. |
Credits
Built on:
- Jackrong/Qwopus3.5-4B-Coder by Jackrong
- Glint-Research/Fable-5-traces by Glint-Research
- Qwen / Qwen3.5 model family
- Unsloth
- Hugging Face
- llama.cpp
- mlx-lm