Qwable 27B Chadrock-ROCmFPX-ULTRAQUALITY-7.61BPW
Qwable 27B Chadrock-ROCmFPX-ULTRAQUALITY-7.61BPW is the new quality-first ROCmFPX GGUF for the Unsloth Qwen3.6 27B MTP line. It replaces the older STRIX QUALITY naming and recipe as the default quality build.
The headline is simple: this is the high-quality Strix Halo ROCmFPX build that keeps the speed path alive without accepting the quality drift seen in earlier small mixed-precision experiments. In this card refresh it scored 82 on HermesAgent-20 and 154/164 on HumanEval+, decoded at 25.92 tok/s under served MTP on a 20KB prompt, and showed a mean KLD of only 0.002420 against the BF16 reference.
This is a model/runtime pairing, not a stock upstream GGUF. The files use ROCmFPX tensor types and should be run with a ROCmFPX-aware llama.cpp runner.
File
| File | Role | BPW | Size | Quality position |
|---|---|---|---|---|
| Qwable-27B-Chadrock-ROCmFPX-ULTRAQUALITY-7.61BPW.gguf | default | 7.6146 | 26,004,616,416 bytes | best current ROCmFPX quality recipe |
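As a rough sanity check on the size and BPW figures, the sketch below divides the file's bit count by the BPW value to estimate how many weights the file holds. It assumes BPW means whole-file bits per weight; the card does not state how BPW was computed, and a GGUF also carries metadata and the MTP tensors, so the result is only approximate.

```python
# Back-of-the-envelope check: bits in the file divided by bits per weight.
# Assumption: BPW is whole-file bits / weight count (not stated on the card).
FILE_SIZE_BYTES = 26_004_616_416  # from the File table
BPW = 7.6146                      # from the File table


def implied_weight_count(size_bytes: int, bpw: float) -> float:
    """Approximate number of weights implied by a file size and a BPW figure."""
    return size_bytes * 8 / bpw


weights = implied_weight_count(FILE_SIZE_BYTES, BPW)
print(f"implied weights: {weights / 1e9:.2f} B")
print(f"file size:       {FILE_SIZE_BYTES / 2**30:.2f} GiB")
```

The estimate should land close to the 27B in the model name if the assumption holds.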
Fresh Comparison
Refresh date: 2026-06-29. Hardware: AMD Ryzen AI Max+ 395 / Strix Halo. Served rows used ROCm, one MTP slot, q8_0/q8_0 target KV, f16/f16 draft KV, draft cap 6, batch 2048 / ubatch 512 (-b 2048 -ub 512), temperature=0, 512 generated tokens, and a deterministic 20KB prompt measuring 3,946 prompt tokens.
Served MTP Speed
| Model | BPW | Prompt tok/s | Decode tok/s | Total time | Draft accepted | Note |
|---|---|---|---|---|---|---|
| UltraQuality 7.61 BPW | 7.6146 | 209.84 | 25.92 | 38.56 s | 437/439 = 99.5% | new default quality build |
| Superseded STRIX QUALITY | 7.37 | 177.02 | 8.37 | 83.49 s | 217/1762 = 12.3% | historical row, not recommended |
UltraQuality delivers about 3.1x the served decode speed of the superseded STRIX QUALITY row in this refresh (25.92 vs 8.37 tok/s), while also improving the distribution-quality metrics below.
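The speed rows can be cross-checked with simple arithmetic. The sketch below recomputes the decode speedup and the draft acceptance rate, and estimates total time as prompt time plus decode time from the stated 3,946 prompt tokens and 512 generated tokens. Two readings here are assumptions rather than statements from the card: that "Draft accepted" means accepted over drafted tokens, and that total time is roughly the sum of the two phases.

```python
PROMPT_TOKENS = 3946      # deterministic 20KB prompt
GENERATED_TOKENS = 512    # tokens generated per run

# Values copied from the Served MTP Speed table.
rows = {
    "UltraQuality 7.61 BPW": dict(
        prompt_tps=209.84, decode_tps=25.92, total_s=38.56,
        accepted=437, drafted=439),
    "Superseded STRIX QUALITY": dict(
        prompt_tps=177.02, decode_tps=8.37, total_s=83.49,
        accepted=217, drafted=1762),
}


def estimated_total_seconds(prompt_tps: float, decode_tps: float) -> float:
    """Prompt-processing time plus decode time, ignoring any other overhead."""
    return PROMPT_TOKENS / prompt_tps + GENERATED_TOKENS / decode_tps


for name, r in rows.items():
    est = estimated_total_seconds(r["prompt_tps"], r["decode_tps"])
    acceptance = r["accepted"] / r["drafted"]
    print(f"{name}: estimated {est:.2f} s vs reported {r['total_s']:.2f} s, "
          f"draft acceptance {acceptance:.1%}")

speedup = (rows["UltraQuality 7.61 BPW"]["decode_tps"]
           / rows["Superseded STRIX QUALITY"]["decode_tps"])
print(f"decode speedup: {speedup:.2f}x")
```

If the estimates agree with the reported totals to within a fraction of a second, the table's throughput and time columns are internally consistent.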
File Quality
PPL was measured with llama-perplexity, WikiText raw, n_ctx=2048, 32 chunks. KLD was measured with llama-perplexity --kl-divergence, BF16 reference, n_ctx=512, 16 chunks.
| Model | PPL | Mean KLD | KLD p99 | KLD p99.9 | Same-top |
|---|---|---|---|---|---|
| UltraQuality 7.61 BPW | 6.5212 +/- 0.09323 | 0.002420 +/- 0.000481 | 0.019161 | 0.150872 | 97.843% +/- 0.227 |
| Superseded STRIX QUALITY | 6.5097 +/- 0.09282 | 0.007113 +/- 0.001182 | 0.057581 | 0.308613 | 96.495% +/- 0.288 |
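PPL is intentionally not the final quality judge here: the two PPL values differ by far less than their error bars. The KLD and same-top columns are what separate the recipes, and UltraQuality is the one that preserves the BF16 distribution closely enough to be the quality default.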
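For readers unfamiliar with the KLD and same-top columns, the sketch below computes both on toy logits using the textbook definitions: KL(reference || quantized) averaged over token positions, and the fraction of positions where both models pick the same top token. This illustrates the metrics only; the exact procedure inside llama-perplexity --kl-divergence was not checked for this sketch, and the logits are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats, where p is the reference distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def compare(reference_logits, quantized_logits):
    """Return (mean KLD, same-top fraction) over all token positions."""
    klds, same_top = [], 0
    for ref, quant in zip(reference_logits, quantized_logits):
        klds.append(kl_divergence(softmax(ref), softmax(quant)))
        same_top += ref.index(max(ref)) == quant.index(max(quant))
    n = len(klds)
    return sum(klds) / n, same_top / n


# Toy data: three token positions over a three-token vocabulary (hypothetical).
ref = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3], [1.0, 1.1, 3.0]]
quant = [[1.9, 1.1, 0.1], [0.6, 2.4, 0.3], [1.0, 3.1, 3.0]]

mean_kld, same = compare(ref, quant)
print(f"mean KLD: {mean_kld:.6f}, same-top: {same:.1%}")
```

Unlike PPL, which only scores the probability given to the correct token, KLD penalizes any shift in the full distribution relative to BF16. The p99 and p99.9 columns report the tail of the same per-token values.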
Agent And Coding Validation
HermesAgent-20 and EvalPlus are the behavioral checks that catch failures PPL can miss. UltraQuality was rerun for this card refresh. Historical comparison rows are retained only to show what the new default replaces.
| Model | HermesAgent-20 | HumanEval base | HumanEval+ | Harness failures |
|---|---|---|---|---|
| UltraQuality 7.61 BPW | 82 | 160/164 = 97.56% | 154/164 = 93.90% | 0/164 |
| Superseded STRIX QUALITY | 78 | 161/164 = 98.17% | 155/164 = 94.51% | 0/164 |
| Unsloth Q6 comparison | not rerun in refresh | 160/164 = 97.56% | 153/164 = 93.29% | 0/164 |
The important result is the combined shape: UltraQuality keeps Q6-class coding behavior, beats the old STRIX QUALITY row on HermesAgent-20, and reduces KLD drift versus the historical quality recipe.
Recipe Notes
UltraQuality is the user-facing name for the ranked leave-32 ROCmFPX recipe from the current tuning pass. The local build artifact was the attention-rank-leave32/Q6K-splice candidate, promoted here under the clean public name:
Qwable 27B Chadrock-ROCmFPX-ULTRAQUALITY-7.61BPW
The older STRIX QUALITY recipe used broad Q6/Q8 promotion and was good enough to show the quality direction, but it had bad served-MTP draft behavior in the refresh. UltraQuality is more surgical about which tensors it protects, which is why its KLD tail and MTP acceptance recovered at the same time.
Run With ROCmFPX
Build or use a ROCmFPX-aware llama.cpp runner, then launch the default UltraQuality file with the served MTP profile below.
HSA_OVERRIDE_GFX_VERSION=11.5.1 \
GGML_HIP_ENABLE_UNIFIED_MEMORY=1 \
./llama-server \
-m /models/Qwable-27B-Chadrock-ROCmFPX-ULTRAQUALITY-7.61BPW.gguf \
--alias qwable-27b-chadrock-rocmfpx-ultraquality-7p61bpw \
--host 127.0.0.1 \
--port 8080 \
--jinja \
-c 65536 \
-ngl 999 \
-fa on \
-dev ROCm0 \
-sm none \
-b 2048 \
-ub 512 \
-t 16 \
-tb 32 \
-ctk q8_0 \
-ctv q8_0 \
--ctx-checkpoints 0 \
--checkpoint-every-n-tokens -1 \
--spec-type draft-mtp \
--spec-draft-device ROCm0 \
--spec-draft-ngl all \
--spec-draft-type-k f16 \
--spec-draft-type-v f16 \
--spec-draft-n-max 6 \
--spec-draft-n-min 0 \
--spec-draft-p-min 0.0 \
--spec-draft-p-split 0.20 \
--parallel 1 \
--metrics \
--no-mmproj \
--no-context-shift \
--reasoning off \
--reasoning-format none \
--reasoning-budget 0 \
--temp 0 \
--top-p 0.95 \
--top-k 20 \
--repeat-penalty 1.0 \
--seed 123
A matching profile is included at:
profiles/qwable-27b-chadrock-rocmfpx-ultraquality-7p61bpw-rocm-mtp.env
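Once the server is up, it can be queried over llama-server's OpenAI-compatible chat endpoint. The sketch below is a minimal standard-library client that takes the host, port, and alias from the launch command above. The prompt text, the `max_tokens` value, and the timeout are illustrative choices, not settings from the card.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # --host / --port from the launch command
MODEL_ALIAS = "qwable-27b-chadrock-rocmfpx-ultraquality-7p61bpw"  # --alias


def build_chat_request(prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ALIAS,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # illustrative value
        "temperature": 0,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=600) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# With the server running:
# print(chat("Write a Python function that reverses a string."))
```

The request is built separately from the network call so the payload can be inspected or logged without a running server.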
Checksums
| File | SHA256 |
|---|---|
| Qwable-27B-Chadrock-ROCmFPX-ULTRAQUALITY-7.61BPW.gguf | 14cb3fb0670163a1b0f73c5df521ce0513cfddd7609d75d0640d00a07537073e |
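A downloaded file can be checked against the published digest with a few lines of Python. The sketch below streams the file in chunks, so the roughly 26 GB GGUF never has to fit in memory. The `/models/...` path in the usage comment is taken from the launch command and is an assumption about where the file was saved.

```python
import hashlib

# Digest published in the Checksums table.
EXPECTED_SHA256 = "14cb3fb0670163a1b0f73c5df521ce0513cfddd7609d75d0640d00a07537073e"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True when the file's digest matches the expected hex string."""
    return sha256_of_file(path) == expected.lower()


# Usage (path is an assumption; adjust to where the file was saved):
# print(verify("/models/Qwable-27B-Chadrock-ROCmFPX-ULTRAQUALITY-7.61BPW.gguf"))
```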
Evidence
Local refresh artifacts used for this card:
- speed: card-refresh-20260629 served MTP refresh
- quality: WikiText PPL/KLD file refresh, HermesAgent-20, EvalPlus HumanEval+
The public names intentionally hide the internal recipe filenames. The internal UltraQuality source artifact was the ranked leave-32/Q6K-splice GGUF from the ROCmFPX tuning run.
Limitations
- This is specifically tuned and measured for AMD Strix Halo / Ryzen AI Max+ 395 with ROCm.
- Stock upstream llama.cpp is not enough; use a ROCmFPX-aware runner.
- The headline speed row is a 20KB served-MTP prompt refresh, not a full long-context sweep.
- UMA memory reporting on this platform does not map cleanly to a simple discrete-GPU VRAM number, so this card uses file size and BPW as the public size metrics.
Credits
- Qwen: Qwen3.6 base model family.
- Unsloth: Qwen3.6 27B MTP GGUF source lineage.
- Charlie / ROCmFPX: ROCmFPX tensor formats and llama.cpp runtime work.
- Ciru Inference Lab: ROCmFPX recipe tuning, Strix Halo benchmarking, and model-card validation.
