Chadrockv2 Qwen3.6 27B ROCmFP6 STRIX QUALITY
Chadrockv2 Qwen3.6 27B ROCmFP6 STRIX QUALITY is an AMD-tuned GGUF release of the Unsloth Qwen3.6 27B MTP line. It uses a new ROCmFP6 Strix Quality recipe designed to recover Q6-class agent behavior while keeping the ROCmFPX served-speed advantages on AMD Ryzen AI Max+ 395 / Strix Halo systems.
This is a model/runtime pairing, not a generic GGUF quant. The file uses custom ROCmFPX tensor types and will not run correctly with stock upstream llama.cpp. Use the ROCmFPX branch and launch profile documented below.
Full research report:
https://llm.ciru.ai/reports/rocmfp6-quality-research-report-20260624/
Why This Build Exists
The earlier Strix speed ROCmFP6 recipe was too small for agent quality. It measured about 4.82 BPW and scored clearly below the downloaded Unsloth Q6 baseline on HermesAgent-20. This STRIX QUALITY recipe moves closer to a real Q6-class file by keeping the bulk of tensors in Q6_0_ROCMFPX and promoting high-impact tensors to Q8_0_ROCMFPX.
The result is larger than the old speed recipe but materially better on agent quality:
| Model | HermesAgent-20 score | Base pass | Plus pass | HumanEval+ | PPL |
|---|---|---|---|---|---|
| Chadrockv2 ROCmFP6 STRIX QUALITY | 0.78 | 14/20 | 11/20 | 155/164 = 94.51% | 6.5543 +/- 0.0941 |
| Unsloth Q6 baseline | 0.76 | 13/20 | 11/20 | 153/164 = 93.29% | 6.5296 +/- 0.0934 |
| Old ROCmFP6 Strix Speed | 0.60 | 10/20 | 9/20 | 152/164 = 92.68% | 6.4077 +/- 0.0902 |
The important lesson from the tuning run is that perplexity alone was not enough. The old small FP6 recipe looked acceptable by PPL (6.4077, the lowest of the three rows above), but it failed agent scenarios. HermesAgent-20 and EvalPlus showed that the quality recipe recovered the behavior we needed.
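The mismatch between perplexity and agent quality can be made concrete with a small sketch. It only re-sorts the figures from the comparison table; the dictionary layout and variable names are illustrative and not part of the release tooling.

```python
# Figures copied from the comparison table; the structure is illustrative.
results = {
    "Chadrockv2 ROCmFP6 STRIX QUALITY": {"agent": 0.78, "ppl": 6.5543},
    "Unsloth Q6 baseline": {"agent": 0.76, "ppl": 6.5296},
    "Old ROCmFP6 Strix Speed": {"agent": 0.60, "ppl": 6.4077},
}

# Lower perplexity is better; higher HermesAgent-20 score is better.
by_ppl = sorted(results, key=lambda name: results[name]["ppl"])
by_agent = sorted(results, key=lambda name: results[name]["agent"], reverse=True)

print("ranked by PPL:        ", by_ppl)
print("ranked by agent score:", by_agent)
```

On these three rows the two rankings come out in opposite order. The reported PPL intervals also overlap, so the PPL ordering on its own is weak evidence either way.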
Lineage
Qwen/Qwen3.6-27B
-> unsloth/Qwen3.6-27B
-> unsloth/Qwen3.6-27B-MTP-GGUF
-> Chadrockv2 Qwen3.6 27B ROCmFP6 STRIX QUALITY
The public release name and artifact names are Chadrock names. The source lineage remains explicit in metadata, benchmark notes, and credits.
Files
| File | Size | SHA256 |
|---|---|---|
| `Chadrockv2-Qwen3.6-27B-ROCmFP6-STRIX-QUALITY.gguf` | 25,196,024,736 bytes | 144062b43fade17c15217acf0b4974041f6135d73945bc13e7c13b1d18946b84 |
| `Chadrockv2-Qwen3.6-27B-ROCmFP6-STRIX-QUALITY.gguf.sha256` | checksum | same hash as above |
| `profiles/unsloth-qwen36-27b-mtp-rocmfp6-strix-quality-cap6-q8kv-rocm-hermes64k.env` | launch profile | AMD Strix Halo ROCm profile |
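A download of this size is worth verifying before use. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify` are illustrative, and the file path in the usage comment is a placeholder.

```python
import hashlib

# SHA256 published in the Files table for the .gguf artifact.
EXPECTED = "144062b43fade17c15217acf0b4974041f6135d73945bc13e7c13b1d18946b84"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a ~25 GB GGUF never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED):
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Usage (placeholder path):
# print(verify("/path/to/Chadrockv2-Qwen3.6-27B-ROCmFP6-STRIX-QUALITY.gguf"))
```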
Recipe
| Recipe | Estimated size | BPW | Tensor mix |
|---|---|---|---|
| STRIX QUALITY | 24018.32 MiB | 7.37 | 312 Q6 tensors, 194 Q8 tensors |
| Straight Q6 ROCmFPX | local dry-run | 6.59 | 486 Q6 tensors, 20 Q8 tensors |
| Old Strix Speed | local dry-run | 4.82 | 388 FP4-fast tensors, 118 Q6 tensors |
| Q6 ROCmFPX Agent | local dry-run | 7.40 | 340 Q6 tensors, 166 Q8 tensors |
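As a sanity check on the recipe table, the sketch below confirms that every recipe accounts for the same number of quantized tensors and derives the weight count implied by the STRIX QUALITY size and BPW. It assumes BPW means total file bits divided by weight count, which the table does not state explicitly.

```python
MIB = 1024 * 1024

# (lower-precision tensors, higher-precision tensors) per recipe, from the table.
recipes = {
    "STRIX QUALITY": (312, 194),
    "Straight Q6 ROCmFPX": (486, 20),
    "Old Strix Speed": (388, 118),
    "Q6 ROCmFPX Agent": (340, 166),
}
tensor_totals = {name: low + high for name, (low, high) in recipes.items()}

# Assumption: BPW = (size in bits) / (number of weights).
size_bits = 24018.32 * MIB * 8
implied_weights = size_bits / 7.37

print(tensor_totals)
print(f"implied weights: {implied_weights / 1e9:.2f} billion")
```

All four recipes sum to the same tensor count, and the implied weight count lands at roughly 27 billion, in line with the 27B model name.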
STRIX QUALITY keeps the default tensor type at Q6_0_ROCMFPX, then promotes:
- token embedding and output tensors
- attention Q, K, V, O, and fused QKV tensors
- selected FFN down/gate tensor bands
- llama.cpp tensors marked by the `use_more_bits` heuristic
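The promotion rules above can be sketched as a tensor-name classifier. This is an illustration only, not the ROCmFPX implementation: it assumes standard GGUF tensor names, reproduces the `use_more_bits` layer rule as it appears in upstream llama.cpp to the best of my knowledge, applies that rule to FFN down/gate tensors as a guess, and takes the FFN bands as a caller-supplied, hypothetical `ffn_band` set because the actual bands are not listed here.

```python
import re

Q6 = "Q6_0_ROCMFPX"
Q8 = "Q8_0_ROCMFPX"

# Attention projections promoted by the recipe (standard GGUF suffixes, assumed).
ATTN_KINDS = {"attn_q", "attn_k", "attn_v", "attn_output", "attn_qkv"}


def use_more_bits(i_layer, n_layers):
    """Layer rule as found in upstream llama.cpp (first/last eighth, then every third)."""
    return (
        i_layer < n_layers // 8
        or i_layer >= 7 * n_layers // 8
        or (i_layer - n_layers // 8) % 3 == 2
    )


def pick_type(name, n_layers, ffn_band=frozenset()):
    """Return the hypothetical target type for one weight tensor name."""
    if name.startswith(("token_embd.", "output.")):
        return Q8
    match = re.match(r"blk\.(\d+)\.([a-z_]+)\.weight$", name)
    if not match:
        return Q6  # the sketch ignores norms and other tensors kept unquantized
    layer, kind = int(match.group(1)), match.group(2)
    if kind in ATTN_KINDS:
        return Q8
    if kind in ("ffn_down", "ffn_gate"):
        # ffn_band is a placeholder for the recipe's unspecified "selected bands".
        if layer in ffn_band or use_more_bits(layer, n_layers):
            return Q8
    return Q6
```

For example, `pick_type("blk.10.attn_q.weight", 64)` returns the Q8 type, while `pick_type("blk.10.ffn_up.weight", 64)` stays at the Q6 default. The layer count of 64 is an arbitrary value for illustration.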
The recipe is implemented as:
LLAMA_FTYPE_MOSTLY_Q6_0_ROCMFPX_STRIX_QUALITY = 118
scripts/quantize-rocmfpx-agent.sh --profile strix-quality
Quality Results
HermesAgent-20 is the deciding quality test for this release because it exposes scenario-level failures that aggregate PPL missed.
| Model | Score | Base pass | Plus pass | Generation time |
|---|---|---|---|---|
| Chadrockv2 ROCmFP6 STRIX QUALITY | 0.78 | 14/20 | 11/20 | 1541.503 s |
| Unsloth Q6 baseline | 0.76 | 13/20 | 11/20 | 1037.491 s |
| Old ROCmFP6 Strix Speed | 0.60 | 10/20 | 9/20 | 791.457 s |
EvalPlus confirms that the quality recipe did not trade away coding correctness:
| Model | HumanEval base | HumanEval+ |
|---|---|---|
| Chadrockv2 ROCmFP6 STRIX QUALITY | 161/164 | 155/164 = 94.51% |
| Unsloth Q6 baseline | 160/164 | 153/164 = 93.29% |
| Old ROCmFP6 Strix Speed | 159/164 | 152/164 = 92.68% |
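The HumanEval+ percentages follow directly from the pass counts, and the same arithmetic gives base-set rates that the table leaves as raw fractions. A short sketch, with `pass_rate` as an illustrative helper name:

```python
def pass_rate(passed, total=164):
    """Percentage of problems passed, rounded to two decimals."""
    return round(100 * passed / total, 2)


# (HumanEval base passes, HumanEval+ passes) from the EvalPlus table.
evalplus = {
    "Chadrockv2 ROCmFP6 STRIX QUALITY": (161, 155),
    "Unsloth Q6 baseline": (160, 153),
    "Old ROCmFP6 Strix Speed": (159, 152),
}

for model, (base, plus) in evalplus.items():
    print(f"{model}: base {pass_rate(base)}%, plus {pass_rate(plus)}%")
```

The gaps between models are one to three problems out of 164, so these results are best read as "no regression" rather than as a ranking.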
Speed Results
All rows were measured locally on AMD Ryzen AI Max+ 395 / Strix Halo with one-slot served MTP, q8_0 target KV, f16 draft KV, batch 2048 / micro-batch 512 (b2048/u512), temperature=0, 512 generated tokens, and no prompt cache reuse.
ROCmFP6 STRIX QUALITY vs Unsloth Q6 Baseline
| Prompt tokens | FP6 ROCm PP tok/s | FP6 ROCm TG tok/s | FP6 total | Q6 ROCm PP tok/s | Q6 ROCm TG tok/s | Q6 total |
|---|---|---|---|---|---|---|
| 512 | 177.98 | 29.52 | 20.1 s | 200.84 | 22.10 | 25.6 s |
| 2048 | 188.44 | 20.64 | 34.7 s | 208.53 | 17.38 | 38.4 s |
| 4096 | 213.53 | 30.73 | 33.5 s | 227.13 | 27.75 | 34.3 s |
| 16384 | 223.76 | 30.03 | 85.9 s | 218.75 | 25.76 | 90.3 s |
| 65536 | 171.08 | 15.72 | 388.4 s | 166.15 | 10.81 | 413.7 s |
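To read the table as relative gains, the sketch below computes the token-generation speedup and the wall-clock time saved per prompt length. It uses only the TG and total columns; prompt-processing throughput is left out, and there the Q6 baseline is actually faster at the three shorter prompt lengths.

```python
# (prompt tokens, FP6 TG tok/s, Q6 TG tok/s, FP6 total s, Q6 total s) from the table.
rows = [
    (512, 29.52, 22.10, 20.1, 25.6),
    (2048, 20.64, 17.38, 34.7, 38.4),
    (4096, 30.73, 27.75, 33.5, 34.3),
    (16384, 30.03, 25.76, 85.9, 90.3),
    (65536, 15.72, 10.81, 388.4, 413.7),
]


def compare(rows):
    """Return (prompt tokens, TG speedup factor, seconds saved) per row."""
    return [
        (prompt, fp6_tg / q6_tg, q6_total - fp6_total)
        for prompt, fp6_tg, q6_tg, fp6_total, q6_total in rows
    ]


for prompt, speedup, saved in compare(rows):
    print(f"{prompt:>6} prompt tokens: TG x{speedup:.2f}, {saved:.1f} s saved")
```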
ROCm vs Vulkan for This FP6 File
| Prompt tokens | ROCm TG tok/s | ROCm total | Vulkan TG tok/s | Vulkan total |
|---|---|---|---|---|
| 512 | 29.52 | 20.1 s | 19.58 | 28.9 s |
| 2048 | 20.64 | 34.7 s | 19.45 | 36.3 s |
| 4096 | 30.73 | 33.5 s | 13.10 | 57.6 s |
| 16384 | 30.03 | 85.9 s | 13.41 | 120.6 s |
| 65536 | 15.72 | 388.4 s | 9.19 | 471.6 s |
ROCm0 is the recommended backend for this release. Vulkan remains useful as a portability path, but it was slower at every prompt length in this Strix Quality speed matrix.
Run With ROCmFPX
Build the ROCmFPX runner branch containing this ftype and recipe:
git clone https://github.com/ciru-ai/ROCmFPX.git
cd ROCmFPX
git checkout rocmfp6-strix-quality
cmake -S . -B build-strix-rocmfp6-quality-hip \
-DGGML_HIP=ON \
-DGGML_VULKAN=ON \
-DCMAKE_BUILD_TYPE=Release
cmake --build build-strix-rocmfp6-quality-hip -j
Launch the validated AMD Strix Halo profile:
HSA_OVERRIDE_GFX_VERSION=11.5.1 \
GGML_HIP_ENABLE_UNIFIED_MEMORY=1 \
./build-strix-rocmfp6-quality-hip/bin/llama-server \
-m /path/to/Chadrockv2-Qwen3.6-27B-ROCmFP6-STRIX-QUALITY.gguf \
--alias chadrockv2-qwen36-27b-rocmfp6-strix-quality \
--host 127.0.0.1 \
--port 8080 \
--jinja \
-c 65536 \
-ngl 999 \
-fa on \
-dev ROCm0 \
-sm none \
-b 2048 \
-ub 512 \
-t 16 \
-tb 32 \
-ctk q8_0 \
-ctv q8_0 \
--ctx-checkpoints 0 \
--checkpoint-every-n-tokens -1 \
--spec-type draft-mtp \
--spec-draft-device ROCm0 \
--spec-draft-ngl all \
--spec-draft-type-k f16 \
--spec-draft-type-v f16 \
--spec-draft-n-max 6 \
--spec-draft-n-min 0 \
--spec-draft-p-min 0.0 \
--spec-draft-p-split 0.20 \
--parallel 1 \
--metrics \
--no-mmproj \
--no-context-shift \
--reasoning off \
--reasoning-format none \
--reasoning-budget 0 \
--temp 0 \
--top-p 0.95 \
--top-k 20 \
--repeat-penalty 1.0 \
--seed 123
The profile in this repository is the exact env profile used for the HermesAgent-20 lane:
profiles/unsloth-qwen36-27b-mtp-rocmfp6-strix-quality-cap6-q8kv-rocm-hermes64k.env
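Once the server is running, it can be queried through llama-server's OpenAI-compatible chat endpoint. The following standard-library sketch assumes the host, port, and `--alias` value from the launch command above; the helper names `build_request` and `chat` are illustrative.

```python
import json
import urllib.request

# Values taken from the launch command above (--host, --port, --alias).
BASE_URL = "http://127.0.0.1:8080/v1"
MODEL = "chadrockv2-qwen36-27b-rocmfp6-strix-quality"


def build_request(prompt, max_tokens=256):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=600):
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage, with the server running:
# print(chat("What is the capital of France?"))
```

The generous timeout is a deliberate choice: at a 64k context the measured end-to-end times above run to several minutes.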
Provenance
| Item | Value |
|---|---|
| quant format | Q6_0_ROCMFPX_STRIX_QUALITY |
| ROCmFPX branch | rocmfp6-strix-quality |
| ROCmFPX commit | 7026d4ea51acb6e314526506eccdccdc31987855 |
| public report | https://llm.ciru.ai/reports/rocmfp6-quality-research-report-20260624/ |
| local source filename | Qwen3.6-27B-MTP-BF16-to-Q6_0_ROCMFPX_STRIX_QUALITY.gguf |
| public filename | Chadrockv2-Qwen3.6-27B-ROCmFP6-STRIX-QUALITY.gguf |
The local source filename is intentionally not used as the public artifact name. The uploaded GGUF uses the clean Chadrockv2 release filename shown above.
Limitations
- This release is tuned specifically for AMD hardware, with Strix Halo as the only measured target.
- The GGUF requires a ROCmFPX-aware llama.cpp runner.
- The recipe prioritizes agent quality and served decode speed, not smallest file size.
- Benchmark numbers are local Strix Halo measurements and depend on driver version, clocks, prompt shape, KV cache settings, and draft-token acceptance.
Credits
- Qwen: Qwen3.6 27B base model family.
- Unsloth: Qwen3.6 27B MTP GGUF source lineage and Q6 baseline used for same-source comparison.
- Charlie / ROCmFPX: ROCmFPX tensor formats and llama.cpp runtime work.
- Ciru Inference Lab: AMD Strix Halo recipe tuning, quality evaluation, speed testing, and report publishing.
