EVALUATION-ONLY ACCESS
This repository is publicly accessible, but you have to accept the conditions to access its files and content.
This is a private evaluation version of Qwen3.5-9B-GGUF (OraQuant).
By agreeing, you accept:
- Internal testing only; no production use
- No commercial use, redistribution, or reverse-engineering
- Deletion of all files after evaluation
- Full terms in LICENSE
Access is granted only to approved licensees.
Qwen3.5-9B-GGUF (OraQuant)
This repository contains GGUF builds of Qwen3.5-9B, quantized by Ora Computing with OraQuant (OQ) - Ora Computing's proprietary calibrated quantization.
These are llama.cpp-compatible quantizations of Qwen/Qwen3.5-9B; the underlying weights are the original Qwen3.5-9B weights, stored at reduced precision.
Text only.
Qwen/Qwen3.5-9B is a multimodal model; these GGUFs contain only the language model (text input -> text output). The vision/video input encoders are not included.
Model Overview
Model name: Qwen3.5-9B-GGUF (OraQuant)
Base model: Qwen/Qwen3.5-9B (Apache-2.0, Alibaba Cloud) - these are GGUF quantizations of it
Parameters: ~9 billion (unchanged from the base model)
Quantization: OraQuant (OQ) mixed-precision K-quant GGUFs produced by Ora Computing, provided in two footprints - OQ-Q4_K_M (higher quality) and OQ-Q3_K_M (smaller/faster).
Not fine-tuned, not parameter-reduced: the model architecture and parameter count are identical to the base model; only the weight precision is reduced.
Purpose: Evaluation/test-use only; optimized for local/offline inference and internal benchmarking.
License: See LICENSE (Custom Model License Agreement).
Files in this repo
| File | What it is | Size |
|---|---|---|
| Qwen3.5-9B-OQ-Q4_K_M.gguf | Language model, OraQuant Q4_K_M (higher quality) | ~5.7 GB |
| Qwen3.5-9B-OQ-Q3_K_M.gguf | Language model, OraQuant Q3_K_M (smaller/faster) | ~4.7 GB |
| LICENSE | Custom Model License Agreement | - |
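As a rough illustration of what these sizes imply, the sketch below converts each file size into an average number of bits per weight. It assumes 1 GB = 10^9 bytes, exactly 9 billion parameters, and that the whole file is weight data (metadata is ignored); none of these assumptions comes from the model card, so treat the result as an estimate only.

```python
# Rough bits-per-weight estimate from the file sizes in the table above.
# Assumptions (not from the model card): 1 GB = 10**9 bytes, exactly 9e9
# parameters, and the whole file counted as weights (metadata ignored).
PARAMS = 9e9


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Average number of bits stored per parameter for a file of size_gb."""
    return size_gb * 1e9 * 8 / params


for name, size_gb in [("OQ-Q4_K_M", 5.7), ("OQ-Q3_K_M", 4.7)]:
    print(f"{name}: ~{bits_per_weight(size_gb):.1f} bits per weight")
```

Under these assumptions the two files come out at roughly 5.1 and 4.2 bits per weight, which is consistent with mixed-precision files in which some tensors are kept above the nominal 4-bit or 3-bit width.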
Usage
These GGUFs load with stock upstream llama.cpp (no patch required); use a build with Qwen3.5 support.
export MODEL=/path/to/Qwen3.5-9B-OQ-Q4_K_M.gguf # or the Q3_K_M file
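Before launching llama.cpp, it can help to confirm that the downloaded file really is a GGUF file and not, for example, a saved HTML error page. The minimal sketch below reads the 4-byte `GGUF` magic and the little-endian 32-bit version that follows it; the function name `read_gguf_version` is hypothetical, and the demonstration uses a tiny synthetic header instead of a real model file.

```python
import struct
import tempfile
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of every GGUF file


def read_gguf_version(path):
    """Return the GGUF version if the file starts with the GGUF magic, else None."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])  # little-endian uint32
    return version


# Demonstration on a synthetic 8-byte header, not a real model file.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "fake.gguf"
    fake.write_bytes(GGUF_MAGIC + struct.pack("<I", 3))
    print(read_gguf_version(fake))
```

For a real check, pass the path stored in `$MODEL`; a return value of `None` suggests the download is incomplete or is not a GGUF file.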
Interactive chat:
./build/bin/llama-cli -m "$MODEL" -ngl 99
Single-shot completion (-st runs one turn then exits):
./build/bin/llama-cli -m "$MODEL" -ngl 99 -st -p "Explain the Chudnovsky algorithm in two sentences."
OpenAI-compatible server (Web UI at http://localhost:8080):
./build/bin/llama-server -m "$MODEL" -ngl 99 \
--served-model-name qwen3.5-9b --host 0.0.0.0 --port 8080
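Once the server is running, any OpenAI-compatible client can talk to it. The sketch below uses only the Python standard library and assumes the server is reachable at `http://localhost:8080` under the served model name `qwen3.5-9b`, as in the command above; the helper names `build_chat_request` and `ask` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # matches --port 8080 above
MODEL_NAME = "qwen3.5-9b"              # matches --served-model-name above


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_NAME):
    """Build a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("Explain the Chudnovsky algorithm in two sentences."))` should print a reply; without a running server, `ask` raises a connection error.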
Qwen3.5 is a reasoning model; the chat template and thinking behaviour are carried in the GGUF.
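When post-processing raw output, it may be useful to separate the reasoning from the final answer. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags, a convention used by earlier Qwen reasoning models that this model card does not confirm for these files; depending on the llama.cpp build and its settings, the server may instead return the reasoning in a separate response field. The helper name `split_reasoning` is hypothetical.

```python
import re

# Assumed tag convention; verify against actual model output before relying on it.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is '' when no think block is found."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Keep everything outside the first think block as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<think>2 + 2 is 4.</think>\nThe answer is 4.")
print(answer)
```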
Intended Use & Restrictions
Permitted use
- Internal testing, benchmarking, and evaluation of the model by the named Licensee.
- Exploration of model behaviours, prompt engineering, and non-production prototypes.
Prohibited use
- Deployment in a production or commercial service or public-facing API, or any resale or redistribution.
- Fine-tuning or creating derivative models for production use without a separate agreement.
- Reverse-engineering the quantization/calibration used to produce these files.
- Disclosure or sharing of the model (or its weights) to third parties beyond the named Licensee.
Out-of-scope use
- Use in regulated or safety-critical contexts (unless separately permitted).
- Any use that violates the Apache License, Version 2.0 under which the upstream model is distributed.
Quantization
- Method: OraQuant (OQ), Ora Computing's proprietary calibrated quantization. The released files are mixed-precision K-quant GGUFs.
- No fine-tuning: the weights are the original Qwen/Qwen3.5-9B weights; no additional training was performed.
- No parameter-count change: the architecture and ~9B parameter count are unchanged; only weight precision is reduced.
- Footprints: OQ-Q4_K_M for higher quality, OQ-Q3_K_M for a smaller/faster footprint.
Limitations & Risks
- Quantized models may not replicate the full behaviour of the base model under all prompt categories, particularly domain-specific or rare inputs.
- The model is provided as-is for testing only and is not certified for production use.
- Users should validate outputs carefully and monitor for bias or unintended behaviours.
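One simple way to start validating outputs is to run the same prompts through two builds (for example the Q4_K_M and Q3_K_M files, or a quantized file and a reference) and measure how often the answers agree. The sketch below is a deliberately crude exact-match comparison after whitespace and case normalization; the function name `agreement_rate` and the sample answers are hypothetical, and a real evaluation would need task-specific scoring.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def agreement_rate(answers_a, answers_b):
    """Fraction of prompt positions where both answer lists match after normalization."""
    if len(answers_a) != len(answers_b):
        raise ValueError("answer lists must have the same length")
    if not answers_a:
        return 0.0
    matches = sum(normalize(a) == normalize(b) for a, b in zip(answers_a, answers_b))
    return matches / len(answers_a)


# Hypothetical answers from two builds for the same three prompts.
build_a = ["Paris", "4", "The sky is blue."]
build_b = ["paris", "5", "The  sky is blue."]
print(agreement_rate(build_a, build_b))
```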
Upstream Attribution
This model is derived from the Qwen3.5-9B model released by Alibaba Cloud under the Apache License, Version 2.0.
"Copyright 2025 Alibaba Cloud. Licensed under the Apache License, Version 2.0."
For full terms, see: https://huggingface.co/Qwen/Qwen3.5-9B/blob/main/LICENSE
Apache License, Version 2.0: https://www.apache.org/licenses/LICENSE-2.0
Contact & Support
For licensing inquiries or to request extended evaluation rights, please contact: info@oracomputing.com
Repository and model access are regulated. Do not redistribute or share without explicit written permission from Ora Computing.