nohurry/Opus-4.6-Reasoning-3000x-filtered
How to use salbeal/Qwopus3.5-27B-v3-int4-AutoRound with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="salbeal/Qwopus3.5-27B-v3-int4-AutoRound")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM
processor = AutoProcessor.from_pretrained("salbeal/Qwopus3.5-27B-v3-int4-AutoRound")
model = AutoModelForMultimodalLM.from_pretrained("salbeal/Qwopus3.5-27B-v3-int4-AutoRound")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use salbeal/Qwopus3.5-27B-v3-int4-AutoRound with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "salbeal/Qwopus3.5-27B-v3-int4-AutoRound"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "salbeal/Qwopus3.5-27B-v3-int4-AutoRound",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in one sentence."
},
{
"type": "image_url",
"image_url": {
"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
}
}
]
}
]
}'
docker model run hf.co/salbeal/Qwopus3.5-27B-v3-int4-AutoRound
How to use salbeal/Qwopus3.5-27B-v3-int4-AutoRound with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "salbeal/Qwopus3.5-27B-v3-int4-AutoRound" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "salbeal/Qwopus3.5-27B-v3-int4-AutoRound",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in one sentence."
},
{
"type": "image_url",
"image_url": {
"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
}
}
]
}
]
}'
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "salbeal/Qwopus3.5-27B-v3-int4-AutoRound" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "salbeal/Qwopus3.5-27B-v3-int4-AutoRound",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in one sentence."
},
{
"type": "image_url",
"image_url": {
"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
}
}
]
}
]
}'
How to use salbeal/Qwopus3.5-27B-v3-int4-AutoRound with Docker Model Runner:
docker model run hf.co/salbeal/Qwopus3.5-27B-v3-int4-AutoRound
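The curl requests above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a server is already running on one of the ports shown above (8000 for vLLM, 30000 for SGLang) and that it returns the usual OpenAI-style response shape. The helper names `build_chat_payload` and `post_chat` are illustrative and not part of any library.

```python
import json
import urllib.request
from typing import Optional


def build_chat_payload(model: str, prompt: str, image_url: Optional[str] = None) -> dict:
    """Build an OpenAI-style chat completion body with optional image input."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def post_chat(base_url: str, payload: dict, timeout: float = 120.0) -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes the standard OpenAI-compatible response layout.
    return body["choices"][0]["message"]["content"]
```

For example, `post_chat("http://localhost:8000", build_chat_payload("salbeal/Qwopus3.5-27B-v3-int4-AutoRound", "Describe this image in one sentence.", image_url))` mirrors the vLLM curl call; swapping the base URL to `http://localhost:30000` targets the SGLang server instead.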
INT4 quantized version of:
Generated with Intel AutoRound using the distillation source datasets:
auto-round --model_name Jackrong/Qwopus3.5-27B-v3 \
    --bits 4 --iters 500 --nsamples 512 --enable_torch_compile \
    --output_dir Qwopus3.5-27B-v3-int4-AutoRound
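For a rough sense of what `--bits 4` means for weight storage, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate only: it assumes "27B" in the model name means about 27 billion weights, and it ignores quantization scales and zero points, any layers kept at higher precision, the KV cache, and activations, so real memory use will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate raw weight storage in GiB for a parameter count and bit width."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


# Assumption: "27B" in the model name is read as roughly 27 billion weights.
NUM_PARAMS = 27e9

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)
int4_gib = weight_memory_gib(NUM_PARAMS, 4)
print(f"bf16 weights: {bf16_gib:.1f} GiB, int4 weights: {int4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the bf16 footprint, which is the main reason to use the quantized checkpoint on a single GPU.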
vllm serve salbeal/Qwopus3.5-27B-v3-int4-AutoRound \
--trust-remote-code \
--dtype bfloat16
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "salbeal/Qwopus3.5-27B-v3-int4-AutoRound"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
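The loading snippet above stops before generation. A minimal text-only sketch of the next step follows, mirroring the chat-template flow used in the multimodal example earlier. It assumes the tokenizer ships a chat template that accepts plain string content; `build_messages` and `generate_reply` are hypothetical helper names, and `model` and `tokenizer` are the objects created above.

```python
def build_messages(prompt: str) -> list:
    """Wrap a single user prompt in the chat message format."""
    return [{"role": "user", "content": prompt}]


def generate_reply(model, tokenizer, prompt: str, max_new_tokens: int = 128) -> str:
    """Apply the chat template, generate, and decode only the new tokens."""
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Slice off the prompt so only the model's answer is decoded.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
```

With the model loaded, `print(generate_reply(model, tokenizer, "Explain INT4 quantization in one sentence."))` would produce a short reply; raise `max_new_tokens` if the output is cut off.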
Please follow the license terms of the original source model.
If you use this quantized model, please cite both the original distilled model by Jackrong and AutoRound:
@misc{jackrong_qwen35_27b_v3,
title = {Jackrong/Qwopus3.5-27B-v3},
author = {Jackrong},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Jackrong/Qwopus3.5-27B-v3}}
}
@article{cheng2023optimize,
title={Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs},
author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
journal={arXiv preprint arXiv:2309.05516},
year={2023}
}