Uploaded finetuned model
- Developed by: sophy
- License: apache-2.0
- Finetuned from model: unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit
This qwen3_vl model was trained 2x faster with Unsloth and Hugging Face's TRL library.
## How to use
This model is a vision-language model fine-tuned from Qwen3-VL using Unsloth to extract structured JSON from referral forms.
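### Install

```bash
pip install "unsloth>=2024.10.0" transformers accelerate bitsandbytes pillow huggingface_hub
```

Qwen3-VL support requires a recent Transformers release; upgrade `transformers` if `Qwen3VLForConditionalGeneration` fails to import.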
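After installing, it can help to confirm which of the packages from the command above are actually present in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any of these libraries. It reads package metadata without importing the packages, so it runs quickly and does not need a GPU.

```python
from importlib import metadata

# Distribution names taken from the pip command above.
REQUIRED = [
    "unsloth",
    "transformers",
    "accelerate",
    "bitsandbytes",
    "pillow",
    "huggingface_hub",
]


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing.

    Hypothetical helper for illustration only.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` should be installed before running the usage examples.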
### Usage (Unsloth – recommended)
The simplest way to use this model is via Unsloth’s FastVisionModel:
```python
from unsloth import FastVisionModel
from PIL import Image
import torch

MODEL_REPO = "sophy/finetuned-qwen-referrals"

# Load the model in 4-bit for lower memory use
model, tokenizer = FastVisionModel.from_pretrained(
    MODEL_REPO,
    load_in_4bit = True,
    trust_remote_code = True,
)

# Put the model in inference mode
FastVisionModel.for_inference(model)

image_path = "test.png"  # referral form image
image = Image.open(image_path).convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Extract all fields and return JSON."},
        ],
    }
]

# Build a chat-style prompt using the model's chat template
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    tokenize = False,
)

# Tokenise image + prompt together
inputs = tokenizer(
    image,
    prompt,
    return_tensors = "pt",
)

# Move every tensor to the model's device
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens = 2000,
        do_sample = True,  # temperature and top_p only apply when sampling
        temperature = 0.1,
        top_p = 0.9,
    )

# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[-1]
result = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(result)
```
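Because the model returns its JSON as plain text, the decoded string may still carry surrounding prose or markdown fences. The following is a minimal sketch of one way to pull the first JSON object out of such a string using the standard library. The helper name `extract_json` and the field names in the sample are hypothetical; the actual fields depend on the referral form and on the fine-tuning data.

```python
import json


def extract_json(text):
    """Return the first JSON object that parses in `text`, or None.

    Hypothetical helper: scans for each "{" and tries to decode a JSON
    value starting there, skipping positions that fail to parse.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None


# Illustrative model output; the field names are made up for this example.
sample = 'Here is the result: {"patient_name": "Jane Doe", "urgent": false} Done.'
print(extract_json(sample))  # {'patient_name': 'Jane Doe', 'urgent': False}
```

In practice, `extract_json(result)` could be called on the string printed above; a return value of `None` signals that the output contained no parseable JSON object and the generation may need to be retried.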
### Usage (pure 🤗 Transformers)
If you prefer to use standard Transformers without Unsloth, you can load the model as a Qwen3-VL vision–language model. A typical pattern is:
```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
from PIL import Image
import torch

MODEL_REPO = "sophy/finetuned-qwen-referrals"

processor = AutoProcessor.from_pretrained(
    MODEL_REPO,
    trust_remote_code = True,
)

model = Qwen3VLForConditionalGeneration.from_pretrained(
    MODEL_REPO,
    torch_dtype = torch.bfloat16,  # or "auto"
    device_map = "auto",
    trust_remote_code = True,
)

image = Image.open("test.png").convert("RGB")  # referral form image

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Extract all fields and return JSON."},
        ],
    }
]

# Render the chat template to a prompt string (image placeholder included)
prompt = processor.apply_chat_template(
    messages,
    add_generation_prompt = True,
    tokenize = False,
)

# Tokenise the prompt and preprocess the image together
inputs = processor(
    text = prompt,
    images = image,
    return_tensors = "pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens = 2000,
        do_sample = True,  # temperature and top_p only apply when sampling
        temperature = 0.1,
        top_p = 0.9,
    )

# Decode only the newly generated tokens, not the echoed prompt
prompt_length = inputs["input_ids"].shape[-1]
result = processor.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(result)
```