Qwen3-VL-2B-Sono

This is the Hugging Face repository for the paper "A Multimodal Instruction Dataset and Benchmark for Ultrasound Understanding".

🚀 Quick Start

import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model_path = "Ssdaizi/Qwen3-VL-2B-Sono"

# Load the model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_path,
    dtype=torch.bfloat16,
    device_map="auto",
)

processor = AutoProcessor.from_pretrained(model_path)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "test1.png",  # path to the input ultrasound image
            },
            {
                "type": "text",
                "text": "Is this a benign lesion or a malignant lesion?",
            }
        ],
    }
]

# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

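# Generate response
# do_sample=True is set explicitly so that top_p, top_k, and temperature take effect
generated_ids = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.8,
    top_k=20,
    temperature=0.7,
    repetition_penalty=1.0,
)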

# Remove input tokens from generated output
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)

print(output_text[0])
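
The message layout and the prompt-trimming step used above can be factored into small helpers when running several image/question pairs. The following is a minimal sketch, not part of this repository or of `transformers`: the names `build_messages` and `trim_prompt` are hypothetical, and toy token IDs stand in for real model output so the snippet runs without loading the model.

```python
from typing import Any


def build_messages(image_path: str, question: str) -> list[dict[str, Any]]:
    """Build a single-turn chat message pairing one image with one question.

    Hypothetical helper: it mirrors the `messages` structure used in the
    Quick Start script above.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]


def trim_prompt(
    input_ids: list[list[int]], generated_ids: list[list[int]]
) -> list[list[int]]:
    """Drop the echoed prompt tokens from the front of each generated sequence.

    Hypothetical helper: same slicing as `generated_ids_trimmed` above,
    written for plain Python lists.
    """
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


messages = build_messages(
    "test1.png", "Is this a benign lesion or a malignant lesion?"
)

# Toy token IDs (illustrative values only): the prompt is [1, 2, 3] and the
# generated sequence repeats it before the newly generated tokens.
new_tokens = trim_prompt([[1, 2, 3]], [[1, 2, 3, 40, 41]])
print(new_tokens)  # [[40, 41]]
```

In the real script, the result of `build_messages` would be passed to `processor.apply_chat_template`, and the trimming would be applied to the tensors returned by `model.generate`, which support the same slicing.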

Model size: 2B params (Safetensors)
Tensor type: BF16
