Instructions for using OpenLLM-Ro/RoQwen3-VL-2B-Instruct with libraries and local apps.

Libraries

How to use OpenLLM-Ro/RoQwen3-VL-2B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="OpenLLM-Ro/RoQwen3-VL-2B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("OpenLLM-Ro/RoQwen3-VL-2B-Instruct")
model = AutoModelForImageTextToText.from_pretrained("OpenLLM-Ro/RoQwen3-VL-2B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the new tokens (drop the prompt and special tokens such as <|im_end|>).
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Local Apps

vLLM

How to use OpenLLM-Ro/RoQwen3-VL-2B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "OpenLLM-Ro/RoQwen3-VL-2B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenLLM-Ro/RoQwen3-VL-2B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
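The same request can be sent from Python. Below is a minimal, standard-library-only sketch, assuming the vLLM server above is listening on its default port 8000; `build_chat_payload` and `ask` are hypothetical helper names, and the `max_tokens` cap is an illustrative choice. The body mirrors the curl example (one text part plus one `image_url` part), and since the API is OpenAI-compatible, pointing `base_url` at port 30000 should also work for the SGLang server described below.

```python
import json
import urllib.request

MODEL_ID = "OpenLLM-Ro/RoQwen3-VL-2B-Instruct"


def build_chat_payload(question: str, image_url: str, max_tokens: int = 128) -> dict:
    """Hypothetical helper: the same OpenAI-style chat body as the curl example."""
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,  # illustrative cap on the reply length
    }


def ask(payload: dict, base_url: str = "http://localhost:8000/v1", timeout: float = 120.0) -> str:
    """Hypothetical helper: POST to {base_url}/chat/completions and return the reply text.

    Assumes an OpenAI-compatible server (vLLM default port 8000; the SGLang examples use 30000).
    """
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# payload = build_chat_payload(
#     "Descrie imaginea într-o singură propoziție.",  # "Describe the image in one sentence."
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# print(ask(payload))
```

The prompt in the usage comment is in Romanian, matching the language the model was adapted for.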

Use Docker

docker model run hf.co/OpenLLM-Ro/RoQwen3-VL-2B-Instruct

SGLang

How to use OpenLLM-Ro/RoQwen3-VL-2B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "OpenLLM-Ro/RoQwen3-VL-2B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenLLM-Ro/RoQwen3-VL-2B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
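Both curl examples point at a public image URL. To send a local file instead, a common approach is to inline it as a base64 `data:` URL in the `image_url` part. The sketch below assumes the server accepts such URLs, as OpenAI-style image inputs allow; check this against your vLLM or SGLang version. `image_to_data_url` and `local_image_message` are hypothetical helpers.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Hypothetical helper: encode a local image as a data: URL (assumes the server accepts them)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def local_image_message(path: str, question: str) -> dict:
    """Hypothetical helper: a user message that sends a local image instead of a public URL."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

The resulting message replaces the entry in `"messages"` of the JSON body; everything else in the request stays the same.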

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "OpenLLM-Ro/RoQwen3-VL-2B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenLLM-Ro/RoQwen3-VL-2B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
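The server needs some time to download and load the weights after it starts, and requests sent before then fail with a connection error. A minimal readiness check polls the server until it answers; it assumes the OpenAI-compatible `GET /v1/models` route is exposed, and `wait_until_ready` plus its timeouts are hypothetical, illustrative choices.

```python
import http.client
import time
import urllib.request


def wait_until_ready(base_url: str = "http://localhost:30000", timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Hypothetical helper: poll GET {base_url}/v1/models until HTTP 200 or `timeout` seconds pass.

    Assumes the OpenAI-compatible /v1/models route. Returns True if the server answered, else False.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, socket timeout, or a non-2xx HTTPError
            # (urllib.error.URLError subclasses OSError): not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until_ready("http://localhost:30000")` before sending the curl request above; use port 8000 for the vLLM server.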

Model Card for RoQwen3-VL-2B-Instruct

RoQwen3-VL-2B-Instruct is a Romanian-adapted vision-language model built on top of Qwen/Qwen3-VL-2B-Instruct. It was produced by continued supervised instruction tuning of the base Qwen3-VL checkpoint on a Romanian multimodal SFT mixture covering general instruction following (LLaVA mix), captioning (Pixmo-Cap, Flickr30k-Cap), visual question answering (Pixmo-AA, Pixmo-Cap-QA, Flickr30k-QA), document and chart understanding (CoSyn, FinePDFs), and visual grounding (Pixmo-Points, Pixmo-Count). The model is intended for research on Romanian VLM capabilities.

Model Details

Model Description

Developed by: OpenLLM-Ro
Language(s): Romanian
License: cc-by-nc-4.0
Finetuned from model: Qwen/Qwen3-VL-2B-Instruct
Trained using: LLaMA-Factory (OpenLLM-Ro fork, linked under Model Sources)

Model Sources

Repository: https://github.com/OpenLLM-Ro/LLaMA-Factory
Paper: https://arxiv.org/abs/2605.31401

Intended Use

Intended Use Cases

RoQwen3-VL-2B-Instruct is intended for research use on Romanian vision-language tasks — captioning, visual question answering, cultural understanding, OCR / document understanding, and visual grounding — and as a starting point for further Romanian VLM adaptation.

Out-of-Scope Use

Any use that violates applicable laws or regulations (including trade-compliance laws) or the project's license, and any use in languages other than Romanian.

How to Get Started with the Model

import torch
from PIL import Image
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

# Load the weights in bfloat16 and let Accelerate place them on the available device(s).
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "OpenLLM-Ro/RoQwen3-VL-2B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()
processor = AutoProcessor.from_pretrained("OpenLLM-Ro/RoQwen3-VL-2B-Instruct")

# Any local image; the prompt is Romanian for "Describe the image in detail."
image = Image.open("example.jpg").convert("RGB")
question = "Descrie imaginea în detaliu."

# One user turn: the image followed by the question.
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": question},
    ]},
]
# Apply the chat template, tokenize, and preprocess the image in one call.
# .to(..., dtype=...) casts only the floating-point tensors (pixel values) to bfloat16.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

# Greedy decoding; decode only the tokens generated after the prompt.
with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Benchmarks

All benchmarks below are evaluated in Romanian. Per-benchmark winners are shown in bold. Micro is the mean over individual benchmarks; Macro is the mean over capability groups.
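Read literally, the two averages can be written as follows (notation ours, not the card's): $s_b$ is the score on benchmark $b$, $B$ is the set of all benchmarks, $G$ is the set of capability groups below, and $B_g$ is the set of benchmarks in group $g$.

```latex
\mathrm{Micro} = \frac{1}{|B|}\sum_{b \in B} s_b,
\qquad
\mathrm{Macro} = \frac{1}{|G|}\sum_{g \in G} \frac{1}{|B_g|}\sum_{b \in B_g} s_b
```

Because every group counts equally in the Macro average, each of the two Grounding benchmarks carries more weight than each of the five Generation & Open-ended benchmarks.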

Aggregate

| Model | Micro avg. | Macro avg. |
|---|---|---|
| Qwen3-VL-2B-Instruct | 51.51 | 51.31 |
| RoQwen3-VL-2B-Instruct | **63.36** | **62.65** |

General Understanding

| Model | MMBench | MMStar | SeedBench2 |
|---|---|---|---|
| Qwen3-VL-2B-Instruct | 62.69 | 45.92 | 63.38 |
| RoQwen3-VL-2B-Instruct | **71.90** | **50.73** | **69.29** |

Knowledge & Reasoning

| Model | MMMU | MME |
|---|---|---|
| Qwen3-VL-2B-Instruct | 38.33 | 61.59 |
| RoQwen3-VL-2B-Instruct | **40.22** | **62.19** |

Cultural

| Model | CVQA | ALM-Bench | RoMemes | RoCultVLM |
|---|---|---|---|---|
| Qwen3-VL-2B-Instruct | 57.95 | 48.72 | **46.68** | 50.31 |
| RoQwen3-VL-2B-Instruct | **61.92** | **60.97** | 36.71 | **54.00** |

Generation & Open-ended

| Model | RoFlickr30k-Caption | RoFlickr30k-QA | LLaVA-Wild | AyaVisionBench | m-WildVision |
|---|---|---|---|---|---|
| Qwen3-VL-2B-Instruct | 70.09 | 30.59 | 29.89 | 43.04 | 44.76 |
| RoQwen3-VL-2B-Instruct | **83.80** | **85.70** | **50.40** | **55.33** | **60.08** |

OCR & Documents

| Model | RoCosyn | RoFinepdfs | RoMemes OCR |
|---|---|---|---|
| Qwen3-VL-2B-Instruct | 48.63 | 78.62 | **91.04** |
| RoQwen3-VL-2B-Instruct | **64.07** | **86.85** | 89.54 |

Grounding

| Model | PixmoCount | PixmoPoints |
|---|---|---|
| Qwen3-VL-2B-Instruct | 56.36 | 10.09 |
| RoQwen3-VL-2B-Instruct | **65.28** | **54.89** |
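As a consistency check of the Micro/Macro definitions, the sketch below recomputes the Aggregate table from the six per-group tables. It assumes the grouping given by the section headings and plain unweighted means; the scores are copied from the tables, and `micro_macro` is a hypothetical helper.

```python
from statistics import mean

# Per-benchmark Romanian scores copied from the tables above, grouped by capability.
SCORES = {
    "Qwen3-VL-2B-Instruct": {
        "General Understanding": [62.69, 45.92, 63.38],
        "Knowledge & Reasoning": [38.33, 61.59],
        "Cultural": [57.95, 48.72, 46.68, 50.31],
        "Generation & Open-ended": [70.09, 30.59, 29.89, 43.04, 44.76],
        "OCR & Documents": [48.63, 78.62, 91.04],
        "Grounding": [56.36, 10.09],
    },
    "RoQwen3-VL-2B-Instruct": {
        "General Understanding": [71.90, 50.73, 69.29],
        "Knowledge & Reasoning": [40.22, 62.19],
        "Cultural": [61.92, 60.97, 36.71, 54.00],
        "Generation & Open-ended": [83.80, 85.70, 50.40, 55.33, 60.08],
        "OCR & Documents": [64.07, 86.85, 89.54],
        "Grounding": [65.28, 54.89],
    },
}


def micro_macro(groups: dict[str, list[float]]) -> tuple[float, float]:
    """Micro = mean over all benchmarks; Macro = mean of the per-group means (both rounded to 2 dp)."""
    all_scores = [s for scores in groups.values() for s in scores]
    micro = mean(all_scores)
    macro = mean(mean(scores) for scores in groups.values())
    return round(micro, 2), round(macro, 2)


for model, groups in SCORES.items():
    micro, macro = micro_macro(groups)
    print(f"{model}: micro={micro:.2f} macro={macro:.2f}")
```

Running it gives micro=51.51, macro=51.31 for Qwen3-VL-2B-Instruct and micro=63.36, macro=62.65 for RoQwen3-VL-2B-Instruct, matching the Aggregate table.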

Citation

@misc{masala2026intelegi,
      title={``\^{I}n\c{t}elegi Rom\^{a}ne\c{s}te?'' A Recipe for Romanian Vision-Language Models},
      author={Mihai Masala and Marius Leordeanu and Mihai Dascalu and Traian Rebedea},
      year={2026},
      eprint={2605.31401},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.31401},
}

@inproceedings{masala-etal-2024-vorbesti,
    title = "``Vorbeşti Româneşte?'' A Recipe to Train Powerful {R}omanian {LLM}s with {E}nglish Instructions",
    author = "Masala, Mihai and Ilie-Ablachim, Denis and Dima, Alexandru and Corlatescu, Dragos and Zavelca, Miruna and Olaru, Ovio and Terian, Simina and Terian, Andrei and Leordeanu, Marius and Velicu, Horia and Popescu, Marius and Dascalu, Mihai and Rebedea, Traian",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    pages = "11632--11647"
}