DynEval Evaluator

This repository contains the DynEval evaluator models, fine-tuned from Qwen3-VL vision-language models for multimodal evaluation workflows.

Two evaluator checkpoints are included:

DynEval Evaluator 2B: fine-tuned from Qwen3-VL 2B, stored in the DynEval-2B subfolder
DynEval Evaluator 4B: fine-tuned from Qwen3-VL 4B, stored in the DynEval-4B subfolder

Both checkpoints are saved in Hugging Face transformers format and can be loaded with Qwen3VLForConditionalGeneration.

Model Variants

Variant	Architecture	Checkpoint	Precision	Training Epochs	Global Step	Last Logged Loss
DynEval Evaluator 2B	`Qwen3VLForConditionalGeneration`	315	`bfloat16`	3.0	315	0.5989
DynEval Evaluator 4B	`Qwen3VLForConditionalGeneration`	471	`bfloat16`	3.0	471	0.5784

Model Details

DynEval Evaluator 2B

Model type: qwen3_vl
Tokenizer: Qwen2Tokenizer
Tokenizer max length: 65,536
Text context config: 262,144 max position embeddings
Text hidden size: 2,048
Text layers: 28
Attention heads: 16
KV heads: 8
Vision encoder depth: 24
Vision hidden size: 1,024
Vision patch size: 16

DynEval Evaluator 4B

Model type: qwen3_vl
Tokenizer: Qwen2Tokenizer
Tokenizer max length: 65,536
Text context config: 262,144 max position embeddings
Text hidden size: 2,560
Text layers: 36
Attention heads: 32
KV heads: 8
Vision encoder depth: 24
Vision hidden size: 1,024
Vision patch size: 16
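
The figures above can be turned into a couple of derived quantities. The sketch below is illustrative only: `EvaluatorSpec` and its methods are hypothetical names, the values are copied from the lists above, and the patch-grid calculation assumes an image whose sides are already multiples of the patch size. It ignores any resizing or patch merging the processor may apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluatorSpec:
    """Subset of the configuration values listed in Model Details."""
    text_hidden_size: int
    text_layers: int
    attention_heads: int
    kv_heads: int
    vision_patch_size: int

    def queries_per_kv_head(self) -> int:
        # With fewer KV heads than attention heads, several query heads
        # share each KV head (grouped-query attention).
        return self.attention_heads // self.kv_heads

    def patch_grid(self, height: int, width: int) -> tuple:
        # Patches along each axis for an image whose sides are multiples
        # of the patch size. Processor resizing is NOT modelled here.
        p = self.vision_patch_size
        if height % p or width % p:
            raise ValueError(f"image sides must be multiples of {p}")
        return height // p, width // p


# Values copied from the Model Details lists in this card.
SPECS = {
    "2B": EvaluatorSpec(2048, 28, 16, 8, 16),
    "4B": EvaluatorSpec(2560, 36, 32, 8, 16),
}

for name, spec in SPECS.items():
    rows, cols = spec.patch_grid(448, 448)
    print(name, spec.queries_per_kv_head(), rows * cols)
```

For a 448x448 input this gives a 28x28 grid (784 patches) for either variant, with 2 query heads per KV head for the 2B model and 4 for the 4B model.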

Special Tokens

The evaluator checkpoints include the following task tokens:

<|T2IA|>
<|IQA|>
<|EVALUATION|>

Use the task token that matches your evaluation setting.
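
One way to avoid typos in the task token is to build prompts through a small helper. This is a minimal sketch, not part of the DynEval release: `with_task_token` is a hypothetical name, and the "token, newline, instruction" layout is an assumption taken from the Quick Start examples in this card rather than from a documented fine-tuning format.

```python
# The three task tokens listed in this card.
TASK_TOKENS = ("<|T2IA|>", "<|IQA|>", "<|EVALUATION|>")


def with_task_token(task_token: str, instruction: str) -> str:
    """Prefix an instruction with a task token (hypothetical helper)."""
    if task_token not in TASK_TOKENS:
        raise ValueError(
            f"unknown task token {task_token!r}; expected one of {TASK_TOKENS}"
        )
    # Assumed layout: task token on its own line, then the instruction.
    return f"{task_token}\n{instruction.strip()}"


prompt = with_task_token("<|IQA|>", "Evaluate or answer the question for this image.")
print(prompt)
```

The returned string can be used as the "text" entry of a chat message.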

Intended Use

DynEval Evaluator is intended for research use in multimodal evaluation, particularly for evaluating text-to-image outputs and image question-answering outputs.

Example use cases:

text-to-image alignment evaluation
image-question answering evaluation
multimodal response scoring
visual reasoning evaluation
evaluator-based comparison of image generation model outputs

These models should be used with the same prompt format and task tokens used during fine-tuning.

Quick Start

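Install recent versions of torch and transformers (the transformers release must include Qwen3VLForConditionalGeneration), plus accelerate for device_map="auto" and pillow for image loading.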

pip install torch transformers accelerate pillow
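
Before loading a checkpoint, it can help to confirm that the environment is complete. The check below is an illustrative sketch, not part of the DynEval release, and `check_environment` is a hypothetical name. It only reports whether the packages from the install command are importable and whether the installed transformers exposes Qwen3VLForConditionalGeneration.

```python
import importlib.util

# Import names for the packages in the pip command (pillow imports as PIL).
REQUIRED_MODULES = ["torch", "transformers", "accelerate", "PIL"]


def check_environment() -> dict:
    """Report which required modules are importable (hypothetical helper)."""
    report = {
        name: importlib.util.find_spec(name) is not None
        for name in REQUIRED_MODULES
    }
    has_qwen3_vl = False
    if report["transformers"]:
        try:
            import transformers
            has_qwen3_vl = hasattr(transformers, "Qwen3VLForConditionalGeneration")
        except Exception:
            # A broken install is treated as "class not available".
            has_qwen3_vl = False
    report["Qwen3VLForConditionalGeneration"] = has_qwen3_vl
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```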

Load the 2B Evaluator

import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

repo_id = "vcl-iisc/DynEval-Evaluator"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    repo_id,
    subfolder="DynEval-2B",
    dtype=torch.bfloat16,
    device_map="auto",
)

processor = AutoProcessor.from_pretrained(
    repo_id,
    subfolder="DynEval-2B",
)

Load the 4B Evaluator

import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

repo_id = "vcl-iisc/DynEval-Evaluator"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    repo_id,
    subfolder="DynEval-4B",
    dtype=torch.bfloat16,
    device_map="auto",
)

processor = AutoProcessor.from_pretrained(
    repo_id,
    subfolder="DynEval-4B",
)
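
Because the two loading snippets differ only in the subfolder, they can be folded into one function. This is a sketch under the same assumptions as the snippets above (same repository ID, same subfolder names). `resolve_subfolder` and `load_evaluator` are hypothetical helper names, not functions shipped with the checkpoints.

```python
REPO_ID = "vcl-iisc/DynEval-Evaluator"
# Subfolder names as listed in this card.
SUBFOLDERS = {"2B": "DynEval-2B", "4B": "DynEval-4B"}


def resolve_subfolder(variant: str) -> str:
    """Map a variant label such as "2b" or "4B" to its repository subfolder."""
    key = variant.strip().upper()
    if key not in SUBFOLDERS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(SUBFOLDERS)}")
    return SUBFOLDERS[key]


def load_evaluator(variant: str = "2B"):
    """Load the model and processor for the chosen variant."""
    # Imported lazily so resolve_subfolder works without torch installed.
    import torch
    from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

    subfolder = resolve_subfolder(variant)
    model = Qwen3VLForConditionalGeneration.from_pretrained(
        REPO_ID,
        subfolder=subfolder,
        dtype=torch.bfloat16,
        device_map="auto",
    )
    processor = AutoProcessor.from_pretrained(REPO_ID, subfolder=subfolder)
    return model, processor
```

A call such as `model, processor = load_evaluator("4B")` downloads the weights on first use, so it needs network access and the repository's access conditions to have been accepted.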

Text-Only Evaluation Example

import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

repo_id = "vcl-iisc/DynEval-Evaluator"
subfolder = "DynEval-2B"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    repo_id,
    subfolder=subfolder,
    dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(repo_id, subfolder=subfolder)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "<|EVALUATION|>\nEvaluate the following response for the given prompt.",
            }
        ],
    }
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = processor(
    text=[text],
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )

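# generate() returns the prompt tokens followed by the new tokens;
# slice off the prompt so only the model's response is decoded.
prompt_length = inputs["input_ids"].shape[1]
output = processor.batch_decode(
    generated_ids[:, prompt_length:],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]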

print(output)

Image + Text Evaluation Example

from PIL import Image
import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

repo_id = "vcl-iisc/DynEval-Evaluator"
subfolder = "DynEval-2B"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    repo_id,
    subfolder=subfolder,
    dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(repo_id, subfolder=subfolder)

image = Image.open("example.jpg").convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {
                "type": "text",
                "text": "<|IQA|>\nEvaluate or answer the question for this image.",
            },
        ],
    }
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )

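# generate() returns the prompt tokens followed by the new tokens;
# slice off the prompt so only the model's response is decoded.
prompt_length = inputs["input_ids"].shape[1]
output = processor.batch_decode(
    generated_ids[:, prompt_length:],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]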

print(output)
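
To compare outputs from several image generation models, the single-image example can be repeated once per candidate. The sketch below only builds the message lists. `build_comparison_conversations` is a hypothetical helper, the instruction text is a placeholder rather than the prompt used in fine-tuning, and the file names stand in for PIL images opened as in the example above.

```python
def build_comparison_conversations(task_token, instruction, images):
    """Return one single-turn conversation per candidate image."""
    conversations = []
    for image in images:
        conversations.append([
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image},
                    # Assumed layout: task token, newline, instruction.
                    {"type": "text", "text": f"{task_token}\n{instruction}"},
                ],
            }
        ])
    return conversations


# Placeholder inputs; in practice pass PIL images, one per generator.
candidates = ["model_a.png", "model_b.png"]
conversations = build_comparison_conversations(
    "<|T2IA|>",
    "Evaluate how well this image matches the prompt: a red bicycle.",
    candidates,
)
print(len(conversations))
```

Each conversation can then go through `processor.apply_chat_template` and `model.generate` exactly as in the example above, keeping the decoded outputs side by side for comparison.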
