DynEval Evaluator

This repository contains the DynEval evaluator models, fine-tuned from Qwen3-VL vision-language models for multimodal evaluation workflows.

Two evaluator checkpoints are included:

  • DynEval Evaluator 2B: fine-tuned from Qwen3-VL 2B, stored in the DynEval-2B subfolder
  • DynEval Evaluator 4B: fine-tuned from Qwen3-VL 4B, stored in the DynEval-4B subfolder

Both checkpoints are saved in Hugging Face transformers format and can be loaded with Qwen3VLForConditionalGeneration.

Model Variants

| Variant | Architecture | Checkpoint | Precision | Training Epochs | Global Step | Last Logged Loss |
| --- | --- | --- | --- | --- | --- | --- |
| DynEval Evaluator 2B | Qwen3VLForConditionalGeneration | 315 | bfloat16 | 3.0 | 315 | 0.5989 |
| DynEval Evaluator 4B | Qwen3VLForConditionalGeneration | 471 | bfloat16 | 3.0 | 471 | 0.5784 |

Model Details

DynEval Evaluator 2B

  • Model type: qwen3_vl
  • Tokenizer: Qwen2Tokenizer
  • Tokenizer max length: 65,536
  • Text context config: 262,144 max position embeddings
  • Text hidden size: 2,048
  • Text layers: 28
  • Attention heads: 16
  • KV heads: 8
  • Vision encoder depth: 24
  • Vision hidden size: 1,024
  • Vision patch size: 16

DynEval Evaluator 4B

  • Model type: qwen3_vl
  • Tokenizer: Qwen2Tokenizer
  • Tokenizer max length: 65,536
  • Text context config: 262,144 max position embeddings
  • Text hidden size: 2,560
  • Text layers: 36
  • Attention heads: 32
  • KV heads: 8
  • Vision encoder depth: 24
  • Vision hidden size: 1,024
  • Vision patch size: 16
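
The head counts above imply grouped-query attention, where several query heads share one key/value head. As a quick sanity check, the sketch below derives that sharing ratio from the listed values, along with a rough count of vision patches for a given image size. The dictionary and function names are illustrative only. The patch count assumes the image is consumed at exactly the stated resolution; the actual processor may resize images or merge patches, so treat it as a rough estimate.

```python
# Values copied from the Model Details lists above; names are illustrative.
VARIANTS = {
    "DynEval-2B": {"attention_heads": 16, "kv_heads": 8, "patch_size": 16},
    "DynEval-4B": {"attention_heads": 32, "kv_heads": 8, "patch_size": 16},
}


def query_heads_per_kv_head(cfg):
    """Number of query heads that share each key/value head."""
    return cfg["attention_heads"] // cfg["kv_heads"]


def raw_patch_count(width, height, patch_size):
    """Patches in a width x height image, before any resizing or merging (assumption)."""
    return (width // patch_size) * (height // patch_size)


for name, cfg in VARIANTS.items():
    ratio = query_heads_per_kv_head(cfg)
    patches = raw_patch_count(448, 448, cfg["patch_size"])
    print(f"{name}: {ratio} query heads per KV head, {patches} raw patches at 448x448")
```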

Special Tokens

The evaluator checkpoints include the following task tokens:

<|T2IA|>
<|IQA|>
<|EVALUATION|>

Use the task token that matches your evaluation setting.
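
One way to keep the token choice explicit is a small helper that validates the token and prefixes it to the instruction. This is a minimal sketch: `with_task_token` is a hypothetical helper, and the token-then-newline layout simply mirrors the examples in this README rather than a documented specification of the fine-tuning format.

```python
# The three task tokens listed in this README.
TASK_TOKENS = ("<|T2IA|>", "<|IQA|>", "<|EVALUATION|>")


def with_task_token(task_token, instruction):
    """Prefix an instruction with a task token, rejecting unknown tokens.

    Hypothetical helper; the "<token>\\n<instruction>" layout is assumed
    from the usage examples in this README.
    """
    if task_token not in TASK_TOKENS:
        raise ValueError(f"Unknown task token {task_token!r}; expected one of {TASK_TOKENS}")
    return f"{task_token}\n{instruction}"


prompt = with_task_token("<|EVALUATION|>", "Evaluate the following response for the given prompt.")
print(prompt)
```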

Intended Use

DynEval Evaluator is intended for research use in multimodal evaluation, particularly for scoring text-to-image generations and image question-answering outputs.

Example use cases:

  • text-to-image alignment evaluation
  • image-question answering evaluation
  • multimodal response scoring
  • visual reasoning evaluation
  • evaluator-based comparison of image generation model outputs

These models should be used with the same prompt format and task tokens used during fine-tuning.

Quick Start

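Install recent versions of torch, transformers, accelerate, and Pillow. The transformers release must be new enough to include Qwen3VLForConditionalGeneration, and accelerate is required for device_map="auto".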

pip install torch transformers accelerate pillow
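
To confirm the environment before downloading any weights, a check along these lines may help. It is a sketch, not part of the DynEval tooling: `missing_packages` and `has_qwen3_vl` are hypothetical helper names, and the check only verifies that the packages can be found and that the installed transformers exposes the Qwen3-VL class.

```python
import importlib.util

# Import names for the packages installed above (Pillow is imported as PIL).
REQUIRED = ("torch", "transformers", "accelerate", "PIL")


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_qwen3_vl():
    """True if the installed transformers exposes Qwen3VLForConditionalGeneration."""
    try:
        import transformers
        return hasattr(transformers, "Qwen3VLForConditionalGeneration")
    except Exception:
        # transformers is missing or failed to import.
        return False


print("Missing packages:", missing_packages())
print("Qwen3-VL class available:", has_qwen3_vl())
```

If the class is reported as unavailable, upgrading transformers is the likely fix.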

Load the 2B Evaluator

import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

repo_id = "vcl-iisc/DynEval-Evaluator"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    repo_id,
    subfolder="DynEval-2B",
    dtype=torch.bfloat16,
    device_map="auto",
)

processor = AutoProcessor.from_pretrained(
    repo_id,
    subfolder="DynEval-2B",
)

Load the 4B Evaluator

import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

repo_id = "vcl-iisc/DynEval-Evaluator"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    repo_id,
    subfolder="DynEval-4B",
    dtype=torch.bfloat16,
    device_map="auto",
)

processor = AutoProcessor.from_pretrained(
    repo_id,
    subfolder="DynEval-4B",
)
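
The two loading snippets differ only in the subfolder, so they can be folded into one function. The sketch below assumes nothing beyond the repository ID and subfolder names already shown; `resolve_subfolder` and `load_evaluator` are hypothetical helper names, not part of the repository.

```python
REPO_ID = "vcl-iisc/DynEval-Evaluator"
SUBFOLDERS = {"2B": "DynEval-2B", "4B": "DynEval-4B"}


def resolve_subfolder(variant):
    """Map a variant label such as "2B" or "4b" to its repository subfolder."""
    try:
        return SUBFOLDERS[variant.upper()]
    except KeyError:
        raise ValueError(f"Unknown variant {variant!r}; expected one of {sorted(SUBFOLDERS)}") from None


def load_evaluator(variant="2B"):
    """Load the model and processor for the requested variant."""
    # Imported lazily so resolve_subfolder works without torch/transformers installed.
    import torch
    from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

    subfolder = resolve_subfolder(variant)
    model = Qwen3VLForConditionalGeneration.from_pretrained(
        REPO_ID,
        subfolder=subfolder,
        dtype=torch.bfloat16,
        device_map="auto",
    )
    processor = AutoProcessor.from_pretrained(REPO_ID, subfolder=subfolder)
    return model, processor
```

Calling `model, processor = load_evaluator("4B")` downloads the weights, so it needs network access to the repository.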

Text-Only Evaluation Example

import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

repo_id = "vcl-iisc/DynEval-Evaluator"
subfolder = "DynEval-2B"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    repo_id,
    subfolder=subfolder,
    dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(repo_id, subfolder=subfolder)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "<|EVALUATION|>\nEvaluate the following response for the given prompt.",
            }
        ],
    }
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = processor(
    text=[text],
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )

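# generate() returns the prompt followed by the completion;
# keep only the newly generated tokens before decoding.
new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]

output = processor.batch_decode(
    new_tokens,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]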

print(output)

Image + Text Evaluation Example

from PIL import Image
import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

repo_id = "vcl-iisc/DynEval-Evaluator"
subfolder = "DynEval-2B"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    repo_id,
    subfolder=subfolder,
    dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(repo_id, subfolder=subfolder)

image = Image.open("example.jpg").convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {
                "type": "text",
                "text": "<|IQA|>\nEvaluate or answer the question for this image.",
            },
        ],
    }
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )

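# generate() returns the prompt followed by the completion;
# keep only the newly generated tokens before decoding.
new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]

output = processor.batch_decode(
    new_tokens,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]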

print(output)