Tags: Safetensors, qwen2_vl

CaReBench: A Fine-grained Benchmark for Video Captioning and Retrieval

Yifan Xu, Xinhao Li, Yichun Yang, Desen Meng, Rui Huang, Limin Wang

🤗 Model | 🤗 Data | 📑 Paper

πŸ“ Introduction

This is the CaRe checkpoint after Stage-I training. It handles only video captioning tasks. Refer to our paper for details.

Usage

Loading directly from the Hugging Face remote path has not been tested. We recommend downloading this checkpoint to your local environment to avoid potential bugs.
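A minimal sketch of fetching the checkpoint into a local directory with the `snapshot_download` function from the `huggingface_hub` package. It assumes `huggingface_hub` is installed and that the repository ID is `MCG-NJU/CaRe-7B-Stage-1`; the `download_checkpoint` helper name and the `checkpoints/CaRe-7B-Stage-1` target directory are hypothetical choices for illustration.

```python
from pathlib import Path


def download_checkpoint(
    repo_id: str = "MCG-NJU/CaRe-7B-Stage-1",
    local_dir: str = "checkpoints/CaRe-7B-Stage-1",  # hypothetical target directory
) -> Path:
    """Download every file of a Hugging Face model repo into local_dir.

    Hypothetical helper: returns the local path to pass to from_pretrained.
    """
    # Imported inside the function so the helper can be defined without
    # huggingface_hub installed; the package is only needed to download.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return Path(path)
```

Calling `download_checkpoint()` once (this needs network access) and then passing the returned path to `AutoCaptioner.from_pretrained` means later loads read only from the local copy.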

For Captioning Tasks

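# `utils` and `models` are modules from the CaReBench code repository
from utils.video import read_frames_decord
from models.modeling_captioners import AutoCaptioner

# Load the Stage-I checkpoint from a local directory
captioner = AutoCaptioner.from_pretrained('path/to/checkpoints/CaRe-7B-Stage-1')
# Sample 32 frames from the demo video
frames = read_frames_decord(video_path='assets/demo.mp4', num_frames=32)
# unsqueeze(0) adds a batch dimension of size 1; describe() returns one caption per video
description = captioner.describe(frames.unsqueeze(0))
print(description[0])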
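The call `read_frames_decord(video_path=..., num_frames=32)` reduces a video to a fixed number of frames, but the source does not say how those frames are chosen. A common strategy, assumed here purely for illustration, is to split the clip into `num_frames` equal segments and take the middle frame of each. The `uniform_frame_indices` helper below is hypothetical and is not the CaReBench implementation.

```python
def uniform_frame_indices(total_frames: int, num_frames: int) -> list[int]:
    """Pick num_frames indices spread evenly over a clip of total_frames frames.

    Hypothetical helper: splits the clip into num_frames equal segments and
    returns the index of the middle frame of each segment. Short clips
    (total_frames < num_frames) yield repeated indices.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    segment = total_frames / num_frames
    # Clamp to the last valid index to guard against rounding at the end
    return [min(int(segment * i + segment / 2), total_frames - 1) for i in range(num_frames)]


indices = uniform_frame_indices(total_frames=320, num_frames=32)
print(indices[:4])  # [5, 15, 25, 35]
```

With the `decord` library, such indices could be passed to `VideoReader.get_batch` after reading the frame count with `len(VideoReader(path))`.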
Model size: 8.29B params · Tensor type: BF16 (Safetensors)