VideoSearch-R1 ActivityNet Stage 1

This is the Stage 1 VideoSearch-R1 checkpoint trained for ActivityNet.

Stage 1 is trained from the Qwen3-VL base model on constructed video retrieval and temporal grounding supervision. It is intended as the initialization for Stage 2 training or as a released checkpoint for ablations.

Use with the VideoSearch-R1 codebase:

bash scripts/data_construct/download_preextracted_data.bash activitynet
EVAL_GPUS=0 bash scripts/inference/inference.bash activitynet --checkpoint VideoSearchR1/activitynet-stage1
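
The two commands above can be wrapped in a small script that stops at the first failure. This is only a sketch, run from the root of the VideoSearch-R1 repository. The `run` helper and the `DRY_RUN` switch are hypothetical additions for illustration and are not part of the codebase. By default the script only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch: run the documented download and inference steps in order.
set -euo pipefail

DATASET="activitynet"
CHECKPOINT="VideoSearchR1/activitynet-stage1"
GPUS="${EVAL_GPUS:-0}"    # passed through as EVAL_GPUS to the inference script
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 prints commands, 0 executes them

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: fetch the pre-extracted data for the dataset.
run bash scripts/data_construct/download_preextracted_data.bash "$DATASET"

# Step 2: run inference with the Stage 1 checkpoint.
run env EVAL_GPUS="$GPUS" bash scripts/inference/inference.bash "$DATASET" --checkpoint "$CHECKPOINT"
```

Setting `DRY_RUN=0` when invoking the script executes both steps for real.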

Model size: 2B params (Safetensors). Tensor types: F32, BF16.