buxiangzhiren committed on
Commit 40c2f03 · verified · 1 Parent(s): d8f0ef5

Update README.md

  1. README.md +12 -3
README.md CHANGED
@@ -1,3 +1,12 @@
- ---
- license: ecl-2.0
- ---
+ ---
+ license: ecl-2.0
+ ---
+ **VD-IT model**
+
+ This is the pre-trained checkpoint for our paper [**Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation**](https://arxiv.org/abs/2403.12042).
+
+ We use a video diffusion model ([ModelScopeT2V](https://modelscope.cn/models/damo/text-to-video-synthesis/summary)) as our base model and apply prompt tuning to adapt it as a visual backbone for downstream video understanding tasks.
+
+ ### Model training
+ We first pre-train the model on Ref-COCO and then fine-tune it on Ref-YouTube-VOS. Training uses
+ two NVIDIA A100 GPUs, processing 5 frames per clip for 9 epochs. The initial learning rate is 5e-5 and is reduced by a factor of 10 at the 6th and 8th epochs.
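
The learning-rate schedule described above can be sketched as a plain step decay. This is a minimal illustration, not the authors' training code: the function name `learning_rate` and its parameters are hypothetical, and it assumes epochs are counted from 1 and that each reduction takes effect from the start of the 6th and 8th epochs.

```python
def learning_rate(epoch, base_lr=5e-5, milestones=(6, 8), gamma=0.1):
    """Step-decay learning rate for a 1-indexed epoch.

    Assumption: the rate is multiplied by `gamma` once for every
    milestone epoch that has been reached (epoch >= milestone).
    """
    decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** decays


# Schedule over the 9 training epochs mentioned in the README.
schedule = {epoch: learning_rate(epoch) for epoch in range(1, 10)}

for epoch, lr in schedule.items():
    print(f"epoch {epoch}: lr = {lr:.1e}")
```

Under these assumptions, epochs 1 to 5 run at the initial rate, epochs 6 and 7 at one tenth of it, and epochs 8 and 9 at one hundredth.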