jylins committed on
Commit
363a181
1 Parent(s): 048fdcf

add vt_clip

Files changed (2)
  1. README.md +6 -0
  2. vt_clip.pth +3 -0
README.md CHANGED
@@ -20,6 +20,12 @@ size_categories:
  **Model type:**
  VTSUM-BLIP is an end-to-end cross-modal video summarization model.
 
+ **Model description:**
+ - VTSUM-BLIP + Temporal Transformer (TT): vtsum_tt.pth
+ - VTSUM-BLIP + Temporal Transformer (TT) + Context Aggregation (CA): vtsum_tt_ca.pth
+ - VT-CLIP for VT-CLIPScore metric: vt_clip.pth
+ - BLIP w/ ViT-B and CapFilt-L ([Download](https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth)): model_base_capfilt_large.pth
+
  **Paper or resources for more information:**
  https://videoxum.github.io/
 
vt_clip.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e17526e15b2f663b1d5a6c4451350ae5b9903d4c7b8dcbc58bac1910d6fe15a4
+ size 1795741805
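
The three lines added for vt_clip.pth are a Git LFS pointer, not the checkpoint itself: they record the spec version, the SHA-256 digest of the real file, and its size in bytes (about 1.8 GB). A minimal sketch of how a downloaded copy could be checked against such a pointer is shown here. The helper names `parse_lfs_pointer` and `verify_against_pointer` are hypothetical and not part of this repository; the demo runs on a small temporary file rather than the actual checkpoint.

```python
import hashlib
import os
import tempfile


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid line has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_against_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` matches the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Read in chunks so a multi-gigabyte checkpoint is never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e17526e15b2f663b1d5a6c4451350ae5b9903d4c7b8dcbc58bac1910d6fe15a4
size 1795741805
"""
pointer = parse_lfs_pointer(POINTER_TEXT)

# Demo on a small temporary file standing in for the real checkpoint.
payload = b"demo bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_pointer = {
    "algo": "sha256",
    "digest": hashlib.sha256(payload).hexdigest(),
    "size": len(payload),
}
demo_ok = verify_against_pointer(tmp.name, demo_pointer)
os.remove(tmp.name)
```

To check a real download, one would call `verify_against_pointer("vt_clip.pth", pointer)`, assuming the file was saved under that name in the working directory. The size comparison runs first so that a truncated download fails without hashing the whole file.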