jylins committed • Commit 363a181 • Parent(s): 048fdcf

add vt_clip

Files changed:
- README.md +6 -0
- vt_clip.pth +3 -0
README.md CHANGED

@@ -20,6 +20,12 @@ size_categories:
 **Model type:**
 VTSUM-BLIP is an end-to-end cross-modal video summarization model.
 
+**Model description:**
+- VTSUM-BLIP + Temporal Transformer (TT): vtsum_tt.pth
+- VTSUM-BLIP + Temporal Transformer (TT) + Context Aggregation (CA): vtsum_tt_ca.pth
+- VT-CLIP for VT-CLIPScore metric: vt_clip.pth
+- BLIP w/ ViT-B and CapFilt-L ([Download](https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth)): model_base_capfilt_large.pth
+
 **Paper or resources for more information:**
 https://videoxum.github.io/
 
vt_clip.pth ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e17526e15b2f663b1d5a6c4451350ae5b9903d4c7b8dcbc58bac1910d6fe15a4
+size 1795741805
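Because vt_clip.pth is stored as a Git LFS pointer, the repository only holds the `oid` and `size` shown above; the actual checkpoint is fetched separately. A downloaded copy can be checked against those fields. This is a minimal sketch under that assumption; the `parse_lfs_pointer` and `verify_file` helpers are illustrative, not part of this repository:

```python
import hashlib

def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs spec v1 pointer file into key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def verify_file(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a downloaded file's sha256 digest and byte count against the pointer."""
    algo, _, expected_hash = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    digest = hashlib.sha256()
    total = 0
    with open(path, "rb") as f:
        # Stream in chunks so a multi-GB checkpoint is not loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            total += len(chunk)
    return total == int(pointer["size"]) and digest.hexdigest() == expected_hash

# Pointer contents copied verbatim from the vt_clip.pth file added in this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:e17526e15b2f663b1d5a6c4451350ae5b9903d4c7b8dcbc58bac1910d6fe15a4\n"
    "size 1795741805\n"
)
```

A mismatch in either field indicates a truncated or corrupted download rather than a usable checkpoint.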