jylins committed · Commit f7b4925 · 1 Parent(s): 363a181
update readme
README.md CHANGED
@@ -26,9 +26,24 @@ VTSUM-BLIP is an end-to-end cross-modal video summarization model.
|
 - VT-CLIP for VT-CLIPScore metric: vt_clip.pth
 - BLIP w/ ViT-B and CapFilt-L ([Download](https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth)): model_base_capfilt_large.pth
 
+**The file structure of the model zoo looks like:**
+```
+outputs
+├── blip
+│   └── model_base_capfilt_large.pth
+├── vt_clipscore
+│   └── vt_clip.pth
+├── vtsum_tt
+│   └── vtsum_tt.pth
+└── vtsum_tt_ca
+    └── vtsum_tt_ca.pth
+```
+
 **Paper or resources for more information:**
 https://videoxum.github.io/
 
+
+
 ## Training dataset
 - VideoXum *training* set: 8K long videos with 80K pairs of aligned video and text summaries.
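The layout added above can be scaffolded with a short script. This is a minimal sketch, not part of the repo: it creates the `outputs/` subfolders and fetches the one checkpoint the README links directly; `vt_clip.pth`, `vtsum_tt.pth`, and `vtsum_tt_ca.pth` have no download URLs in this section and must be placed manually.

```python
import os
import urllib.request

# BLIP w/ ViT-B and CapFilt-L -- the only checkpoint with a public
# URL in this README; the VT-CLIP and VTSUM-TT weights must be
# dropped into their folders by hand.
BLIP_URL = ("https://storage.googleapis.com/sfr-vision-language-research/"
            "BLIP/models/model_base_capfilt_large.pth")

# Create the model-zoo skeleton shown in the tree above.
for subdir in ("blip", "vt_clipscore", "vtsum_tt", "vtsum_tt_ca"):
    os.makedirs(os.path.join("outputs", subdir), exist_ok=True)

# Download the BLIP checkpoint into place if it is not already there.
dest = os.path.join("outputs", "blip", "model_base_capfilt_large.pth")
if not os.path.exists(dest):
    urllib.request.urlretrieve(BLIP_URL, dest)  # large download
```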