Commit f7b4925 by jylins (1 parent: 363a181)

update readme

Files changed (1):
  1. README.md +15 -0
README.md CHANGED
@@ -26,9 +26,24 @@ VTSUM-BLIP is an end-to-end cross-modal video summarization model.
 - VT-CLIP for VT-CLIPScore metric: vt_clip.pth
 - BLIP w/ ViT-B and CapFilt-L ([Download](https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth)): model_base_capfilt_large.pth
 
+**The file structure of the model zoo looks like:**
+```
+outputs
+├── blip
+│   └── model_base_capfilt_large.pth
+├── vt_clipscore
+│   └── vt_clip.pth
+├── vtsum_tt
+│   └── vtsum_tt.pth
+└── vtsum_tt_ca
+    └── vtsum_tt_ca.pth
+```
+
 **Paper or resources for more information:**
 https://videoxum.github.io/
 
+
+
 ## Training dataset
 - VideoXum *training* set: 8K long videos with 80K pairs of aligned video and text summaries.
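The layout the commit documents can be prepared mechanically. Below is a minimal sketch, not part of the repository: the helper name `prepare_model_zoo` and its reporting behavior are assumptions. It creates the `outputs/` tree shown in the diff and downloads the BLIP checkpoint from the URL given above; the other three checkpoints have no download link in this README, so the sketch only reserves their directories and flags them as missing.

```python
# Hypothetical setup script (not part of this repo): creates the model-zoo
# layout shown in the README diff and fetches the one checkpoint it links to.
from pathlib import Path
from urllib.request import urlretrieve

# Subdirectory -> checkpoint filename, mirroring the tree in the README.
MODEL_ZOO = {
    "blip": "model_base_capfilt_large.pth",
    "vt_clipscore": "vt_clip.pth",
    "vtsum_tt": "vtsum_tt.pth",
    "vtsum_tt_ca": "vtsum_tt_ca.pth",
}

# The only download URL given in the README (the BLIP base checkpoint).
BLIP_URL = (
    "https://storage.googleapis.com/sfr-vision-language-research/"
    "BLIP/models/model_base_capfilt_large.pth"
)


def prepare_model_zoo(root: str = "outputs") -> None:
    """Create the expected directories and report missing checkpoints."""
    for subdir, ckpt in MODEL_ZOO.items():
        target_dir = Path(root) / subdir
        target_dir.mkdir(parents=True, exist_ok=True)
        ckpt_path = target_dir / ckpt
        if ckpt_path.exists():
            print(f"found    {ckpt_path}")
        elif ckpt == "model_base_capfilt_large.pth":
            print(f"fetching {ckpt_path} ...")
            urlretrieve(BLIP_URL, str(ckpt_path))
        else:
            # vt_clip.pth, vtsum_tt.pth, and vtsum_tt_ca.pth have no public
            # URL in this README; place them here manually.
            print(f"missing  {ckpt_path} (copy it here manually)")


if __name__ == "__main__":
    prepare_model_zoo()
```

Run from the repository root, this yields exactly the tree shown in the diff; checkpoints without a published URL are reported rather than guessed at.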