jylins committed on
Commit
048fdcf
1 Parent(s): c5deef6

Initial Commit

Files changed (3)
  1. README.md +28 -0
  2. vtsum_tt.pth +3 -0
  3. vtsum_tt_ca.pth +3 -0
README.md CHANGED
@@ -1,3 +1,31 @@
  ---
  license: apache-2.0
+ task_categories:
+ - summarization
+ language:
+ - en
+ tags:
+ - cross-modal-video-summarization
+ - video-summarization
+ - video-captioning
+ pretty_name: VideoXum
+ size_categories:
+ - 10K<n<100K
  ---
+
+ # VTSUM-BLIP Model Card
+
+ ## Model details
+
+ **Model type:**
+ VTSUM-BLIP is an end-to-end cross-modal video summarization model.
+
+ **Paper or resources for more information:**
+ https://videoxum.github.io/
+
+ ## Training dataset
+ - VideoXum *training* set: 8K long videos with 80K pairs of aligned video and text summaries.
+
+ ## Evaluation dataset
+ - VideoXum *val* set: 2K long videos with 20K pairs of aligned video and text summaries.
+ - VideoXum *test* set: 4K long videos with 40K pairs of aligned video and text summaries.
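
The metadata block at the top of README.md is plain YAML front matter delimited by `---` lines. As a rough illustration of how those fields can be read, here is a minimal standard-library sketch that handles only the flat shape used in this card (scalar values and `- item` lists). The function name `parse_front_matter` is hypothetical, and real tooling would normally use a full YAML parser instead.

```python
def parse_front_matter(markdown: str) -> dict:
    """Parse a flat front matter block: 'key: value' scalars and '- item' lists.

    Illustrative only; this is not a general YAML parser.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta = {}
    current_list_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the block
            break
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: attach it to the most recent key that opened a list.
            if current_list_key is not None:
                meta[current_list_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            current_list_key = None
        else:
            # A key with no inline value introduces a list on the next lines.
            meta[key] = []
            current_list_key = key
    return meta


# The front matter added in this commit, followed by the card title.
CARD = """\
---
license: apache-2.0
task_categories:
- summarization
language:
- en
tags:
- cross-modal-video-summarization
- video-summarization
- video-captioning
pretty_name: VideoXum
size_categories:
- 10K<n<100K
---

# VTSUM-BLIP Model Card
"""

metadata = parse_front_matter(CARD)
```

Calling `parse_front_matter(CARD)` returns a dictionary in which `license` and `pretty_name` are strings and the remaining keys are lists of strings.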
vtsum_tt.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8a39b35dfe57f6fe0af666d314a699d5d49f9b82d8a8808ba95f0a32cdeeebd
+ size 581582010
vtsum_tt_ca.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:57451c4712afcee8d2f5c817c8c7399c2f03e0394d884e409afe9699b2896cdb
+ size 591040593
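
The two `.pth` entries above are Git LFS pointer files, not the checkpoints themselves: each records the spec version, a SHA-256 object ID, and the size in bytes of the real file. The following sketch, using only the Python standard library, shows one way to check a downloaded checkpoint against its pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of Git LFS or of this repository.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and hash recorded in `pointer`."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing the whole file
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so large checkpoints are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents for vtsum_tt.pth, copied from the diff above without the '+' markers.
VTSUM_TT_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b8a39b35dfe57f6fe0af666d314a699d5d49f9b82d8a8808ba95f0a32cdeeebd
size 581582010
"""

pointer = parse_lfs_pointer(VTSUM_TT_POINTER)
```

With a local copy of the checkpoint, `matches_pointer("vtsum_tt.pth", pointer)` would report whether the download is complete and uncorrupted. The local path is an assumption about where the file was saved.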