shoubin committed on
Commit
1202c22
1 Parent(s): 7e8784c

udpate_readme

Files changed (2)
  1. .DS_Store +0 -0
  2. README.md +8 -112
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
README.md CHANGED
@@ -1,112 +1,8 @@
- # Self-Chained Image-Language Model for Video Localization and Question Answering
-
- * Authors: [Shoubin Yu](https://yui010206.github.io/), [Jaemin Cho](https://j-min.io), [Prateek Yadav](https://prateek-yadav.github.io/), [Mohit Bansal](https://www.cs.unc.edu/~mbansal/)
- * [arXiv](https://arxiv.org/abs/2305.06988)
- <img src="./assets/teaser.png" alt="teaser image" width="800"/>
-
- <img src="./assets/model.png" alt="teaser image" width="800"/>
-
- <img src="./assets/chain.png" alt="teaser image" width="800"/>
-
-
- # Code structure
- ```bash
-
- # Data & Data Preprocessing
- ./sevila_data
-
- # Pretrained Checkpoints
- ./sevila_checkpoints
-
- # SeViLA code
- ./lavis/
-
- # Running scripts for SeViLA localizer/answerer training/inference
- ./run_scripts
-
- ```
-
- # Setup
-
- ## Install Dependencies
-
- 1. (Optional) Create a conda environment
-
- ```bash
- conda create -n sevila python=3.8
- conda activate sevila
- ```
-
- 2. Build from source
-
- ```bash
- pip install -e .
- ```
-
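To sanity-check the editable install, one option is to confirm that the `lavis` package from the code-structure listing imports cleanly (a minimal sketch, assuming the package installs under that name; this is an editor's illustration, not part of the original README):

```bash
# Sketch: verify that the editable install exposes the lavis package
python -c "import lavis; print(lavis.__file__)"
```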
- ## Download Pretrained Models
- We pre-train the SeViLA localizer on QVHighlights and host the checkpoint on [Hugging Face](https://huggingface.co/Shoubin/SeViLA/resolve/main/sevila_pretrained.pth).
- Download the checkpoint and put it under ./sevila_checkpoints.
- The checkpoint (814.55M) contains the pre-trained localizer and the zero-shot answerer.
-
-
-
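For reference, the checkpoint above can be fetched from the URL given in that paragraph (a minimal sketch; the exact commands are an editor's illustration, not part of the original instructions):

```bash
# Sketch: download the pre-trained SeViLA checkpoint into ./sevila_checkpoints
mkdir -p sevila_checkpoints
wget -P sevila_checkpoints https://huggingface.co/Shoubin/SeViLA/resolve/main/sevila_pretrained.pth
```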
- # Dataset Preparation
- We test our model on:
- + [NExT-QA](https://doc-doc.github.io/docs/nextqa.html)
-
- + [STAR](https://star.csail.mit.edu/)
-
- + [How2QA](https://value-benchmark.github.io/index.html)
-
- + [TVQA](https://tvqa.cs.unc.edu/)
-
- + [VLEP](https://value-benchmark.github.io/index.html)
-
- + [QVHighlights](https://github.com/jayleicn/moment_detr)
-
- Please download the original data and preprocess it via our [scripts](sevila_data/) under ./sevila_data/.
-
-
- # Training and Inference
- We provide SeViLA training and inference script examples as follows:
- ## 1) Localizer Pre-training
- ```bash
- sh run_scripts/sevila/pre-train/pretrain_qvh.sh
- ```
-
- ## 2) Localizer Self-refinement
-
- ```bash
- sh run_scripts/sevila/refinement/nextqa_sr.sh
- ```
-
- ## 3) Answerer Fine-tuning
-
- ```bash
- sh run_scripts/sevila/finetune/nextqa_ft.sh
- ```
-
- ## 4) Inference
-
- ```bash
- sh run_scripts/sevila/inference/nextqa_infer.sh
- ```
-
-
- # Acknowledgments
- We thank the developers of [LAVIS](https://github.com/salesforce/LAVIS), [BLIP-2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2), [CLIP](https://github.com/openai/CLIP), and [All-in-one](https://github.com/showlab/all-in-one) for their public code releases.
-
-
- # Reference
- Please cite our paper if you use our models in your work:
-
-
- ```bibtex
- @misc{yu2023selfchained,
-   title={Self-Chained Image-Language Model for Video Localization and Question Answering},
-   author={Shoubin Yu and Jaemin Cho and Prateek Yadav and Mohit Bansal},
-   year={2023},
-   eprint={2305.06988},
-   archivePrefix={arXiv},
-   primaryClass={cs.CV}
- }
 
+ title: SeViLA Demo
+ emoji: ⛓️
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 3.19.1
+ app_file: app.py
+ pinned: false
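The added lines are the Space's configuration header: a Gradio Space (SDK version 3.19.1) that serves app.py. A minimal sketch of running such an app locally, assuming app.py and its dependencies are present (the commands are an editor's illustration, not part of the commit):

```bash
# Sketch: run the Gradio demo locally with the SDK version pinned in the config
pip install gradio==3.19.1
python app.py
```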