kpyu committed on
Commit
83865b7
1 Parent(s): fc6f6d9

Add a model card

Files changed (1)
  1. README.md +35 -0
README.md CHANGED
@@ -1,3 +1,38 @@
1
  ---
 
2
  license: mit
 
3
  ---
 
1
  ---
2
+ language: en
3
  license: mit
4
+ tags:
5
+ - vision
6
+ - image-to-text
7
+ - video-to-text
8
+ - image-captioning
9
+ - video-captioning
10
+ - visual-question-answering
11
+ pipeline_tag: image-to-text
12
  ---
13
+
14
+ # VideoBLIP, OPT-2.7b, fine-tuned on Ego4D
15
+
16
+ This VideoBLIP model leverages [BLIP-2](https://arxiv.org/abs/2301.12597) with [OPT-2.7b](https://huggingface.co/facebook/opt-2.7b) (a large language model with 2.7 billion parameters) as its LLM backbone.
17
+
18
+ ## Model description
19
+
20
+ VideoBLIP is an augmented BLIP-2 that can handle videos.
21
+
22
+ ## Bias, Risks, Limitations, and Ethical Considerations
23
+
24
+ VideoBLIP-OPT uses off-the-shelf OPT as its language model, so it inherits the risks and limitations described in Meta's model card:
25
+
26
+ > Like other large language models for which the diversity (or lack thereof) of training
27
+ > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms
28
+ > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and
29
+ > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern
30
+ > large language models.
31
+ >
32
+
33
+ VideoBLIP has not been tested in real-world applications and should not be deployed directly in any application. Researchers should first carefully assess the safety and fairness of the model in relation to the specific context in which it will be deployed.
34
+
35
+
36
+ ## How to use
37
+
38
+ For code examples, please refer to the [official repository](https://github.com/yukw777/VideoBLIP).