---
library_name: transformers
license: mit
language:
- en
tags:
- retrieval
- multi-modal
- knowledge-based visual question answering
- FLMR
- PreFLMR
---

# FLMR model card

FLMR is an open-source model for multimodal knowledge retrieval. It is a transformer-based late-interaction retriever that uses a combination of text and image inputs to retrieve relevant documents from a large corpus.
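
The "fine-grained late interaction" in the paper title refers to ColBERT-style MaxSim scoring: each query token embedding (from both text tokens and image-derived tokens) takes its maximum similarity over the document's token embeddings, and these maxima are summed into the relevance score. The snippet below is a small self-contained illustration of that scoring rule, not the repository's implementation; the embedding dimension and random vectors are placeholders.

```python
# Self-contained illustration of late-interaction (MaxSim) scoring; this is not
# the repository's implementation. Each query token takes its best match over
# the document tokens, and the maxima are summed into one relevance score.
import torch
import torch.nn.functional as F

def late_interaction_score(query_embs: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    """query_embs: (num_query_tokens, dim); doc_embs: (num_doc_tokens, dim)."""
    sim = query_embs @ doc_embs.T           # token-level similarity matrix
    return sim.max(dim=1).values.sum()      # MaxSim over doc tokens, summed over query tokens

torch.manual_seed(0)
dim = 128                                                 # placeholder embedding dimension
query = F.normalize(torch.randn(40, dim), dim=-1)         # e.g. text tokens + image-derived tokens
doc_a = F.normalize(torch.randn(200, dim), dim=-1)
doc_b = F.normalize(torch.randn(200, dim), dim=-1)
print(late_interaction_score(query, doc_a).item(), late_interaction_score(query, doc_b).item())
```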

## Model Details

### Model Description

- **Model type:** FLMRModelForRetrieval
- **Language(s) (NLP):** English
- **License:** MIT License

### Paper and resources for more details

- **Blog post (quick overview):** https://www.jinghong-chen.net/fined-grained-late-interaction-multimodal-retrieval-flmr/
- **Paper:** https://openreview.net/forum?id=IWWWulAX7g
- **Repository:** https://github.com/LinWeizheDragon/FLMR

## Uses

### Direct Use

This model can be used directly to retrieve documents from a large corpus using combined text and image queries. Full retrieval usage can be found in the [official implementation](https://github.com/LinWeizheDragon/FLMR); a minimal sketch is shown below.
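
The following sketch loads the retriever and scores text-and-image queries against candidate passages. It assumes the `flmr` package from the [official implementation](https://github.com/LinWeizheDragon/FLMR) is installed; the checkpoint ID, example queries, passages, and dummy image tensors are placeholders, and the argument names follow the repository's examples, so please verify them against the installed version.

```python
# Minimal sketch: load the retriever and score text+image queries against candidate
# passages. Assumes the `flmr` package from https://github.com/LinWeizheDragon/FLMR
# is installed; the checkpoint ID, queries, passages, and dummy image tensors are
# placeholders, and argument names follow the repository's examples.
import torch
from flmr import FLMRModelForRetrieval, FLMRQueryEncoderTokenizer, FLMRContextEncoderTokenizer

checkpoint_path = "LinWeizheDragon/FLMR"  # placeholder: substitute this model's Hub ID
query_tokenizer = FLMRQueryEncoderTokenizer.from_pretrained(checkpoint_path, subfolder="query_tokenizer")
context_tokenizer = FLMRContextEncoderTokenizer.from_pretrained(checkpoint_path, subfolder="context_tokenizer")
model = FLMRModelForRetrieval.from_pretrained(
    checkpoint_path,
    query_tokenizer=query_tokenizer,
    context_tokenizer=context_tokenizer,
)

# Two text+image queries, each paired with two candidate passages.
Q_encoding = query_tokenizer([
    "Using the provided image, obtain documents that answer the question: What is the capital of France?",
    "Using the provided image, obtain documents that answer the question: Who invented the telephone?",
])
D_encoding = context_tokenizer([
    "Paris is the capital of France.",
    "The telephone was invented by Alexander Graham Bell.",
    "Paris is the capital of France.",
    "The telephone was invented by Alexander Graham Bell.",
])
Q_pixel_values = torch.zeros(2, 3, 224, 224)  # dummy images; use a CLIP image processor on real images

with torch.no_grad():
    outputs = model(
        query_input_ids=Q_encoding["input_ids"],
        query_attention_mask=Q_encoding["attention_mask"],
        query_pixel_values=Q_pixel_values,
        context_input_ids=D_encoding["input_ids"],
        context_attention_mask=D_encoding["attention_mask"],
        use_in_batch_negatives=True,
    )
print(outputs)  # contains the late-interaction relevance scores (and a contrastive loss)
```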

### Downstream Use

This model can be combined with language models to build a retrieval-augmented language model. Its use for knowledge-based visual question answering can be found in [RAVQA](https://github.com/linweizhedragon/retrieval-augmented-visual-question-answering).
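
As an illustration of this pattern only (not the RAVQA pipeline itself), a passage retrieved by FLMR can simply be prepended to the question for an off-the-shelf generator; the generator checkpoint and prompt template below are placeholder choices.

```python
# Illustrative retrieval-augmented generation pattern, not the RAVQA pipeline:
# prepend a passage retrieved by FLMR to the question and let an off-the-shelf
# seq2seq model produce the answer. The generator checkpoint and prompt template
# are placeholder choices for this sketch.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

question = "What is the capital of France?"
retrieved_passage = "Paris is the capital and most populous city of France."  # top-1 passage from FLMR

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
generator = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

prompt = f"Answer the question using the context.\ncontext: {retrieved_passage}\nquestion: {question}"
inputs = tokenizer(prompt, return_tensors="pt")
answer_ids = generator.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```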

## How to Get Started with the Model

For details on training, indexing, and performing retrieval, please refer to the [official repository](https://github.com/LinWeizheDragon/FLMR).

## Training datasets

The model is pre-trained on:
1. Image-to-text retrieval: WIT
2. Image & question-to-text retrieval: OKVQA

For details on the dataset split and conversion process, please refer to the paper [Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering](https://openreview.net/forum?id=IWWWulAX7g).

The processed datasets are:
- https://huggingface.co/datasets/BByrneLab/OKVQA_FLMR_preprocessed_data
- https://huggingface.co/datasets/BByrneLab/OKVQA_FLMR_preprocessed_GoogleSearch_passages

## Evaluation datasets

The model is evaluated on OKVQA, Infoseek, and FVQA.

Please find the evaluation results in [the paper](https://openreview.net/forum?id=IWWWulAX7g).

## Citation

**BibTeX:**
```
@inproceedings{lin2023finegrained,
  title={Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering},
  author={Weizhe Lin and Jinghong Chen and Jingbiao Mei and Alexandru Coca and Bill Byrne},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=IWWWulAX7g}
}
```