hfl/vle-base-for-vqa

ziqingyang committed on Mar 9, 2023

Commit 778e886 (1 parent: 95c348e)

Update README.md

Files changed (1)

README.md +10 -0

README.md CHANGED

@@ -1,3 +1,13 @@
 ---
 license: apache-2.0
+language:
+- en
 ---
+**VLE** (**V**isual-**L**anguage **E**ncoder) is an image-text multimodal understanding model built on pre-trained text and image encoders.
+It can be used for multimodal discriminative tasks such as visual question answering and image-text retrieval.
+VLE achieves significant improvements, particularly on the visual commonsense reasoning (VCR) task, which requires high-level language understanding and reasoning skills.
+For more details, see [https://github.com/iflytek/VLE](https://github.com/iflytek/VLE).
+Online VLE demo for visual question answering: [https://huggingface.co/spaces/hfl/VQA_VLE_LLM](https://huggingface.co/spaces/hfl/VQA_VLE_LLM)