yanka9 committed
Commit b26a3ba
1 Parent(s): 8ab9e82

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -16,7 +16,7 @@ widget:
 
 ### Summary
 
-Finetuning a Vision-and-Language Pre-training (VLP) model for a fashion-related downstream task, Visual Question Answering (VQA). The related model, ViLT, was proposed in [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) and incorporates text embeddings into a Vision Transformer (ViT), allowing it to have a minimal design for VLP.
+A Vision-and-Language Pre-training (VLP) model for a fashion-related downstream task, Visual Question Answering (VQA). The related model, ViLT, was proposed in [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) and incorporates text embeddings into a Vision Transformer (ViT), allowing it to have a minimal design for VLP.
 
 ### Model Description
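The summary's key technical claim is that ViLT feeds text embeddings and image patch embeddings into a single Vision Transformer, rather than using a separate convolutional or region-based visual encoder. The following is a minimal NumPy sketch of that idea only (toy dimensions and random weights, not the actual ViLT code or its real hyperparameters): token embeddings and linearly projected image patches are tagged with modality-type embeddings and concatenated into one sequence that a standard transformer encoder would then consume.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # hidden size (toy value; ViLT's real hidden size differs)

# Text side: token ids -> embeddings (hypothetical toy vocabulary).
vocab, n_tokens = 100, 6
tok_emb = rng.normal(size=(vocab, d))
token_ids = rng.integers(0, vocab, size=n_tokens)
text_seq = tok_emb[token_ids]                      # (6, d)

# Image side: split a 64x64 "image" into 16x16 patches and
# linearly project each flattened patch, as in a ViT patch embedding.
img = rng.normal(size=(64, 64, 3))
patches = img.reshape(4, 16, 4, 16, 3).transpose(0, 2, 1, 3, 4).reshape(16, -1)
proj = rng.normal(size=(patches.shape[1], d))
img_seq = patches @ proj                           # (16, d)

# Add modality-type embeddings, then concatenate both modalities into
# the single joint sequence a shared transformer encoder would process.
text_type = rng.normal(size=(d,))
img_type = rng.normal(size=(d,))
joint_seq = np.concatenate([text_seq + text_type, img_seq + img_type], axis=0)
print(joint_seq.shape)  # (22, 32): 6 text tokens + 16 image patches
```

This minimal design is what lets ViLT drop the heavy visual backbone: all cross-modal interaction happens inside the one transformer over `joint_seq`.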