jaketae committed on
Commit
6458346
1 Parent(s): a811816

docs: add findings and future work section

Files changed (1)
  1. intro.md +14 -0
intro.md CHANGED
@@ -25,6 +25,20 @@ We present three demos, each of which illustrates a different use case of KoCLIP.
  * *Text to Image*: This is essentially an image retrieval task. Given a text query, the model looks up a database of pre-computed image embeddings to retrieve the image that best matches the given text (see the sketch after this list).
  * *Text to Patch*: This is also a variant of zero-shot image classification. Given a text and an image, the image is partitioned into subsections, and the model ranks them based on their relevance to the text query.
 
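+ As a rough illustration of the Text to Image lookup, below is a minimal sketch under assumed inputs: the function name, the query embedding `text_emb`, and the matrix of pre-computed image embeddings `image_embs` are hypothetical, not part of the actual demo code.
+
+ ```python
+ # Hypothetical sketch: rank pre-computed image embeddings against a text
+ # query embedding by cosine similarity and return the best match.
+ import numpy as np
+
+ def retrieve_best_image(text_emb: np.ndarray, image_embs: np.ndarray) -> int:
+     """text_emb: (d,); image_embs: (n_images, d). Returns best image index."""
+     text_emb = text_emb / np.linalg.norm(text_emb)
+     image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
+     scores = image_embs @ text_emb  # cosine similarity per image
+     return int(np.argmax(scores))
+ ```
+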
+ ## Prompting
+
+ We found that KoCLIP performs better when prompting is used to induce zero-shot behavior. Namely, instead of feeding the model a single word or short phrase, casting the query into a template such as
+
+ ```
+ 이것은 {{}} 이다 (This is {{}}.)
+ ```
+
+ noticeably helped the model. We hypothesize that this is due to the nature of captions in the MSCOCO dataset, which are most often full sentences, albeit sometimes short in length.
+
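+ To make this concrete, below is a minimal sketch of prompt-based zero-shot classification. The model id, the `AutoProcessor`/`AutoModel` classes, the label set, and the image path are placeholder assumptions rather than the exact KoCLIP API.
+
+ ```python
+ # Minimal sketch (hypothetical model id and setup): wrap each candidate
+ # label in the Korean template before scoring it against the image.
+ from PIL import Image
+ from transformers import AutoModel, AutoProcessor
+
+ MODEL_ID = "koclip/koclip-base"  # placeholder model id
+ processor = AutoProcessor.from_pretrained(MODEL_ID)
+ model = AutoModel.from_pretrained(MODEL_ID)
+
+ labels = ["고양이", "강아지", "자동차"]  # cat, dog, car
+ texts = [f"이것은 {label} 이다" for label in labels]  # templated prompts
+
+ image = Image.open("example.jpg")  # placeholder image path
+ inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
+ probs = model(**inputs).logits_per_image.softmax(dim=-1)  # one prob per label
+ ```
+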
+ ## Future Work
+
+ Due to time and resource constraints, we have yet to compare KoCLIP to other open-source baselines, such as [M-CLIP](https://huggingface.co/M-CLIP). We hope to benchmark KoCLIP on various metrics and evaluation datasets to further assess its performance and reliability. In addition, given that prompting is somewhat of a mysterious trick and an active area of ongoing research, we hope to explore ways to take a more scientific approach to prompt engineering.
+
  ---
 
  We thank the teams at Hugging Face and Google for arranging this wonderful opportunity. It has been a busy yet enormously rewarding week for all of us. We hope you enjoy the demo!