Spaces: altndrr/vic

altndrr committed on Jun 5, 2023

Commit 54a3362 • 1 Parent(s): a0fbf80

Update thumbnail and description

Files changed (2)

app.py CHANGED

@@ -7,8 +7,6 @@ from src.nn import CaSED
 PAPER_TITLE = "Vocabulary-free Image Classification"
 PAPER_DESCRIPTION = """
 <div style="display: flex; align-items: center; justify-content: center; margin-bottom: 1rem;">
     <a href="https://github.com/altndrr/vic" style="margin-right: 0.5rem;">
         <img src="https://img.shields.io/badge/code-github.altndrr%2Fvic-blue.svg"/>
@@ -28,11 +26,14 @@ PAPER_DESCRIPTION = """
 Vocabulary-free Image Classification aims to assign a class to an image *without* prior knowledge
 on the list of class names, thus operating on the semantic class space that contains all the
 possible concepts. Our proposed method CaSED finds the best matching category within the
-unconstrained semantic space by multimodal data from large vision-language databases. We first
-retrieve the semantically most similar captions from a database, from which we extract a set of
-candidate categories by applying text parsing and filtering techniques. We further score the
-candidates using the multimodal aligned representation of the large pre-trained VLM, *i.e.* CLIP,
-to obtain the best-matching category.
+unconstrained semantic space by multimodal data from large vision-language databases.
+To assign a label to an image, we:
+1. extract the image features using a pre-trained Vision-Language Model (VLM);
+2. retrieve the semantically most similar captions from a textual database;
+3. extract from the captions a set of candidate categories by applying text parsing and filtering;
+4. score the candidates using the multimodal aligned representation of the pre-trained VLM to
+    obtain the best-matching category.
 """
 PAPER_URL = "https://arxiv.org/abs/2306.00917"
@@ -81,7 +82,7 @@ demo = gr.Interface(
     examples="./artifacts/examples/",
     allow_flagging='never',
     theme=gr.themes.Soft(),
-    thumbnail="./assets/thumbnail.png",
+    thumbnail="https://altndrr.github.io/vic/assets/images/method.png",
 )
 demo.launch(share=False)

assets/thumbnail.png DELETED

Binary file (785 kB)
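
Note on the updated description: the four steps it lists (image features, caption retrieval, candidate extraction, candidate scoring) can be illustrated with a small toy sketch. This is not the CaSED implementation from `src.nn`. Everything below is an assumption made for illustration: `embed_word`, `embed_text`, `content_words`, and `classify` are hypothetical helpers, the pseudo-random word vectors stand in for a real VLM such as CLIP, the four captions stand in for a large textual database, and the stopword filter stands in for the text parsing and filtering step.

```python
import re
import zlib

import numpy as np

DIM = 256  # toy embedding size (assumption, unrelated to any real VLM)
STOPWORDS = {"a", "an", "the", "of", "on", "in", "with", "and", "is"}


def content_words(text: str) -> list[str]:
    """Lowercase, tokenize, and drop stopwords (stand-in for parsing and filtering)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def embed_word(word: str) -> np.ndarray:
    """Toy stand-in for a text encoder: one fixed pseudo-random vector per word."""
    rng = np.random.default_rng(zlib.crc32(word.encode("utf-8")))
    return rng.standard_normal(DIM)


def embed_text(text: str) -> np.ndarray:
    """Unit-norm sum of the word vectors; a zero vector if no content words remain."""
    words = content_words(text)
    if not words:
        return np.zeros(DIM)
    vec = np.sum([embed_word(w) for w in words], axis=0)
    return vec / np.linalg.norm(vec)


def classify(image_features: np.ndarray, captions: list[str], top_k: int = 2):
    # Step 2: retrieve the captions most similar to the image features.
    caption_matrix = np.stack([embed_text(c) for c in captions])
    similarities = caption_matrix @ image_features
    top_indices = np.argsort(similarities)[::-1][:top_k]
    retrieved = [captions[i] for i in top_indices]

    # Step 3: extract candidate categories from the retrieved captions.
    candidates = sorted({w for c in retrieved for w in content_words(c)})

    # Step 4: score each candidate against the image in the shared space.
    scores = {c: float(embed_text(c) @ image_features) for c in candidates}
    return max(scores, key=scores.get), scores


captions = [
    "a dog running on the beach",
    "a brown dog with a ball",
    "a cat sleeping on a sofa",
    "the skyline of a city",
]

# Step 1: in a real system an image encoder produces these features; here the
# embedding of the word "dog" is used as a stand-in for a picture of a dog.
image_features = embed_text("dog")

label, scores = classify(image_features, captions)
print(label, scores)
```

The sketch keeps retrieval and scoring in the same embedding space, which is the point of step 4 in the description: candidates are ranked by their similarity to the image rather than by how often they appear in the retrieved captions.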