Update README.md
README.md
CHANGED
@@ -20,13 +20,6 @@ For training purposes, a subset consisting of the first 150,000 video-text pairs
This HF model is based on the [clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) architecture, with weights trained by Daphna Idelson at [Searchium](https://www.searchium.ai).

-## Motivation
-
-As per the original authors, the main motivation behind this work is to leverage the power of the CLIP image-language pre-training model and apply it to learning
-visual-temporal concepts from videos, thereby improving video-based searches.
-
-By using the WebVid dataset, the model's capabilities were enhanced even beyond those described in the paper, thanks to the large-scale and diverse nature of the dataset empowering the model's performance.
-

# How to use
### Extracting Text Embeddings:
@@ -55,6 +48,7 @@ print("sequence_output: ", sequence_output)
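The text-embedding snippet itself is unchanged by this commit, which is why only its closing `print("sequence_output: ", sequence_output)` line appears as hunk context above. For orientation, a minimal sketch of extracting text embeddings from this checkpoint with the `transformers` text tower is shown below; the exact code in the README may differ, and the final L2 normalization is an assumption based on common CLIP-style retrieval practice.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

# Text tower of the CLIP4Clip checkpoint (a CLIP ViT-B/32 text encoder with projection).
model = CLIPTextModelWithProjection.from_pretrained("Diangle/clip4clip-webvid")
tokenizer = CLIPTokenizer.from_pretrained("Diangle/clip4clip-webvid")

search_sentence = "a basketball player performing a slam dunk"
inputs = tokenizer(text=search_sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])

# L2-normalize the projected embedding so retrieval can use a plain dot product.
sequence_output = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
print("sequence_output: ", sequence_output)
```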
An additional [notebook](https://huggingface.co/Diangle/clip4clip-webvid/blob/main/Notebooks/GSI_VideoRetrieval_VideoEmbedding.ipynb) provides instructions on how to perform video embedding.
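That notebook remains the authoritative reference. As a rough sketch of the CLIP4Clip-style pipeline it walks through (embed uniformly sampled frames with the vision tower, then mean-pool them into one video vector), something like the following could be used; the frame sampling, the use of the checkpoint's vision weights, and the pooling details are assumptions here, not a substitute for the notebook.

```python
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

# Vision tower of the checkpoint (assumes the repo exposes CLIP-style vision weights and a
# preprocessing config; if it does not, the openai/clip-vit-base-patch32 processor can be used).
vision_model = CLIPVisionModelWithProjection.from_pretrained("Diangle/clip4clip-webvid")
processor = CLIPImageProcessor.from_pretrained("Diangle/clip4clip-webvid")

# Placeholder frames; in practice these are frames sampled uniformly from the video,
# e.g. one frame per second, as described in the notebook.
frames = [Image.new("RGB", (224, 224)) for _ in range(8)]

inputs = processor(images=frames, return_tensors="pt")
with torch.no_grad():
    frame_embeds = vision_model(pixel_values=inputs["pixel_values"]).image_embeds

# Mean pooling in the CLIP4Clip style: normalize per frame, average over time, normalize again.
frame_embeds = frame_embeds / frame_embeds.norm(dim=-1, keepdim=True)
video_embed = frame_embeds.mean(dim=0)
video_embed = video_embed / video_embed.norm(dim=-1)
print("video_embed shape:", tuple(video_embed.shape))
```

The resulting vector lives in the same space as the text embeddings above, so a single dot product scores a text query against a video.
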
## Model Intended Use
This model is intended for use in large-scale video-text retrieval applications.
@@ -62,6 +56,13 @@ This model is intended for use in large scale video-text retrieval applications.
To illustrate its functionality, refer to the accompanying [**Video Search Space**](https://huggingface.co/spaces/Diangle/Clip4Clip-webvid), which provides a search demonstration over a collection of approximately 1.5 million videos.
This interactive demo showcases the model's capability to effectively retrieve videos based on text queries, highlighting its potential for handling substantial video datasets.
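Under the hood, a search like this reduces to a nearest-neighbour lookup over precomputed, L2-normalized video embeddings. The sketch below is illustrative only: the random stand-in data and the brute-force NumPy dot product are assumptions, not the demo's actual serving stack.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real data: in practice `video_embeddings` holds one precomputed,
# L2-normalized embedding per video and `query_embedding` comes from the text encoder above.
video_embeddings = rng.normal(size=(1_000, 512)).astype(np.float32)
video_embeddings /= np.linalg.norm(video_embeddings, axis=1, keepdims=True)
query_embedding = rng.normal(size=512).astype(np.float32)
query_embedding /= np.linalg.norm(query_embedding)

# With unit-normalized vectors, cosine similarity is just a dot product.
scores = video_embeddings @ query_embedding

# Indices of the top-5 matching videos, best first.
top_k = np.argsort(-scores)[:5]
print(top_k, scores[top_k])
```

At the scale of the demo (roughly 1.5 million videos), the same dot product is typically served from an approximate-nearest-neighbour index or dedicated search hardware rather than a brute-force loop.
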

+## Motivation
+
+As per the original authors, the main motivation behind this work is to leverage the power of the CLIP image-language pre-training model and apply it to learning
+visual-temporal concepts from videos, thereby improving video-based searches.
+
+By using the WebVid dataset, the model's capabilities were enhanced even beyond those described in the paper, thanks to the large-scale and diverse nature of the dataset empowering the model's performance.
+

## Evaluations
@@ -93,6 +94,7 @@ For an elaborate description of the evaluation refer to the notebook
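Text-to-video retrieval of this kind is conventionally scored with recall@K and median rank over a paired test set. The helper below sketches how such numbers can be computed from a query-by-video similarity matrix; the random matrix and the assumption that caption i is paired with video i are illustrative, and the linked notebook remains the reference for the actual evaluation.

```python
import numpy as np

def retrieval_metrics(sim: np.ndarray) -> dict:
    """R@1/5/10 and median rank from a (num_queries, num_videos) similarity matrix,
    assuming query i's ground-truth video is video i."""
    order = np.argsort(-sim, axis=1)              # videos ranked best-first for each query
    gt = np.arange(sim.shape[0])[:, None]
    ranks = np.argmax(order == gt, axis=1) + 1    # 1-based rank of the correct video
    return {
        "R@1": float(np.mean(ranks <= 1)),
        "R@5": float(np.mean(ranks <= 5)),
        "R@10": float(np.mean(ranks <= 10)),
        "MedianR": float(np.median(ranks)),
    }

# Random similarity matrix as a stand-in for text-embedding x video-embedding dot products.
sim = np.random.default_rng(0).normal(size=(100, 100))
print(retrieval_metrics(sim))
```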
## Acknowledgements
Acknowledgments to Diana Mazenko of [Searchium](https://www.searchium.ai) for adapting the model and loading it to Hugging Face, and for creating a Hugging Face [**SPACE**](https://huggingface.co/spaces/Diangle/Clip4Clip-webvid) for a large-scale video-search demo.

Acknowledgments also to Luo et al. for their comprehensive work on CLIP4Clip and for making their code openly available.

## Citations