Update README.md
README.md
CHANGED
@@ -20,13 +20,6 @@ For training purposes, a subset consisting of the first 150,000 video-text pairs
This HF model is based on the [clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) architecture, with weights trained by Daphna Idelson at [Searchium](https://www.searchium.ai).

-## Motivation
-
-As per the original authors, the main motivation behind this work is to leverage the power of the CLIP image-language pre-training model and apply it to learning
-visual-temporal concepts from videos, thereby improving video-based searches.
-
-By using the WebVid dataset, the model's capabilities were enhanced even beyond those described in the paper, thanks to the large-scale and diverse nature of the dataset empowering the model's performance.
-

# How to use
### Extracting Text Embeddings:
@@ -55,6 +48,7 @@ print("sequence_output: ", sequence_output)
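The text-embedding snippet itself is unchanged by this commit, which is why only its closing `print("sequence_output: ", sequence_output)` line appears as hunk context above. For orientation, a minimal sketch of extracting text embeddings from this checkpoint with the `transformers` text tower is shown below; the exact code in the README may differ, and the final L2 normalization is an assumption based on common CLIP-style retrieval practice.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

# Text tower of the CLIP4Clip checkpoint (a CLIP ViT-B/32 text encoder with projection).
model = CLIPTextModelWithProjection.from_pretrained("Diangle/clip4clip-webvid")
tokenizer = CLIPTokenizer.from_pretrained("Diangle/clip4clip-webvid")

search_sentence = "a basketball player performing a slam dunk"
inputs = tokenizer(text=search_sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])

# L2-normalize the projected embedding so retrieval can use a plain dot product.
sequence_output = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
print("sequence_output: ", sequence_output)
```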
An additional [notebook](https://huggingface.co/Diangle/clip4clip-webvid/blob/main/Notebooks/GSI_VideoRetrieval_VideoEmbedding.ipynb) provides instructions on how to perform video embedding.
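That notebook remains the authoritative reference. As a rough sketch of the CLIP4Clip-style pipeline it walks through (embed uniformly sampled frames with the vision tower, then mean-pool them into one video vector), something like the following could be used; the frame sampling, the use of the checkpoint's vision weights, and the pooling details are assumptions here, not a substitute for the notebook.

```python
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

# Vision tower of the checkpoint (assumes the repo exposes CLIP-style vision weights and a
# preprocessing config; if it does not, the openai/clip-vit-base-patch32 processor can be used).
vision_model = CLIPVisionModelWithProjection.from_pretrained("Diangle/clip4clip-webvid")
processor = CLIPImageProcessor.from_pretrained("Diangle/clip4clip-webvid")

# Placeholder frames; in practice these are frames sampled uniformly from the video,
# e.g. one frame per second, as described in the notebook.
frames = [Image.new("RGB", (224, 224)) for _ in range(8)]

inputs = processor(images=frames, return_tensors="pt")
with torch.no_grad():
    frame_embeds = vision_model(pixel_values=inputs["pixel_values"]).image_embeds

# Mean pooling in the CLIP4Clip style: normalize per frame, average over time, normalize again.
frame_embeds = frame_embeds / frame_embeds.norm(dim=-1, keepdim=True)
video_embed = frame_embeds.mean(dim=0)
video_embed = video_embed / video_embed.norm(dim=-1)
print("video_embed shape:", tuple(video_embed.shape))
```

The resulting vector lives in the same space as the text embeddings above, so a single dot product scores a text query against a video.
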
## Model Intended Use
This model is intended for use in large-scale video-text retrieval applications.
@@ -62,6 +56,13 @@ This model is intended for use in large scale video-text retrieval applications.
To illustrate its functionality, refer to the accompanying [**Video Search Space**](https://huggingface.co/spaces/Diangle/Clip4Clip-webvid), which provides a search demonstration over a collection of approximately 1.5 million videos.
This interactive demo showcases the model's capability to effectively retrieve videos based on text queries, highlighting its potential for handling substantial video datasets.
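Under the hood, a search like this reduces to a nearest-neighbour lookup over precomputed, L2-normalized video embeddings. The sketch below is illustrative only: the random stand-in data and the brute-force NumPy dot product are assumptions, not the demo's actual serving stack.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real data: in practice `video_embeddings` holds one precomputed,
# L2-normalized embedding per video and `query_embedding` comes from the text encoder above.
video_embeddings = rng.normal(size=(1_000, 512)).astype(np.float32)
video_embeddings /= np.linalg.norm(video_embeddings, axis=1, keepdims=True)
query_embedding = rng.normal(size=512).astype(np.float32)
query_embedding /= np.linalg.norm(query_embedding)

# With unit-normalized vectors, cosine similarity is just a dot product.
scores = video_embeddings @ query_embedding

# Indices of the top-5 matching videos, best first.
top_k = np.argsort(-scores)[:5]
print(top_k, scores[top_k])
```

At the scale of the demo (roughly 1.5 million videos), the same dot product is typically served from an approximate-nearest-neighbour index or dedicated search hardware rather than a brute-force loop.
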

+## Motivation
+
+As per the original authors, the main motivation behind this work is to leverage the power of the CLIP image-language pre-training model and apply it to learning
+visual-temporal concepts from videos, thereby improving video-based searches.
+
+By using the WebVid dataset, the model's capabilities were enhanced even beyond those described in the paper, thanks to the large-scale and diverse nature of the dataset empowering the model's performance.
+

## Evaluations
@@ -93,6 +94,7 @@ For an elaborate description of the evaluation refer to the notebook
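Text-to-video retrieval of this kind is conventionally scored with recall@K and median rank over a paired test set. The helper below sketches how such numbers can be computed from a query-by-video similarity matrix; the random matrix and the assumption that caption i is paired with video i are illustrative, and the linked notebook remains the reference for the actual evaluation.

```python
import numpy as np

def retrieval_metrics(sim: np.ndarray) -> dict:
    """R@1/5/10 and median rank from a (num_queries, num_videos) similarity matrix,
    assuming query i's ground-truth video is video i."""
    order = np.argsort(-sim, axis=1)              # videos ranked best-first for each query
    gt = np.arange(sim.shape[0])[:, None]
    ranks = np.argmax(order == gt, axis=1) + 1    # 1-based rank of the correct video
    return {
        "R@1": float(np.mean(ranks <= 1)),
        "R@5": float(np.mean(ranks <= 5)),
        "R@10": float(np.mean(ranks <= 10)),
        "MedianR": float(np.median(ranks)),
    }

# Random similarity matrix as a stand-in for text-embedding x video-embedding dot products.
sim = np.random.default_rng(0).normal(size=(100, 100))
print(retrieval_metrics(sim))
```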
## Acknowledgements
Acknowledgments to Diana Mazenko of [Searchium](https://www.searchium.ai) for adapting the model and loading it to Hugging Face, and for creating a Hugging Face [**SPACE**](https://huggingface.co/spaces/Diangle/Clip4Clip-webvid) for a large-scale video-search demo.

Acknowledgments also to Luo et al. for their comprehensive work on CLIP4Clip and for making their code openly available.

## Citations