chandrakalagowda committed on
Commit
570c07f
1 Parent(s): a2239ab

Upload folder using huggingface_hub

1_reverse_video_search_engine.py ADDED
@@ -0,0 +1,480 @@
+ #!/usr/bin/env python
+ # coding: utf-8
+
+ # # How to Build a Reverse Video Search Engine
+ #
+ # This notebook illustrates how to build a reverse-video-search engine from scratch using [Milvus](https://milvus.io/) and [Towhee](https://towhee.io/).
+ #
+ # **What is Reverse Video Search?**
+ #
+ # Reverse video search is similar to [reverse image search](https://en.wikipedia.org/wiki/Reverse_image_search): it takes a video as input and searches for similar videos. Video-related tasks are generally harder to tackle, and video models normally do not achieve scores as high as other types of models. However, demand for AI applications on video keeps increasing, and reverse video search can effectively discover related videos and improve other applications.
+ #
+ #
+ # **What are Milvus & Towhee?**
+ #
+ # - Milvus is the most advanced open-source vector database built for AI applications and supports nearest neighbor embedding search across tens of millions of entries.
+ # - Towhee is a framework that provides ETL for unstructured data using SoTA machine learning models.
+ #
+ # We will go through the procedure of building a reverse-video-search engine and evaluate its performance.
+
+ # ## Preparation
+ #
+ # ### Install packages
+ #
+ # Make sure you have installed the required Python packages:
+ #
+ # | package |
+ # | -- |
+ # | towhee |
+ # | towhee.models |
+ # | pillow |
+ # | ipython |
+ # | gradio |
+
+ # In[1]:
+
+
+ #! python -m pip install -q towhee towhee.models pillow ipython gradio
+
+
+ # ### Prepare data
+ #
+ # This tutorial will use a small dataset extracted from [Kinetics400](https://www.deepmind.com/open-source/kinetics). You can download the subset from [Github](https://github.com/towhee-io/examples/releases/download/data/reverse_video_search.zip).
+ #
+ # The data is organized as follows:
+ # - **train:** candidate videos, 20 classes, 10 videos per class (200 in total)
+ # - **test:** query videos, same 20 classes as train data, 1 video per class (20 in total)
+ # - **reverse_video_search.csv:** a csv file containing an ***id***, ***path***, and ***label*** for each video in the train data
+ #
+ # Let's take a quick look:
+
+ # In[1]:
+
+ import time
+ from zipfile import ZipFile
+
+ with ZipFile('reverse_video_search.zip', 'r') as zip:
+     # extract all files from the downloaded archive
+     print('Extracting all the files now...')
+     zip.extractall()
+     print('Done!')
+
+
+ # In[2]:
+
+
+ import pandas as pd
+ import time
+
+ df = pd.read_csv('./reverse_video_search.csv')
+ df.head(3)
+
+
+ # To make it easier to retrieve videos and measure results in later steps, we build some helper functions in advance (a quick sanity check follows the code below):
+ # - **ground_truth:** get the ground-truth video ids for a query video by its path
+
+ # In[3]:
+
+
+ id_video = df.set_index('id')['path'].to_dict()
+ label_ids = {}
+ for label in set(df['label']):
+     label_ids[label] = list(df[df['label']==label].id)
+
+
+ def ground_truth(path):
+     label = path.split('/')[-2]
+     return label_ids[label]
+
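+ # As a quick, hedged sanity check (the query path below is the same test video used later in this notebook; any path of the form `./test/<label>/<file>.mp4` works the same way):
+
+ # In[ ]:
+
+
+ # Illustrative check: for a query under './test/eating_carrots/', ground_truth
+ # should return the ids of all 'eating_carrots' candidate videos in the train set.
+ example_ids = ground_truth('./test/eating_carrots/ty4UQlowp0c.mp4')
+ print(len(example_ids))  # expected: 10, since each class has 10 candidate videos
+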
+
+ # ### Start Milvus
+ #
+ # Before getting started with the engine, we also need Milvus to be ready. Please make sure that you have started a [Milvus service](https://milvus.io/docs/install_standalone-docker.md). This notebook uses [milvus 2.2.10](https://milvus.io/docs/v2.2.x/install_standalone-docker.md) and [pymilvus 2.2.11](https://milvus.io/docs/release_notes.md#2210).
+
+ # In[ ]:
+
+
+ #! python -m pip install -q pymilvus==2.2.11
+
+
+ # Here we prepare a function that creates a Milvus collection with the following parameters:
+ # - [L2 distance metric](https://milvus.io/docs/metric.md#Euclidean-distance-L2)
+ # - [IVF_FLAT index](https://milvus.io/docs/index.md#IVF_FLAT)
+
+ # In[4]:
+
+
+ from milvus import default_server
+ from pymilvus import connections, utility
+ default_server.start()
+
+
+ # In[5]:
+
+
+ connections.connect(host='127.0.0.1', port=default_server.listen_port)
+
+
+ # In[6]:
+
+
+ default_server.listen_port
+
+
+ # In[7]:
+
+
+ print(utility.get_server_version())
+
+
+ # In[10]:
+
+
+ from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility
+
+ #connections.connect(host='localhost', port='19530')
+ connections.connect(host='127.0.0.1', port='19530')
+
+ def create_milvus_collection(collection_name, dim):
+
+     if utility.has_collection(collection_name):
+         utility.drop_collection(collection_name)
+
+     fields = [
+         FieldSchema(name='id', dtype=DataType.INT64, description='ids', is_primary=True, auto_id=False),
+         FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='embedding vectors', dim=dim)
+     ]
+     schema = CollectionSchema(fields=fields, description='reverse video search')
+     collection = Collection(name=collection_name, schema=schema)
+
+     # create an IVF_FLAT index for the collection.
+     index_params = {
+         'metric_type': 'L2',
+         'index_type': "IVF_FLAT",
+         'params': {"nlist": 400}
+     }
+     collection.create_index(field_name="embedding", index_params=index_params)
+     return collection
+
+ collection = create_milvus_collection('x3d_m', 2048)
+
+
+ # In[11]:
+
+
+ time.sleep(10)
+
+
+ # ## Build Engine
+ #
+ # Now we are ready to build a reverse-video-search engine. The basic idea behind reverse video search is to represent each video with an embedding and then perform similarity search by comparing vector distances.
+ #
+ # As mentioned at the beginning, we use deep learning networks provided by Towhee to extract features and generate embeddings. Milvus is used for vector storage and similarity search.
+ #
+ # <img src='reverse_video_search.png' alt='reverse_video_search_engine' width=700px/>
+
+ # ### Load Video Embeddings into Milvus
+ #
+ # We first generate embeddings for videos with the [X3D model](https://arxiv.org/abs/2004.04730) and then insert the video embeddings into Milvus. Towhee provides a [method-chaining style API](https://towhee.readthedocs.io/en/main/index.html) so that users can assemble a data processing pipeline with operators.
+
+ # In[12]:
+
+
+ from towhee import pipe, ops
+ from towhee.datacollection import DataCollection
+
+ def read_csv(csv_file):
+     import csv
+     with open(csv_file, 'r', encoding='utf-8-sig') as f:
+         data = csv.DictReader(f)
+         for line in data:
+             yield line['id'], line['path'], line['label']
+
+
+ insert_pipe = (
+     pipe.input('csv_path')
+         .flat_map('csv_path', ('id', 'path', 'label'), read_csv)
+         .map('id', 'id', lambda x: int(x))
+         .map('path', 'frames', ops.video_decode.ffmpeg(sample_type='uniform_temporal_subsample', args={'num_samples': 16}))
+         .map('frames', ('labels', 'scores', 'features'), ops.action_classification.pytorchvideo(model_name='x3d_m', skip_preprocess=True))
+         .map(('id', 'features'), 'insert_res', ops.ann_insert.milvus_client(host='127.0.0.1', port='19530', collection_name='x3d_m'))
+         .output()
+ )
+
+ insert_pipe('reverse_video_search.csv')
+ print('Total number of inserted data is {}.'.format(collection.num_entities))
+
+
+ # In[13]:
+
+
+ print('Total number of inserted data is {}.'.format(collection.num_entities))
+
+
+ # #### Pipeline Explanation
+ #
+ # Here are some details for each line of the assembled pipeline (a small preview sketch follows this list):
+ #
+ # - `flat_map('csv_path', ('id', 'path', 'label'), read_csv)`: read tabular data from the csv file
+ #
+ # - `map('id', 'id', lambda x: int(x))`: for each row, convert the data type of the column `id` to int
+ #
+ # - `map('path', 'frames', ops.video_decode.ffmpeg(sample_type='uniform_temporal_subsample', args={'num_samples': 16}))`: an embedded Towhee operator that reads a video as frames with the specified sampling method and number of samples. [learn more](https://towhee.io/video-decode/ffmpeg)
+ #
+ # - `map('frames', ('labels', 'scores', 'features'), ops.action_classification.pytorchvideo(model_name='x3d_m', skip_preprocess=True))`: an embedded Towhee operator that applies the specified model to the video frames and can be used to generate a video embedding. [learn more](https://towhee.io/action-classification/pytorchvideo)
+ #
+ # - `map(('id', 'features'), 'insert_res', ops.ann_insert.milvus_client(host='127.0.0.1', port='19530', collection_name='x3d_m'))`: insert the video embedding into the Milvus collection
+
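+ # Before running the heavy video pipeline end to end, it can help to preview what `read_csv` yields. The short pipeline below is a minimal sketch that reuses only operators already defined in this notebook and stops right after parsing and type conversion:
+
+ # In[ ]:
+
+
+ # Illustrative preview pipeline: parse the csv and show the rows without decoding
+ # any video, so the column layout ('id', 'path', 'label') can be checked quickly.
+ preview_pipe = (
+     pipe.input('csv_path')
+         .flat_map('csv_path', ('id', 'path', 'label'), read_csv)
+         .map('id', 'id', lambda x: int(x))
+         .output('id', 'path', 'label')
+ )
+ DataCollection(preview_pipe('reverse_video_search.csv')).show()
+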
+ # ### Query Similar Videos from Milvus
+ #
+ # Now that all embeddings of the candidate videos have been inserted into the Milvus collection, we can query the collection for nearest neighbors.
+ #
+ # To get query embeddings, we go through the same pre-insert steps for each input video. Because Milvus returns video ids and vector distances, we use the `id_video` dictionary to look up the corresponding video paths by id.
+
+ # In[7]:
+
+
+ time.sleep(60)
+
+
+ # In[14]:
+
+
+ collection.load()
+ time.sleep(60)
+ query_path = './test/eating_carrots/ty4UQlowp0c.mp4'
+
+ query_pipe = (
+     pipe.input('path')
+         .map('path', 'frames', ops.video_decode.ffmpeg(sample_type='uniform_temporal_subsample', args={'num_samples': 16}))
+         .map('frames', ('labels', 'scores', 'features'), ops.action_classification.pytorchvideo(model_name='x3d_m', skip_preprocess=True))
+         .map('features', 'result', ops.ann_search.milvus_client(host='127.0.0.1', port='19530', collection_name='x3d_m', limit=10))
+         .map('result', 'candidates', lambda x: [id_video[i[0]] for i in x])
+         .output('path', 'candidates')
+ )
+
+ res = DataCollection(query_pipe(query_path))
+ res.show()
+
+
+ # To display results in the notebook, we convert the videos to GIFs. The code below first loads each video from its path and then gets the full video frames with the embedded Towhee operator `.video_decode.ffmpeg()`. The converted GIFs are saved under the directory *tmp_dir*. This section only helps to show a search example.
+
+ # In[15]:
+
+
+ import os
+ from IPython import display
+ from PIL import Image
+
+ tmp_dir = './tmp'
+ os.makedirs(tmp_dir, exist_ok=True)
+
+ def video_to_gif(video_path):
+     gif_path = os.path.join(tmp_dir, video_path.split('/')[-1][:-4] + '.gif')
+     p = (
+         pipe.input('path')
+             .map('path', 'frames', ops.video_decode.ffmpeg(sample_type='uniform_temporal_subsample', args={'num_samples': 16}))
+             .output('frames')
+     )
+     frames = p(video_path).get()[0]
+     imgs = [Image.fromarray(frame) for frame in frames]
+     imgs[0].save(fp=gif_path, format='GIF', append_images=imgs[1:], save_all=True, loop=0)
+     return gif_path
+
+ html = 'Query video "{}": <br/>'.format(query_path.split('/')[-2])
+ query_gif = video_to_gif(query_path)
+ html_line = '<img src="{}"> <br/>'.format(query_gif)
+ html += html_line
+ html += 'Top 3 search results: <br/>'
+
+ for path in res[0]['candidates'][:3]:
+     gif_path = video_to_gif(path)
+     html_line = '<img src="{}" style="display:inline;margin:1px"/>'.format(gif_path)
+     html += html_line
+ display.HTML(html)
+
+
+ # ### Evaluation
+ #
+ # We have just built a reverse video search engine. But how well does it perform? We can evaluate the search engine against the ground truths.
+ #
+ # In this section, we'll measure the performance with 2 metrics - mHR and mAP (a toy example follows the metric definitions below):
+ #
+ # - **mHR (recall@K):**
+ #     - Mean Hit Ratio describes how many of the actual relevant results are returned out of all ground truths.
+ #     - Since Milvus returns the top K results, we can also call this metric *recall@K*, where K is the count of returned results. When the number of returned results equals the number of ground truths, the hit ratio is equivalent to accuracy, so we can take it as *accuracy@K* as well.
+ #     - For example, suppose there are 100 archery videos in the collection, and querying the engine with another archery video returns 70 archery videos among 80 results. In this case, the number of ground truths is 100 and the hit (correct) results number 70, so the hit ratio is 70/100.
+ #
+ # - **mAP:**
+ #     - Average precision describes whether all of the relevant results are ranked higher than irrelevant results.
+
+ # In[16]:
+
+
+ import glob
+
+ def mean_hit_ratio(actual, predicted):
+     ratios = []
+     for act, pre in zip(actual, predicted):
+         hit_num = len(set(act) & set(pre))
+         ratios.append(hit_num / len(act))
+     return sum(ratios) / len(ratios)
+
+ def mean_average_precision(actual, predicted):
+     aps = []
+     for act, pre in zip(actual, predicted):
+         precisions = []
+         hit = 0
+         for idx, i in enumerate(pre):
+             if i in act:
+                 hit += 1
+                 precisions.append(hit / (idx + 1))
+         aps.append(sum(precisions) / len(precisions))
+
+     return sum(aps) / len(aps)
+
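+ # A quick sanity check of the two metrics with toy numbers (hypothetical values, not from the dataset): with ground truth [1, 2, 3] and predictions [1, 9, 2], two of the three ground truths are hit, so the hit ratio is 2/3; precision is 1/1 at the first hit and 2/3 at the second, so the average precision is (1 + 2/3) / 2 ≈ 0.83.
+
+ toy_actual = [[1, 2, 3]]
+ toy_predicted = [[1, 9, 2]]
+ print(mean_hit_ratio(toy_actual, toy_predicted))          # -> 0.666...
+ print(mean_average_precision(toy_actual, toy_predicted))  # -> 0.833...
+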
+ eval_pipe = (
+     pipe.input('path')
+         .flat_map('path', 'path', lambda x: glob.glob(x))
+         .map('path', 'frames', ops.video_decode.ffmpeg(sample_type='uniform_temporal_subsample', args={'num_samples': 16}))
+         .map('frames', ('labels', 'scores', 'features'), ops.action_classification.pytorchvideo(model_name='x3d_m', skip_preprocess=True))
+         .map('features', 'result', ops.ann_search.milvus_client(host='127.0.0.1', port='19530', collection_name='x3d_m', limit=10))
+         .map('result', 'predict', lambda x: [i[0] for i in x])
+         .map('path', 'ground_truth', ground_truth)
+         .window_all(('ground_truth', 'predict'), 'mhr', mean_hit_ratio)
+         .window_all(('ground_truth', 'predict'), 'map', mean_average_precision)
+         .output('mhr', 'map')
+ )
+
+ res = DataCollection(eval_pipe('./test/*/*.mp4'))
+ res.show()
+
+
+ # ## Optimization
+ #
+ # As we can see from the evaluation report above, the current performance is not satisfactory. What can we do to improve the search engine? Of course, we can fine-tune the deep learning network with our own training data. Using more types of embeddings, or filtering by video tags/descriptions/captions and audio, can definitely enhance the search engine as well. But in this tutorial, I will just recommend some very simple options for making improvements.
+ #
+ # ### Normalize embeddings
+ #
+ # A quick optimization is normalizing all embeddings. With unit-length vectors, searching by L2 distance becomes equivalent to searching by cosine similarity, which measures the similarity between two vectors by the angle between them and ignores their magnitudes. We use the `ops.towhee.np_normalize` operator provided by Towhee to normalize all embeddings (a short sketch of what this does is shown below).
+
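+ # A minimal sketch, in plain NumPy, of what L2 normalization does (an illustration equivalent in spirit to `ops.towhee.np_normalize`, not its actual implementation): each embedding is scaled to unit length, so only its direction matters in the distance comparison.
+
+ # In[ ]:
+
+
+ import numpy as np
+
+ v = np.array([3.0, 4.0])
+ v_unit = v / np.linalg.norm(v)          # -> [0.6, 0.8], a unit-length vector
+ print(v_unit, np.linalg.norm(v_unit))   # the norm is 1.0
+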
+ # In[17]:
+
+
+ collection = create_milvus_collection('x3d_m_norm', 2048)
+
+ insert_pipe = (
+     pipe.input('csv_path')
+         .flat_map('csv_path', ('id', 'path', 'label'), read_csv)
+         .map('id', 'id', lambda x: int(x))
+         .map('path', 'frames', ops.video_decode.ffmpeg(sample_type='uniform_temporal_subsample', args={'num_samples': 16}))
+         .map('frames', ('labels', 'scores', 'features'), ops.action_classification.pytorchvideo(model_name='x3d_m', skip_preprocess=True))
+         .map('features', 'features', ops.towhee.np_normalize())
+         .map(('id', 'features'), 'insert_res', ops.ann_insert.milvus_client(host='127.0.0.1', port='19530', collection_name='x3d_m_norm'))
+         .output()
+ )
+
+ insert_pipe('reverse_video_search.csv')
+
+ collection.load()
+ eval_pipe = (
+     pipe.input('path')
+         .flat_map('path', 'path', lambda x: glob.glob(x))
+         .map('path', 'frames', ops.video_decode.ffmpeg(sample_type='uniform_temporal_subsample', args={'num_samples': 16}))
+         .map('frames', ('labels', 'scores', 'features'), ops.action_classification.pytorchvideo(model_name='x3d_m', skip_preprocess=True))
+         .map('features', 'features', ops.towhee.np_normalize())
+         .map('features', 'result', ops.ann_search.milvus_client(host='127.0.0.1', port='19530', collection_name='x3d_m_norm', limit=10))
+         .map('result', 'predict', lambda x: [i[0] for i in x])
+         .map('path', 'ground_truth', ground_truth)
+         .window_all(('ground_truth', 'predict'), 'mhr', mean_hit_ratio)
+         .window_all(('ground_truth', 'predict'), 'map', mean_average_precision)
+         .output('mhr', 'map')
+ )
+
+ res = DataCollection(eval_pipe('./test/*/*.mp4'))
+ res.show()
+
+
+ # With vector normalization, we have increased the mHR to 0.66 and the mAP to about 0.74, which look better now.
+
+ # ### Change model
+ #
+ # There are more video models built on different networks. Normally a more complicated or larger model will give better results while costing more. You can always try more models to trade off accuracy, latency, and resource usage. Here I show the performance of the reverse-video-search engine using a SOTA model with a [multiscale vision transformer](https://arxiv.org/abs/2104.11227) as the backbone.
+
+ # In[18]:
+
+
+ collection = create_milvus_collection('mvit_base', 768)
+
+ insert_pipe = (
+     pipe.input('csv_path')
+         .flat_map('csv_path', ('id', 'path', 'label'), read_csv)
+         .map('id', 'id', lambda x: int(x))
+         .map('path', 'frames', ops.video_decode.ffmpeg(sample_type='uniform_temporal_subsample', args={'num_samples': 32}))
+         .map('frames', ('labels', 'scores', 'features'), ops.action_classification.pytorchvideo(model_name='mvit_base_32x3', skip_preprocess=True))
+         .map('features', 'features', ops.towhee.np_normalize())
+         .map(('id', 'features'), 'insert_res', ops.ann_insert.milvus_client(host='127.0.0.1', port='19530', collection_name='mvit_base'))
+         .output()
+ )
+
+ insert_pipe('reverse_video_search.csv')
+
+ collection.load()
+ eval_pipe = (
+     pipe.input('path')
+         .flat_map('path', 'path', lambda x: glob.glob(x))
+         .map('path', 'frames', ops.video_decode.ffmpeg(sample_type='uniform_temporal_subsample', args={'num_samples': 32}))
+         .map('frames', ('labels', 'scores', 'features'), ops.action_classification.pytorchvideo(model_name='mvit_base_32x3', skip_preprocess=True))
+         .map('features', 'features', ops.towhee.np_normalize())
+         .map('features', 'result', ops.ann_search.milvus_client(host='127.0.0.1', port='19530', collection_name='mvit_base', limit=10))
+         .map('result', 'predict', lambda x: [i[0] for i in x])
+         .map('path', 'ground_truth', ground_truth)
+         .window_all(('ground_truth', 'predict'), 'mhr', mean_hit_ratio)
+         .window_all(('ground_truth', 'predict'), 'map', mean_average_precision)
+         .output('mhr', 'map')
+ )
+
+ res = DataCollection(eval_pipe('./test/*/*.mp4'))
+ res.show()
+
+
+ # Switching to the MViT model increases the mHR to 0.79 and the mAP to 0.83, which are much better than the X3D model. However, both insert and search time have increased. It's up to you to make the trade-off between latency and accuracy, and you're always encouraged to play around with this tutorial.
+
+ # ## Release a Showcase
+ #
+ # We've learned how to build a reverse-video-search engine. Now it's time to add an interface and release a showcase.
+
+ # In[19]:
+
+
+ import gradio
+
+ video_search_pipe = (
+     pipe.input('path')
+         .flat_map('path', 'path', lambda x: glob.glob(x))
+         .map('path', 'frames', ops.video_decode.ffmpeg(sample_type='uniform_temporal_subsample', args={'num_samples': 32}))
+         .map('frames', ('labels', 'scores', 'features'), ops.action_classification.pytorchvideo(model_name='mvit_base_32x3', skip_preprocess=True))
+         .map('features', 'features', ops.towhee.np_normalize())
+         .map('features', 'result', ops.ann_search.milvus_client(host='127.0.0.1', port='19530', collection_name='mvit_base', limit=3))
+         .map('result', 'predict', lambda x: [id_video[i[0]] for i in x])
+         .output('predict')
+ )
+
+
+ def video_search_function(video):
+     return video_search_pipe(video).to_list()[0][0]
+
+ interface = gradio.Interface(video_search_function,
+                              inputs=gradio.Video(source='upload'),
+                              outputs=[gradio.Video(format='mp4') for _ in range(3)]
+                              )
+
+ interface.launch()
+
+
+ # In[ ]:
+
+
+
+
README.md CHANGED
@@ -1,12 +1,6 @@
  ---
- title: Reversevidosearchmilvus
- emoji: 📈
- colorFrom: green
- colorTo: green
+ title: reversevidosearchmilvus
+ app_file: 1_reverse_video_search_engine.py
  sdk: gradio
  sdk_version: 3.37.0
- app_file: app.py
- pinned: false
  ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
requirements.txt ADDED
@@ -0,0 +1,131 @@
+ aiofiles==23.1.0
+ aiohttp==3.8.4
+ aiosignal==1.3.1
+ altair==5.0.1
+ anyio==3.7.1
+ appnope==0.1.3
+ asttokens==2.2.1
+ async-timeout==4.0.2
+ attrs==23.1.0
+ av==10.0.0
+ backcall==0.2.0
+ bleach==6.0.0
+ certifi==2023.5.7
+ charset-normalizer==3.2.0
+ click==8.1.6
+ comm==0.1.3
+ contourpy==1.1.0
+ cycler==0.11.0
+ debugpy==1.6.7
+ decorator==5.1.1
+ docutils==0.20.1
+ environs==9.5.0
+ executing==1.2.0
+ fastapi==0.100.0
+ ffmpy==0.3.1
+ filelock==3.12.2
+ fonttools==4.41.0
+ frozenlist==1.4.0
+ fsspec==2023.6.0
+ fvcore==0.1.5.post20221221
+ gradio==3.37.0
+ gradio_client==0.2.10
+ grpcio==1.53.0
+ h11==0.14.0
+ httpcore==0.17.3
+ httpx==0.24.1
+ huggingface-hub==0.16.4
+ idna==3.4
+ importlib-metadata==6.8.0
+ iopath==0.1.10
+ ipykernel==6.24.0
+ ipython==8.14.0
+ jaraco.classes==3.3.0
+ jedi==0.18.2
+ Jinja2==3.1.2
+ jsonschema==4.18.4
+ jsonschema-specifications==2023.7.1
+ jupyter_client==8.3.0
+ jupyter_core==5.3.1
+ keyring==24.2.0
+ kiwisolver==1.4.4
+ linkify-it-py==2.0.2
+ markdown-it-py==2.2.0
+ MarkupSafe==2.1.3
+ marshmallow==3.19.0
+ matplotlib==3.7.2
+ matplotlib-inline==0.1.6
+ mdit-py-plugins==0.3.3
+ mdurl==0.1.2
+ milvus==2.2.11
+ more-itertools==9.1.0
+ mpmath==1.3.0
+ multidict==6.0.4
+ nest-asyncio==1.5.6
+ networkx==3.1
+ numpy==1.25.1
+ orjson==3.9.2
+ packaging==23.1
+ pandas==2.0.3
+ parameterized==0.9.0
+ parso==0.8.3
+ pexpect==4.8.0
+ pickleshare==0.7.5
+ Pillow==10.0.0
+ pkginfo==1.9.6
+ platformdirs==3.9.1
+ portalocker==2.7.0
+ prompt-toolkit==3.0.39
+ protobuf==4.23.4
+ psutil==5.9.5
+ ptyprocess==0.7.0
+ pure-eval==0.2.2
+ pydantic==1.10.11
+ pydub==0.25.1
+ Pygments==2.15.1
+ pymilvus==2.2.11
+ pyparsing==3.0.9
+ python-dateutil==2.8.2
+ python-dotenv==1.0.0
+ python-multipart==0.0.6
+ pytorchvideo==0.1.3
+ pytz==2023.3
+ PyYAML==6.0.1
+ pyzmq==25.1.0
+ readme-renderer==40.0
+ referencing==0.30.0
+ requests==2.31.0
+ requests-toolbelt==1.0.0
+ rfc3986==2.0.0
+ rich==13.4.2
+ rpds-py==0.9.2
+ semantic-version==2.10.0
+ six==1.16.0
+ sniffio==1.3.0
+ stack-data==0.6.2
+ starlette==0.27.0
+ sympy==1.12
+ tabulate==0.9.0
+ tenacity==8.2.2
+ termcolor==2.3.0
+ toolz==0.12.0
+ torch==2.0.1
+ torchvision==0.15.2
+ tornado==6.3.2
+ towhee==1.1.1
+ towhee.models==1.1.1
+ tqdm==4.65.0
+ traitlets==5.9.0
+ twine==4.0.2
+ typing_extensions==4.7.1
+ tzdata==2023.3
+ uc-micro-py==1.0.2
+ ujson==5.8.0
+ urllib3==2.0.3
+ uvicorn==0.23.1
+ wcwidth==0.2.6
+ webencodings==0.5.1
+ websockets==11.0.3
+ yacs==0.1.8
+ yarl==1.9.2
+ zipp==3.16.2
reverse_video_search.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c682a0eedd9719361e6cb5f6f661bd157b1034d49b3168e6e8986b37dc32350f
+ size 159397572