SajjadAyoubi committed on
Commit 199a6ad
1 Parent(s): c18ee2b

Update README.md

Files changed (1):
  1. README.md +6 -4
README.md CHANGED
@@ -19,8 +19,10 @@ tokenizer = AutoTokenizer.from_pretrained('SajjadAyoubi/clip-fa-text')
  text = 'something'
  image = PIL.Image.open('my_favorite_image.jpg')
  # compute embeddings
- text_embedding = text_encoder(**tokenizer(text, return_tensors='pt')).pooler_output
- image_embedding = vision_encoder(**preprocessor(image, return_tensors='pt')).pooler_output
+ text_embedding = text_encoder(**tokenizer(text,
+                                           return_tensors='pt')).pooler_output
+ image_embedding = vision_encoder(**preprocessor(image,
+                                                 return_tensors='pt')).pooler_output
  text_embedding.shape == image_embedding.shape
  ```
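Since both encoders produce embeddings of the same shape, the natural follow-up is to score how well the caption matches the image. The snippet below is an illustrative addition, not part of the committed README: it assumes `text_embedding` and `image_embedding` are the pooled outputs from the example above and compares them with cosine similarity.

```python
import torch.nn.functional as F

# Illustrative follow-up (not in the commit): score caption/image agreement.
# Assumes text_embedding and image_embedding come from the snippet above,
# each with shape (1, hidden_size).
similarity = F.cosine_similarity(text_embedding, image_embedding).item()
print(f'caption-image similarity: {similarity:.3f}')
```
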
@@ -30,7 +32,7 @@ The followings are just some use cases of CLIPfa on 25K [`Unsplash images`](http
  ```python
  from clipfa import CLIPDemo
  demo = CLIPDemo(vision_encoder, text_encoder, tokenizer)
- demo.compute_text_embeddings(['سیب','موز' ,'آلبالو'])
+ demo.compute_text_embeddings(['گاو' ,'اسب' ,'ماهی'])
  demo.compute_image_embeddings(test_df.image_path.to_list())
  ```
  ### Image Search:
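For context on the `Image Search` section that follows in the README: the demo boils down to ranking precomputed image embeddings against the embedding of a text query. The sketch below is illustrative rather than the actual `CLIPDemo` implementation; it assumes `image_embeddings` is a tensor stacked from the per-image pooled outputs, `image_paths` holds the matching file paths, and `tokenizer` / `text_encoder` are the objects loaded earlier in the README.

```python
import torch

def search_images(query, image_embeddings, image_paths, top_k=5):
    """Rank images by cosine similarity to a Persian text query (illustrative sketch)."""
    with torch.no_grad():
        query_embedding = text_encoder(**tokenizer(query, return_tensors='pt')).pooler_output
    # Normalize both sides so the dot product equals cosine similarity.
    query_embedding = query_embedding / query_embedding.norm(dim=-1, keepdim=True)
    image_embeddings = image_embeddings / image_embeddings.norm(dim=-1, keepdim=True)
    scores = (query_embedding @ image_embeddings.T).squeeze(0)
    top = scores.topk(top_k)
    return [(image_paths[i], scores[i].item()) for i in top.indices.tolist()]

# e.g. search_images('اسب در مزرعه', image_embeddings, test_df.image_path.to_list())
# (the query means "a horse in a field")
```
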
@@ -74,7 +76,7 @@ We used a small set of images (25K) to keep this app almost real-time, but it's
  ## Dataset: 400K
  We started with the question of how much the original CLIP model depends on its huge training dataset and its wide range of conceptual samples. Our model shows that an acceptable target can be reached with only a small amount of data, even though the model may not have seen enough concepts and subjects to be used widely. It was trained on a dataset gathered from several resources, such as Flickr30k, MS-COCO 2017, Google CC3M, ... . We translated these datasets into Persian with a [`tool`](https://github.com/sajjjadayobi/CLIPfa/blob/main/clipfa/data/translation.py) we prepared ourselves: combining Google Translate with a multilingual similarity check, it takes a list of English captions and keeps only the best translations.
 
- - Note: We used [`img2dataset`](https://github.com/rom1504/img2dataset), a great tool for downloading large-scale image datasets such as MS-COCO. It can download, resize, and package 100M urls in 20h on one machine. It also supports saving captions for url+caption datasets.
+ - Note: We used [`img2dataset`](https://github.com/rom1504/img2dataset), a great tool for downloading large-scale image datasets such as MS-COCO. It can download, resize, and package 100M URLs in 20h on one machine. It also supports saving captions for url+caption datasets.
  - [`coco-flickr-fa 130K on Kaggle`](https://www.kaggle.com/navidkanaani/coco-flickr-farsi)
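The translation pipeline described in this hunk (Google Translate plus a multilingual similarity check) can be approximated with off-the-shelf parts. The sketch below is not the project's `translation.py`: `translate_to_farsi` is a hypothetical helper, the sentence-transformers model is just one possible multilingual encoder, and the 0.8 threshold is an arbitrary placeholder.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative stand-in for the caption-translation filter, not the project's actual tool.
similarity_model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

def filter_translations(english_captions, translate_to_farsi, threshold=0.8):
    """Keep only Persian translations whose meaning stays close to the English source."""
    kept = []
    for caption in english_captions:
        farsi = translate_to_farsi(caption)  # hypothetical Google Translate wrapper
        # Multilingual similarity check: embed both sentences and compare.
        embeddings = similarity_model.encode([caption, farsi], convert_to_tensor=True)
        score = util.cos_sim(embeddings[0], embeddings[1]).item()
        if score >= threshold:
            kept.append((caption, farsi))
    return kept
```
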
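And for the img2dataset note above: a minimal sketch of how such a download might be launched from Python. The file name and column names are placeholders, and the argument names are recalled from the img2dataset README, so they should be checked against the current documentation.

```python
from img2dataset import download

# Sketch of downloading a url+caption dataset (placeholder paths and columns;
# verify the argument names against the img2dataset docs before use).
download(
    url_list='captions_fa.tsv',      # placeholder: tab-separated url/caption file
    input_format='tsv',
    url_col='url',
    caption_col='caption',
    image_size=256,                  # resize while downloading
    output_folder='images',
    output_format='webdataset',
    processes_count=8,
    thread_count=32,
)
```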