sujitpal committed
Commit d0f3b44
1 Parent(s): b60ae46

fixed typos and added info pointed out in evaluation

Files changed (1): README.md +12 -6
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 
 ## Model Details
 
-This model is a fine-tuned [CLIP by OpenAI](https://huggingface.co/openai/clip-vit-base-patch32). It is designed with aim to improve zero-shot image classification, text-to-image and image-to-image retrieval specifically on remote sensing images.
+This model is a fine-tuned [CLIP by OpenAI](https://huggingface.co/openai/clip-vit-base-patch32). It is designed with an aim to improve zero-shot image classification, text-to-image and image-to-image retrieval specifically on remote sensing images.
 
 ### Model Date
 
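The zero-shot classification the description above refers to works by scoring an image against a set of candidate text labels in CLIP's shared embedding space. A minimal sketch, assuming the `flax-community/clip-rsicd-v2` checkpoint and a placeholder image path; the README's own usage snippet, whose last line is visible as context in the third hunk below, is the authoritative example:

```python
# Hedged sketch of zero-shot remote sensing classification with the
# fine-tuned model. Checkpoint id and image path are assumptions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("flax-community/clip-rsicd-v2")
processor = CLIPProcessor.from_pretrained("flax-community/clip-rsicd-v2")

labels = ["residential area", "playground", "stadium", "forest", "airport"]
image = Image.open("aerial_scene.jpg")  # placeholder image

inputs = processor(
    text=[f"an aerial photograph of {label}" for label in labels],
    images=image, return_tensors="pt", padding=True,
)
# Softmax over image-text similarity scores gives per-label probabilities.
probs = model(**inputs).logits_per_image.softmax(dim=-1)
for l, p in zip(labels, probs[0]):
    print(f"{l:<18} {p:.4f}")
```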
@@ -19,17 +19,17 @@ The base model uses a ViT-B/32 Transformer architecture as an image encoder and
 
 ### Model Version
 
-We release several checkpoints for `clip-rsicd` model. Refer to [our github repo](https://github.com/arampacha/CLIP-rsicd) for zero-shot classification for each of those.
+We release several checkpoints for `clip-rsicd` model. Refer to [our github repo](https://github.com/arampacha/CLIP-rsicd#evaluation-results) for performance metrics on zero-shot classification for each of those.
 
 ### Training
 
 To reproduce the fine-tuning procedure one can use released [script](https://github.com/arampacha/CLIP-rsicd/blob/master/run_clip_flax_tv.py).
 The model was trained using batch size 1024, adafactor optimizer with linear warmup and decay with peak learning rate 1e-4 on 1 TPU-v3-8.
-Full log of the training run done to produce can be found on [WandB](https://wandb.ai/wandb/hf-flax-clip-rsicd/runs/2dj1exsw).
+Full log of the training run can be found on [WandB](https://wandb.ai/wandb/hf-flax-clip-rsicd/runs/2dj1exsw).
 
 ### Demo
 
-Checko out the model text-to-image and image-to-image capabilities using [this demo](https://huggingface.co/spaces/sujitpal/clip-rsicd-demo).
+Check out the model text-to-image and image-to-image capabilities using [this demo](https://huggingface.co/spaces/sujitpal/clip-rsicd-demo).
 
 
 ### Documents
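As an aside on the Training lines in the hunk above: the described recipe (Adafactor, linear warmup and decay, peak learning rate 1e-4) maps onto an `optax` schedule roughly as follows. This is a sketch, not the released script; the warmup and total step counts are assumptions, and `run_clip_flax_tv.py` remains authoritative.

```python
# Sketch of the optimizer described in the Training section: Adafactor
# with linear warmup to a 1e-4 peak learning rate, then linear decay.
import optax

peak_lr = 1e-4
warmup_steps = 1_000    # assumption: not stated in the README
total_steps = 10_000    # assumption: not stated in the README

schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(0.0, peak_lr, warmup_steps),
        optax.linear_schedule(peak_lr, 0.0, total_steps - warmup_steps),
    ],
    boundaries=[warmup_steps],
)
optimizer = optax.adafactor(learning_rate=schedule)
```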
@@ -67,7 +67,12 @@ for l, p in zip(labels, probs[0]):
 
 ### Intended Use
 
-The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models - the CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis.
+The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification.
+
+In addition, we can imagine applications in defense and law enforcement, climate change and global warming, and even some consumer applications. A partial list of applications can be found [here](https://github.com/arampacha/CLIP-rsicd#applications). In general we think such models can be useful as digital assistants for humans engaged in searching through large collections of images.
+
+We also hope it can be used for interdisciplinary studies of the potential impact of such models - the CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis.
+
 
 #### Primary intended uses
 
@@ -79,7 +84,8 @@ We primarily imagine the model will be used by researchers to better understand
 
 ## Data
 
-The model was trained on publicly available remote sensing image cations datasets. Namely [RSICD](https://github.com/201528014227051/RSICD_optimal), [UCM](https://mega.nz/folder/wCpSzSoS#RXzIlrv--TDt3ENZdKN8JA) and [Sydney](https://mega.nz/folder/pG4yTYYA#4c4buNFLibryZnlujsrwEQ).
+The model was trained on publicly available remote sensing image captions datasets. Namely [RSICD](https://github.com/201528014227051/RSICD_optimal), [UCM](https://mega.nz/folder/wCpSzSoS#RXzIlrv--TDt3ENZdKN8JA) and [Sydney](https://mega.nz/folder/pG4yTYYA#4c4buNFLibryZnlujsrwEQ). More information on the datasets used can be found on [our project page](https://github.com/arampacha/CLIP-rsicd#dataset).
+
 
 
 ## Performance and Limitations