sujitpal committed
Commit d0f3b44
1 Parent(s): b60ae46

fixed typos and added info pointed out in evaluation

Files changed (1): README.md +12 -6
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 
 ## Model Details
 
-This model is a fine-tuned [CLIP by OpenAI](https://huggingface.co/openai/clip-vit-base-patch32). It is designed with aim to improve zero-shot image classification, text-to-image and image-to-image retrieval specifically on remote sensing images.
+This model is a fine-tuned [CLIP by OpenAI](https://huggingface.co/openai/clip-vit-base-patch32). It is designed with an aim to improve zero-shot image classification, text-to-image and image-to-image retrieval specifically on remote sensing images.
 
 ### Model Date
 
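The zero-shot classification the description above refers to works by scoring an image against a set of candidate text labels in CLIP's shared embedding space. A minimal sketch, assuming the `flax-community/clip-rsicd-v2` checkpoint and a placeholder image path; the README's own usage snippet, whose last line is visible as context in the third hunk below, is the authoritative example:

```python
# Hedged sketch of zero-shot remote sensing classification with the
# fine-tuned model. Checkpoint id and image path are assumptions.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("flax-community/clip-rsicd-v2")
processor = CLIPProcessor.from_pretrained("flax-community/clip-rsicd-v2")

labels = ["residential area", "playground", "stadium", "forest", "airport"]
image = Image.open("aerial_scene.jpg")  # placeholder image

inputs = processor(
    text=[f"an aerial photograph of {label}" for label in labels],
    images=image, return_tensors="pt", padding=True,
)
# Softmax over image-text similarity scores gives per-label probabilities.
probs = model(**inputs).logits_per_image.softmax(dim=-1)
for l, p in zip(labels, probs[0]):
    print(f"{l:<18} {p:.4f}")
```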
@@ -19,17 +19,17 @@ The base model uses a ViT-B/32 Transformer architecture as an image encoder and
 
 ### Model Version
 
-We release several checkpoints for `clip-rsicd` model. Refer to [our github repo](https://github.com/arampacha/CLIP-rsicd) for zero-shot classification for each of those.
+We release several checkpoints for `clip-rsicd` model. Refer to [our github repo](https://github.com/arampacha/CLIP-rsicd#evaluation-results) for performance metrics on zero-shot classification for each of those.
 
 ### Training
 
 To reproduce the fine-tuning procedure one can use released [script](https://github.com/arampacha/CLIP-rsicd/blob/master/run_clip_flax_tv.py).
 The model was trained using batch size 1024, adafactor optimizer with linear warmup and decay with peak learning rate 1e-4 on 1 TPU-v3-8.
-Full log of the training run done to produce can be found on [WandB](https://wandb.ai/wandb/hf-flax-clip-rsicd/runs/2dj1exsw).
+Full log of the training run can be found on [WandB](https://wandb.ai/wandb/hf-flax-clip-rsicd/runs/2dj1exsw).
 
 ### Demo
 
-Checko out the model text-to-image and image-to-image capabilities using [this demo](https://huggingface.co/spaces/sujitpal/clip-rsicd-demo).
+Check out the model text-to-image and image-to-image capabilities using [this demo](https://huggingface.co/spaces/sujitpal/clip-rsicd-demo).
 
 
 ### Documents
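As an aside on the Training lines in the hunk above: the described recipe (Adafactor, linear warmup and decay, peak learning rate 1e-4) maps onto an `optax` schedule roughly as follows. This is a sketch, not the released script; the warmup and total step counts are assumptions, and `run_clip_flax_tv.py` remains authoritative.

```python
# Sketch of the optimizer described in the Training section: Adafactor
# with linear warmup to a 1e-4 peak learning rate, then linear decay.
import optax

peak_lr = 1e-4
warmup_steps = 1_000    # assumption: not stated in the README
total_steps = 10_000    # assumption: not stated in the README

schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(0.0, peak_lr, warmup_steps),
        optax.linear_schedule(peak_lr, 0.0, total_steps - warmup_steps),
    ],
    boundaries=[warmup_steps],
)
optimizer = optax.adafactor(learning_rate=schedule)
```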
@@ -67,7 +67,12 @@ for l, p in zip(labels, probs[0]):
 
 ### Intended Use
 
-The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models - the CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis.
+The model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot, arbitrary image classification.
+
+In addition, we can imagine applications in defense and law enforcement, climate change and global warming, and even some consumer applications. A partial list of applications can be found [here](https://github.com/arampacha/CLIP-rsicd#applications). In general we think such models can be useful as digital assistants for humans engaged in searching through large collections of images.
+
+We also hope it can be used for interdisciplinary studies of the potential impact of such models - the CLIP paper includes a discussion of potential downstream impacts to provide an example for this sort of analysis.
+
 
 #### Primary intended uses
 
@@ -79,7 +84,8 @@ We primarily imagine the model will be used by researchers to better understand
 
 ## Data
 
-The model was trained on publicly available remote sensing image cations datasets. Namely [RSICD](https://github.com/201528014227051/RSICD_optimal), [UCM](https://mega.nz/folder/wCpSzSoS#RXzIlrv--TDt3ENZdKN8JA) and [Sydney](https://mega.nz/folder/pG4yTYYA#4c4buNFLibryZnlujsrwEQ).
+The model was trained on publicly available remote sensing image captions datasets. Namely [RSICD](https://github.com/201528014227051/RSICD_optimal), [UCM](https://mega.nz/folder/wCpSzSoS#RXzIlrv--TDt3ENZdKN8JA) and [Sydney](https://mega.nz/folder/pG4yTYYA#4c4buNFLibryZnlujsrwEQ). More information on the datasets used can be found on [our project page](https://github.com/arampacha/CLIP-rsicd#dataset).
+
 
 
 ## Performance and Limitations