aliencaocao committed on
Commit 1ba6919 (parent: cc2b01f)

Update README.md

Files changed (1)
  1. README.md +13 -0
README.md CHANGED
@@ -14,12 +14,25 @@ SigLIP model pre-trained on WebLi at resolution 384x384. It was introduced in th

Disclaimer: The team releasing SigLIP did not write a model card for this model, so this model card has been written by the Hugging Face team.

+ This model is a finetune for the [DSTA BrainHack TIL 2024 competition](https://github.com/aliencaocao/TIL-2024).
+
  ## Model description
  SigLIP is [CLIP](https://huggingface.co/docs/transformers/model_doc/clip), a multimodal model, with a better loss function. The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows further scaling up the batch size, while also performing better at smaller batch sizes.
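
To make the loss concrete, here is a minimal sketch of the pairwise sigmoid loss, written from the paper's description rather than the released code; `image_emb`, `text_emb`, `t`, and `b` are assumed names for the L2-normalized batch embeddings and the learnable log-temperature and bias:

```
import torch
import torch.nn.functional as F

def sigmoid_loss(image_emb, text_emb, t, b):
    # pairwise logits: every image scored against every text in the batch
    logits = image_emb @ text_emb.T * t.exp() + b
    # +1 on the diagonal (matching pairs), -1 everywhere else
    labels = 2 * torch.eye(logits.size(0), device=logits.device) - 1
    # independent binary loss per pair; no batch-wide softmax normalization needed
    return -F.logsigmoid(labels * logits).sum() / logits.size(0)
```

Because each image-text pair is scored independently, the loss does not need a global view of the similarity matrix, which is what allows the batch size to be scaled up.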
  A TLDR of SigLIP by one of the authors can be found [here](https://twitter.com/giffmana/status/1692641733459267713).
+ `*.pth` files are converted TensorRT checkpoints in FP16, to be used via torch2trt:
+
+ ```
+ import torch
+ from torch2trt import TRTModule
+
+ # load the converted FP16 TensorRT engines for the vision and text towers
+ vision_trt = TRTModule()
+ vision_trt.load_state_dict(torch.load('vision_trt.pth'))
+ text_trt = TRTModule()
+ text_trt.load_state_dict(torch.load('text_trt.pth'))
+ ```
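
As a rough usage sketch only: the following assumes the engines were built from the SigLIP vision tower at 384x384 in FP16 and return the pooled image embedding directly; the actual input shapes, dtypes, and outputs depend on how the torch2trt conversion was performed.

```
import torch
from torch2trt import TRTModule

vision_trt = TRTModule()
vision_trt.load_state_dict(torch.load('vision_trt.pth'))

# hypothetical input: a preprocessed image batch in FP16 on the GPU
pixel_values = torch.randn(1, 3, 384, 384, dtype=torch.float16, device='cuda')
with torch.no_grad():
    image_embeds = vision_trt(pixel_values)
```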
+
  ## Intended uses & limitations
  You can use the raw model for tasks like zero-shot image classification and image-text retrieval. See the [model hub](https://huggingface.co/models?search=google/siglip) to look for