Zero-Shot Image Classification
Safetensors
clip
zer0int committed on
Commit ffbd707 • 1 Parent(s): 0fa1bb0

Create README.md

  1. README.md +35 -0
README.md ADDED
@@ -0,0 +1,35 @@
+ ---
+ datasets:
+ - zer0int/CLIP-adversarial-typographic-attack_text-image
+ - SPRIGHT-T2I/spright_coco
+ base_model:
+ - BeichenZhang/LongCLIP-L
+ pipeline_tag: zero-shot-image-classification
+ ---
+ ### Long-CLIP ViT-L/14 finetune: SAE-informed adversarial training
+
+ - SAE = Sparse autoencoder. All training info & code: [github.com/zer0int/CLIP-SAE-finetune](https://github.com/zer0int/CLIP-SAE-finetune)
+ - This Long-CLIP, 👉 [direct download Text Encoder](https://huggingface.co/zer0int/LongCLIP-SAE-ViT-L-14/resolve/main/Long-ViT-L-14-GmP-SAE-TE-only.safetensors?download=true) 👈 is also the best Long-CLIP to use with [HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo).
+ - Required: Use with my [zer0int/ComfyUI-HunyuanVideo-Nyan](https://github.com/zer0int/ComfyUI-HunyuanVideo-Nyan) node (it changes the influence of the LLM vs. CLIP; without the node, the difference is very small).
+ - ☕ [Buy me a coffee](https://ko-fi.com/zer0int)
+
+
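The text-encoder download linked in the list above can also be scripted. The following is a minimal sketch using only the Python standard library: the repository id and filename are copied from the README link, while `resolve_url`, `download_text_encoder`, and the `models/clip` target directory are hypothetical names chosen for illustration. The `?download=true` query string from the link is left out here on the assumption that it is not needed for a scripted fetch.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

# Both values are copied from the download link in the README.
REPO_ID = "zer0int/LongCLIP-SAE-ViT-L-14"
FILENAME = "Long-ViT-L-14-GmP-SAE-TE-only.safetensors"


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a Hub 'resolve' URL with the same shape as the README link."""
    return (
        f"https://huggingface.co/{quote(repo_id)}"
        f"/resolve/{quote(revision)}/{quote(filename)}"
    )


def download_text_encoder(target_dir: str = "models/clip") -> Path:
    """Fetch the text-encoder file into target_dir and return its path.

    target_dir is a hypothetical default; point it at whatever folder
    your setup reads CLIP text encoders from.
    """
    destination = Path(target_dir) / FILENAME
    destination.parent.mkdir(parents=True, exist_ok=True)
    if not destination.exists():  # skip the download if the file is already present
        urlretrieve(resolve_url(REPO_ID, FILENAME), str(destination))
    return destination
```

Calling `download_text_encoder()` performs the actual network download; `resolve_url(REPO_ID, FILENAME)` alone only builds the URL and can be used to check the link first.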
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6490359a877fc29cb1b09451/HeMdxok8uKVA87BJqHpS9.png)
+
+ The original CLIP model accepts at most 77 input tokens, but its effective length is only ~20 tokens. See the [original Long-CLIP paper](https://arxiv.org/abs/2403.15378) for details.
+
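To make the token budget concrete, here is a small arithmetic sketch. The 77-token limit and the ~20-token effective length are the figures quoted above. The stretch factor of 4, and the 248-token window it implies, reflect my reading of the linked Long-CLIP paper rather than anything stated in this README, so treat them as assumptions to verify. The function names are hypothetical.

```python
CLIP_MAX_TOKENS = 77        # from the README: maximum input of the original CLIP
CLIP_EFFECTIVE_TOKENS = 20  # from the README: approximate effective length
STRETCH_FACTOR = 4          # ASSUMPTION: interpolation ratio as described in the Long-CLIP paper


def long_clip_max_tokens(
    max_tokens: int = CLIP_MAX_TOKENS,
    kept: int = CLIP_EFFECTIVE_TOKENS,
    factor: int = STRETCH_FACTOR,
) -> int:
    """Keep the first `kept` positions as-is and stretch the remainder by `factor`."""
    return kept + (max_tokens - kept) * factor


def budget_report(n_tokens: int) -> dict:
    """Compare a prompt's token count against each of the three limits."""
    return {
        "fits_clip_window": n_tokens <= CLIP_MAX_TOKENS,
        "within_clip_effective_length": n_tokens <= CLIP_EFFECTIVE_TOKENS,
        "fits_long_clip_window": n_tokens <= long_clip_max_tokens(),
    }
```

Under these assumptions, `budget_report(69)` and `budget_report(52)` (the token counts of the two demo prompts) both fit the 77-token window but exceed the ~20-token effective length, which is the gap a Long-CLIP text encoder is meant to close.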
+ HunyuanVideo demo:
+
+ 69 tokens, normal scene:
+ Lens: 16mm. Aperture: f/2.8. Color Grading: Blue-green monochrome. Lighting: Low-key with backlit silhouettes. Background: Gothic cathedral at night, stained glass windows breaking. Camera angle: Over the shoulder of a ninja, tracking her mid-air leap as she lands on a rooftop.
+
+ 52 tokens, OOD (out-of-distribution) scene, showing superior consistency and prompt-following despite the OOD concept:
+ In this surreal nightmare documentary, a sizable spider with a human face is peacefully savoring her breakfast at a diner. The spider has a spider body, but a lady's face on the front, and regular human hands at the end of the spider legs.
+
+
+ <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/6490359a877fc29cb1b09451/J1_xaDybbnF9UCBGxuKAc.mp4"></video>
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6490359a877fc29cb1b09451/awPdlSxGFOrs_kanLbaW_.png)
+
+
+