yuxi-liu-wired committed
Commit 386dad3 • 1 Parent(s): 9eedb6b
Update README.md
README.md CHANGED
````diff
@@ -31,6 +31,21 @@ Now, remove the projection matrix. This gives us $g: \text{Image} \to \R^{1024}$
 
 The original paper actually stated that they trained *two* models, and one of them was based on ViT-B, but they did not release it.
 
+The model takes as input real-valued tensors. To preprocess images, use the CLIP preprocessor. That is, use `_, preprocess = clip.load("ViT-L/14")`. Explicitly, the preprocessor performs the following operation:
+
+```python
+def _transform(n_px):
+    return Compose([
+        Resize(n_px, interpolation=BICUBIC),
+        CenterCrop(n_px),
+        _convert_image_to_rgb,
+        ToTensor(),
+        Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
+    ])
+```
+
+See the documentation for [`CLIPImageProcessor`](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPImageProcessor) for details.
+
 Also, despite the names `style vector` and `content vector`, I have noticed by visual inspection that both are basically equally good for style embedding. I don't know why, but I guess that's life?
 
 ## How to use it
````
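
For reference, a minimal sketch of the two preprocessing routes mentioned in the added text, assuming the OpenAI `clip` package and `transformers` are installed; the image path `example.jpg` is a placeholder, and `openai/clip-vit-large-patch14` is the Hugging Face checkpoint corresponding to ViT-L/14:

```python
from PIL import Image

# Route 1: the OpenAI CLIP package. clip.load returns (model, preprocess);
# preprocess is the _transform pipeline quoted in the diff above.
import clip

_, preprocess = clip.load("ViT-L/14")
pixel_values = preprocess(Image.open("example.jpg")).unsqueeze(0)  # shape (1, 3, 224, 224)

# Route 2: the equivalent Hugging Face image processor.
from transformers import CLIPImageProcessor

processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
pixel_values = processor(images=Image.open("example.jpg"), return_tensors="pt").pixel_values  # (1, 3, 224, 224)
```

Both routes apply the same resize, center-crop, and normalization, so either output can serve as the real-valued input tensor described above.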