yuxi-liu-wired committed 386dad3 (1 parent: 9eedb6b)

Update README.md

Files changed (1):
  1. README.md (+15 -0)
README.md CHANGED
@@ -31,6 +31,21 @@ Now, remove the projection matrix. This gives us $g: \text{Image} \to \R^{1024}$
 
The original paper actually stated that they trained *two* models, and one of them was based on ViT-B, but they did not release it.

+ The model takes real-valued image tensors as input. To preprocess images, use the CLIP preprocessor, i.e. `_, preprocess = clip.load("ViT-L/14")`. Explicitly, the preprocessor performs the following operations:
+
+ ```python
+ def _transform(n_px):
+     return Compose([
+         Resize(n_px, interpolation=BICUBIC),
+         CenterCrop(n_px),
+         _convert_image_to_rgb,
+         ToTensor(),
+         Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
+     ])
+ ```
+
+ See the documentation for [`CLIPImageProcessor`](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPImageProcessor) for details; a short usage sketch follows below.
+
Also, despite the names `style vector` and `content vector`, I have noticed by visual inspection that both are basically equally good for style embedding. I don't know why, but I guess that's life?

  ## How to use it
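
As a minimal sketch of how the pieces above fit together, the snippet below loads CLIP ViT-L/14, runs the quoted preprocessor on an image, and pulls out the 1024-dimensional pre-projection features like the $g: \text{Image} \to \R^{1024}$ mentioned above. It assumes OpenAI's `clip` package, CPU inference, and a placeholder image path `example.jpg`; the `model.visual.proj = None` step relies on CLIP's vision transformer skipping the projection when `proj` is unset, and is only one way to get pre-projection features.

```python
import clip
import torch
from PIL import Image

# Load ViT-L/14 on CPU; `preprocess` is exactly the `_transform` pipeline shown above.
model, preprocess = clip.load("ViT-L/14", device="cpu")
model.eval()

# "example.jpg" is a placeholder path for any input image.
batch = preprocess(Image.open("example.jpg")).unsqueeze(0)  # shape (1, 3, 224, 224)

with torch.no_grad():
    projected = model.encode_image(batch)   # standard CLIP embedding, shape (1, 768)

    # Removing the projection matrix leaves the backbone's raw output,
    # i.e. 1024-dimensional features of the kind g produces.
    model.visual.proj = None
    features = model.encode_image(batch)    # shape (1, 1024)

print(projected.shape, features.shape)
```

The first half, up to `batch`, is just the preprocessing the README asks for; the projection-removal lines only mirror the "remove the projection matrix" construction described earlier and are not required for feeding tensors to this repository's model.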