# ImageReward

ImageReward is the first general-purpose text-to-image human preference reward model (RM). It is trained on a total of 137k pairs of expert comparisons, based on text prompts and the corresponding model outputs collected from DiffusionDB. Through extensive analysis and experiments, we demonstrate that ImageReward outperforms existing text-image scoring methods, such as CLIP, Aesthetic, and BLIP, in understanding human preference in text-to-image synthesis.
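
Since ImageReward is trained on pairwise comparisons, the model only needs to output a scalar score that ranks the preferred image above the rejected one. As a minimal illustration of this kind of objective (a sketch, not the actual training code; `score_model` and its inputs are hypothetical stand-ins), a Bradley-Terry-style pairwise loss looks like this:

```python
import torch.nn.functional as F

def pairwise_preference_loss(score_model, prompt, img_preferred, img_rejected):
    """Bradley-Terry-style ranking loss (illustrative sketch only).

    `score_model` is a hypothetical stand-in that maps a (prompt, image)
    pair to a scalar preference score, as ImageReward's `score` does.
    """
    s_pos = score_model(prompt, img_preferred)  # score of the human-preferred image
    s_neg = score_model(prompt, img_rejected)   # score of the rejected image
    # -log sigmoid(s_pos - s_neg) is minimized when s_pos exceeds s_neg.
    return -F.logsigmoid(s_pos - s_neg).mean()
```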

## Approach

![ImageReward](ImageReward.png)

## Setup

* Environment: install dependencies via `pip install -r requirements.txt`.

## Usage

```python
import os
import torch
import ImageReward as reward

if __name__ == "__main__":
    prompt = "a painting of an ocean with clouds and birds, day time, low depth field effect"
    img_prefix = "assets/images"
    generations = [f"{pic_id}.webp" for pic_id in range(1, 5)]
    img_list = [os.path.join(img_prefix, img) for img in generations]

    # Load the pretrained ImageReward model.
    model = reward.load()
    with torch.no_grad():
        # Rank all candidate images for the prompt in a single call.
        ranking, rewards = model.inference_rank(prompt, img_list)

        # Print the results.
        print("\nPreference predictions:\n")
        print(f"ranking = {ranking}")
        print(f"rewards = {rewards}")

        # Score each image individually against the prompt.
        for index in range(len(img_list)):
            score = model.score(prompt, img_list[index])
            print(f"{generations[index]:>16s}: {score:.2f}")
```

The output will look like the following (the exact numbers may differ slightly depending on the compute device):

```
Preference predictions:

ranking = [1, 2, 3, 4]
rewards = [[0.5811622738838196], [0.2745276093482971], [-1.4131819009780884], [-2.029569625854492]]
          1.webp: 0.58
          2.webp: 0.27
          3.webp: -1.41
          4.webp: -2.03
```
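
In the sample above, `rewards` holds one scalar per input image (each wrapped in a singleton list), so the most preferred generation can be recovered from it directly. A small sketch, reusing the variables from the usage example:

```python
# rewards is a list of single-element lists, one per image in img_list.
flat_rewards = [r[0] for r in rewards]
best_index = max(range(len(flat_rewards)), key=flat_rewards.__getitem__)
print(f"Most preferred: {img_list[best_index]} (reward {flat_rewards[best_index]:.2f})")
```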

## Test

### Setup for baselines

#### Environment

```bash
$ pip install git+https://github.com/openai/CLIP.git
```

#### Checkpoint

Models | File Paths | Download Links
--- | :---: | :---:
ImageReward | `checkpoint/` | <a href="https://huggingface.co/THUDM/ImageReward/blob/main/ImageReward.pt">Download</a>
CLIP Score | `checkpoint/clip/` | <a href="https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt">Download</a>
BLIP Score | `checkpoint/blip/` | <a href="https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_large.pth">Download</a>
Aesthetic | `checkpoint/aesthetic/` | <a href="https://github.com/christophschuhmann/improved-aesthetic-predictor/raw/main/sac%2Blogos%2Bava1-l14-linearMSE.pth">Download</a>
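
If you prefer to fetch the ImageReward checkpoint programmatically rather than via the link above, the `huggingface_hub` client can place it under `checkpoint/`. A sketch assuming `huggingface_hub` is installed; the baseline checkpoints still come from the direct URLs in the table:

```python
from huggingface_hub import hf_hub_download

# Download ImageReward.pt from the THUDM/ImageReward repository into checkpoint/.
ckpt_path = hf_hub_download(
    repo_id="THUDM/ImageReward",
    filename="ImageReward.pt",
    local_dir="checkpoint",
)
print(f"Checkpoint saved to {ckpt_path}")
```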

#### Data

Data | File Paths | Download Links
--- | :---: | :---:
test_images | `data/` | <a href="https://huggingface.co/THUDM/ImageReward/blob/main/test_images.zip">Download</a>

Download `test_images.zip` and unzip it to `data/test_images/`.
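
Equivalently, the archive can be extracted from Python. A minimal sketch, assuming `test_images.zip` has been downloaded to the working directory:

```python
import zipfile

# Extract the test images into the directory the test script expects.
with zipfile.ZipFile("test_images.zip") as archive:
    archive.extractall("data/test_images/")
```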

### One-step test

```bash
$ python test.py
```

The test results are:

Models | Preference Acc. (%)
--- | :---:
CLIP Score | 54.82
Aesthetic Score | 57.35
BLIP Score | 57.76
ImageReward (Ours) | **65.14**
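
Preference accuracy here is read as the fraction of expert-labeled comparison pairs on which a model scores the human-preferred image higher. Under that reading, it could be computed roughly as follows (`pairs` is a hypothetical list of labeled comparisons, not part of the repo):

```python
def preference_accuracy(model, pairs):
    """Percentage of pairs where the human-preferred image gets the higher score.

    `pairs` is a hypothetical list of (prompt, preferred_path, rejected_path) triples.
    """
    correct = sum(
        model.score(prompt, preferred) > model.score(prompt, rejected)
        for prompt, preferred, rejected in pairs
    )
    return 100.0 * correct / len(pairs)
```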