# CogVideoX-Fun-V1.1-Reward-LoRAs
## Introduction
We explore the Reward Backpropagation technique <sup>[1](#ref1) [2](#ref2)</sup> to optimize the videos generated by [CogVideoX-Fun-V1.1](https://github.com/aigc-apps/CogVideoX-Fun) for better alignment with human preferences.
We provide the following pre-trained models (i.e., LoRAs) along with [the training script](https://github.com/aigc-apps/CogVideoX-Fun/blob/main/scripts/train_reward_lora.py). You can use these LoRAs as plug-ins to enhance the corresponding base model, or train your own reward LoRA.

For more details, please refer to our [GitHub repo](https://github.com/aigc-apps/CogVideoX-Fun).
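
To make the idea of reward backpropagation concrete, here is a toy sketch in plain Python. It is not the training script: the one-parameter "generator", the quadratic "reward model", and every name below are hypothetical stand-ins chosen for illustration. The only point it shows is that the gradient of a differentiable reward is propagated through the generated output back to the trainable weight, which is then updated by gradient ascent.

```python
# Toy illustration of reward backpropagation (NOT the actual training script).
# Assumptions: the "generator" is x = w * z with one trainable weight w
# (a stand-in for the LoRA weights), and the "reward model" is
# r(x) = -(x - target)^2 (a stand-in for a differentiable preference model).

def generate(w, z):
    """Hypothetical generator: maps a fixed input z to an output x."""
    return w * z


def reward(x, target):
    """Hypothetical differentiable reward: highest when x equals target."""
    return -(x - target) ** 2


def reward_grad_wrt_w(w, z, target):
    """Chain rule: dr/dw = dr/dx * dx/dw."""
    x = generate(w, z)
    dr_dx = -2.0 * (x - target)
    dx_dw = z
    return dr_dx * dx_dw


def train(w=0.0, z=1.5, target=3.0, lr=0.1, steps=50):
    """Gradient ascent on the reward; returns the final weight and reward history."""
    history = []
    for _ in range(steps):
        history.append(reward(generate(w, z), target))
        w += lr * reward_grad_wrt_w(w, z, target)
    return w, history


if __name__ == "__main__":
    final_w, rewards = train()
    print(f"final w = {final_w:.4f}, first reward = {rewards[0]:.4f}, last reward = {rewards[-1]:.6f}")
```

In the real setting the generator is the video diffusion model with LoRA weights and the reward comes from a preference model such as HPS v2.1 or MPS, so the gradients are computed by automatic differentiation rather than by hand.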

| Name | Base Model | Reward Model | Hugging Face | Description |
|--|--|--|--|--|
| CogVideoX-Fun-V1.1-5b-InP-HPS2.1.safetensors | [CogVideoX-Fun-V1.1-5b](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP) | [HPS v2.1](https://github.com/tgxs002/HPSv2) | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-Reward-LoRAs/resolve/main/CogVideoX-Fun-V1.1-5b-InP-HPS2.1.safetensors) | Official HPS v2.1 reward LoRA (`rank=128` and `network_alpha=64`) for CogVideoX-Fun-V1.1-5b-InP. It is trained with a batch size of 8 for 1,500 steps.|
| CogVideoX-Fun-V1.1-2b-InP-HPS2.1.safetensors | [CogVideoX-Fun-V1.1-2b](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-InP) | [HPS v2.1](https://github.com/tgxs002/HPSv2) | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-Reward-LoRAs/resolve/main/CogVideoX-Fun-V1.1-2b-InP-HPS2.1.safetensors) | Official HPS v2.1 reward LoRA (`rank=128` and `network_alpha=64`) for CogVideoX-Fun-V1.1-2b-InP. It is trained with a batch size of 8 for 3,000 steps.|
| CogVideoX-Fun-V1.1-5b-InP-MPS.safetensors | [CogVideoX-Fun-V1.1-5b](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP) | [MPS](https://github.com/Kwai-Kolors/MPS) | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-Reward-LoRAs/resolve/main/CogVideoX-Fun-V1.1-5b-InP-MPS.safetensors) | Official MPS reward LoRA (`rank=128` and `network_alpha=64`) for CogVideoX-Fun-V1.1-5b-InP. It is trained with a batch size of 8 for 5,500 steps.|
| CogVideoX-Fun-V1.1-2b-InP-MPS.safetensors | [CogVideoX-Fun-V1.1-2b](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-InP) | [MPS](https://github.com/Kwai-Kolors/MPS) | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-Reward-LoRAs/resolve/main/CogVideoX-Fun-V1.1-2b-InP-MPS.safetensors) | Official MPS reward LoRA (`rank=128` and `network_alpha=64`) for CogVideoX-Fun-V1.1-2b-InP. It is trained with a batch size of 8 for 16,000 steps.|
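
The download links in the table all follow the same `resolve/main` URL pattern, so fetching a LoRA to local disk can be scripted. The sketch below uses only the Python standard library. The helper names and the `models/loras` target directory are assumptions made for illustration, not part of the repository.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Repository that hosts the reward LoRAs (taken from the table links).
REPO_ID = "alibaba-pai/CogVideoX-Fun-V1.1-Reward-LoRAs"


def lora_url(filename, repo_id=REPO_ID, revision="main"):
    """Build the direct download URL for a file, following the table's link pattern."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_lora(filename, target_dir="models/loras"):
    """Download a LoRA file into target_dir (hypothetical location) unless it already exists."""
    target = Path(target_dir) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urlretrieve(lora_url(filename), str(target))
    return target


if __name__ == "__main__":
    # Requires network access; prints the local path of the downloaded file.
    print(download_lora("CogVideoX-Fun-V1.1-5b-InP-HPS2.1.safetensors"))
```

The returned path can then be used wherever a local LoRA file path is expected.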

## Demo
### CogVideoX-Fun-V1.1-5B

<table border="0" style="width: 100%; text-align: center; margin-top: 20px;">
  <thead>
    <tr>
      <th style="text-align: center;" width="10%">Prompt</th>
      <th style="text-align: center;" width="30%">CogVideoX-Fun-V1.1-5B</th>
      <th style="text-align: center;" width="30%">CogVideoX-Fun-V1.1-5B <br> HPSv2.1 Reward LoRA</th>
      <th style="text-align: center;" width="30%">CogVideoX-Fun-V1.1-5B <br> MPS Reward LoRA</th>
    </tr>
  </thead>
  <tr>
    <td>
      Pig with wings flying above a diamond mountain
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/6682f507-4ca2-45e9-9d76-86e2d709efb3" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/ec9219a2-96b3-44dd-b918-8176b2beb3b0" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/a75c6a6a-0b69-4448-afc0-fda3c7955ba0" width="100%" controls autoplay loop></video>
    </td>
  </tr>
  <tr>
    <td>
      A dog runs through a field while a cat climbs a tree
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/0392d632-2ec3-46b4-8867-0da1db577b6d" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/7d8c729d-6afb-408e-b812-67c40c3aaa96" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/dcd1343c-7435-4558-b602-9c0fa08cbd59" width="100%" controls autoplay loop></video>
    </td>
  </tr>
  <tr>
    <td>
      Crystal cake shimmering beside a metal apple
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/af0df8e0-1edb-4e2c-9a87-70df2b564aef" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/59b840f7-d33c-4972-8024-11a097f1c419" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/4a1d0af0-54e3-455c-9930-0789e2346fa0" width="100%" controls autoplay loop></video>
    </td>
  </tr>
  <tr>
    <td>
      Elderly artist with a white beard painting on a white canvas
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/99e44f9d-c770-48ce-8cc5-69fe36d757bc" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/9c106677-e4cb-4970-a1a2-a013fa6ce903" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/0a7b57ab-36a8-4fb6-bcfa-75e3878c55b7" width="100%" controls autoplay loop></video>
    </td>
  </tr>
</table>

### CogVideoX-Fun-V1.1-2B

<table border="0" style="width: 100%; text-align: center; margin-top: 20px;">
  <thead>
    <tr>
      <th style="text-align: center;" width="10%">Prompt</th>
      <th style="text-align: center;" width="30%">CogVideoX-Fun-V1.1-2B</th>
      <th style="text-align: center;" width="30%">CogVideoX-Fun-V1.1-2B <br> HPSv2.1 Reward LoRA</th>
      <th style="text-align: center;" width="30%">CogVideoX-Fun-V1.1-2B <br> MPS Reward LoRA</th>
    </tr>
  </thead>
  <tr>
    <td>
      A blue car drives past a white picket fence on a sunny day
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/274b0873-4fbd-4afa-94c0-22b23168f0a1" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/730f2ba3-4c54-44ce-ad5b-4eeca7ae844e" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/1b8eb777-0f17-46ef-9e7e-c8be7636e157" width="100%" controls autoplay loop></video>
    </td>
  </tr>
  <tr>
    <td>
      Blue jay swooping near a red maple tree
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/a14778d2-38ea-42c3-89a2-18164c48f3cf" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/90af433f-ab01-4341-9977-c675041d76d0" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/dafe8bf6-77ac-4934-8c9c-61c25088f80b" width="100%" controls autoplay loop></video>
    </td>
  </tr>
  <tr>
    <td>
      Yellow curtains swaying near a blue sofa
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/e8a445a4-781b-4b3f-899b-2cc24201f247" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/318cfb00-8bd1-407f-aaee-8d4220573b82" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/6b90e8a4-1754-42f4-b454-73510ed0701d" width="100%" controls autoplay loop></video>
    </td>
  </tr>
  <tr>
    <td>
      White tractor plowing near a green farmhouse
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/42d35282-e964-4c8b-aae9-a1592178493a" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/c9704bd4-d88d-41a1-8e5b-b7980df57a4a" width="100%" controls autoplay loop></video>
    </td>
    <td>
      <video src="https://github.com/user-attachments/assets/7a785b34-4a5d-4491-9e03-c40cf953a1dc" width="100%" controls autoplay loop></video>
    </td>
  </tr>
</table>

> [!NOTE]
> The above test prompts are from <a href="https://github.com/Vchitect/VBench/tree/master/prompts">VBench</a>. All videos are generated with a LoRA weight of 0.7.

## Quick Start
We provide simple inference code to run CogVideoX-Fun-V1.1-5b-InP with its HPS2.1 reward LoRA. The `cogvideox` imports come from our [GitHub repo](https://github.com/aigc-apps/CogVideoX-Fun).

```python
import torch
from diffusers import CogVideoXDDIMScheduler

from cogvideox.models.transformer3d import CogVideoXTransformer3DModel
from cogvideox.pipeline.pipeline_cogvideox_inpaint import CogVideoX_Fun_Pipeline_Inpaint
from cogvideox.utils.lora_utils import merge_lora
from cogvideox.utils.utils import get_image_to_video_latent, save_videos_grid

model_path = "alibaba-pai/CogVideoX-Fun-V1.1-5b-InP"
lora_path = "alibaba-pai/CogVideoX-Fun-V1.1-Reward-LoRAs/CogVideoX-Fun-V1.1-5b-InP-HPS2.1.safetensors"
lora_weight = 0.7

prompt = "Pig with wings flying above a diamond mountain"
sample_size = [512, 512]  # [height, width]
video_length = 49

# Load the transformer and scheduler, then assemble the inpaint pipeline in bfloat16.
transformer = CogVideoXTransformer3DModel.from_pretrained_2d(model_path, subfolder="transformer").to(torch.bfloat16)
scheduler = CogVideoXDDIMScheduler.from_pretrained(model_path, subfolder="scheduler")
pipeline = CogVideoX_Fun_Pipeline_Inpaint.from_pretrained(
    model_path, transformer=transformer, scheduler=scheduler, torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()
# Merge the reward LoRA into the pipeline with the chosen weight.
pipeline = merge_lora(pipeline, lora_path, lora_weight)

generator = torch.Generator(device="cuda").manual_seed(42)
# No input image is given (None, None), so this is text-to-video generation.
input_video, input_video_mask, _ = get_image_to_video_latent(None, None, video_length=video_length, sample_size=sample_size)
sample = pipeline(
    prompt,
    num_frames=video_length,
    negative_prompt="bad detailed",
    height=sample_size[0],
    width=sample_size[1],
    generator=generator,
    guidance_scale=7.0,
    num_inference_steps=50,
    video=input_video,
    mask_video=input_video_mask,
).videos

save_videos_grid(sample, "samples/output.mp4", fps=8)
```
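
To give a feel for what the `lora_weight` argument does, the sketch below applies the commonly used LoRA merge rule, `W' = W + lora_weight * (network_alpha / rank) * (up @ down)`, to tiny matrices in plain Python. This is an assumption about the general technique, not a description of the repository's `merge_lora`, which may differ in detail. All function names here are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def effective_scale(lora_weight, network_alpha, rank):
    """Scale applied to the low-rank update under the common LoRA convention."""
    return lora_weight * network_alpha / rank


def merge_lora_weight(base, up, down, network_alpha, lora_weight):
    """Return base + scale * (up @ down); the rank is the number of rows of `down`."""
    scale = effective_scale(lora_weight, network_alpha, rank=len(down))
    delta = matmul(up, down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


if __name__ == "__main__":
    # Rank-1 toy example: a 2x2 base weight, a 2x1 "up" and a 1x2 "down" matrix.
    base = [[1.0, 0.0], [0.0, 1.0]]
    up = [[1.0], [2.0]]
    down = [[3.0, 4.0]]
    print(merge_lora_weight(base, up, down, network_alpha=0.5, lora_weight=0.7))
    # With the released settings (rank=128, network_alpha=64) and lora_weight=0.7:
    print(effective_scale(0.7, 64, 128))
```

Under this convention, the released LoRAs (`rank=128`, `network_alpha=64`) combined with a LoRA weight of 0.7 would scale the low-rank update by 0.7 × 64 / 128 = 0.35. Lowering `lora_weight` moves the output closer to the base model.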

## Limitations
1. We observe that beyond a certain point in training, the reward continues to increase but the quality of the generated videos no longer improves.
   The model learns shortcuts (adding artifacts in the background, i.e., adversarial patches) to increase the reward.
2. Suitable preference models for video generation are still lacking. Image preference models used directly cannot
   evaluate preferences along the temporal dimension (such as dynamism and consistency). Furthermore, we find that using image preference models reduces
   the dynamism of generated videos. Computing the reward using only the first frame of the decoded video mitigates this, but the effect still persists.
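
The first-frame mitigation mentioned above can be sketched as follows. This is a toy illustration in plain Python, not the training code. `frame_score` is a hypothetical stand-in for an image preference model, and frames are represented as flat lists of pixel values.

```python
def frame_score(frame):
    """Hypothetical image preference score: here simply the mean pixel value."""
    return sum(frame) / len(frame)


def video_reward(frames, first_frame_only=True):
    """Average the image score over the first frame only, or over all frames."""
    scored = frames[:1] if first_frame_only else frames
    return sum(frame_score(f) for f in scored) / len(scored)


if __name__ == "__main__":
    # Two toy "frames" of two pixels each.
    frames = [[1.0, 1.0], [0.0, 0.0]]
    print(video_reward(frames, first_frame_only=True))
    print(video_reward(frames, first_frame_only=False))
```

When only the first frame is scored, the image reward does not depend directly on the later frames, which is one way to read why the loss of dynamism is reduced but not eliminated.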

## Reference
<ol>
  <li id="ref1">Clark, Kevin, et al. "Directly fine-tuning diffusion models on differentiable rewards." In ICLR 2024.</li>
  <li id="ref2">Prabhudesai, Mihir, et al. "Aligning text-to-image diffusion models with reward backpropagation." arXiv preprint arXiv:2310.03739 (2023).</li>
</ol>