Merge Allegro into diffusers

README.md

# Quick start

1. Install the necessary requirements.

   - Ensure Python >= 3.10, PyTorch >= 2.4, CUDA >= 12.4.
   - It is recommended to use Anaconda to create a new environment (Python >= 3.10), e.g. `conda create -n rllegro python=3.10 -y`, to run the following example.
   - Run `pip install git+https://github.com/huggingface/diffusers.git@9214f4a3782a74e510eff7e09b59457fe8b63511 torch==2.4.1 transformers==4.40.1 accelerate sentencepiece imageio imageio-ffmpeg beautifulsoup4`.

2. Run inference.

```python
import torch
from diffusers import AutoencoderKLAllegro, AllegroPipeline
from diffusers.utils import export_to_video

# Load the VAE in fp32 for numerical stability; the rest of the pipeline runs in bf16.
vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32)
pipe = AllegroPipeline.from_pretrained(
    "rhymes-ai/Allegro", vae=vae, torch_dtype=torch.bfloat16
)
pipe.to("cuda")
# Decode the video in tiles to keep VAE memory usage manageable.
pipe.vae.enable_tiling()

prompt = "A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats."

positive_prompt = """
(masterpiece), (best quality), (ultra-detailed), (unwatermarked),
{}
emotional, harmonious, vignette, 4k epic detailed, shot on kodak, 35mm photo,
sharp focus, high budget, cinemascope, moody, epic, gorgeous
"""

negative_prompt = """
nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality,
low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry.
"""

# Wrap the user prompt in the positive-prompt template.
prompt = positive_prompt.format(prompt.lower().strip())

video = pipe(
    prompt,
    negative_prompt=negative_prompt,
    guidance_scale=7.5,
    max_sequence_length=512,
    num_inference_steps=100,
    generator=torch.Generator(device="cuda:0").manual_seed(42),
).frames[0]
export_to_video(video, "output.mp4", fps=15)
```

Use `pipe.enable_sequential_cpu_offload()` to offload the model to the CPU for a lower GPU memory cost (about 9.3 GB, compared to 27.5 GB without CPU offload), at the cost of significantly slower inference.
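
A minimal sketch of that low-memory variant, reusing the setup from step 2; the key point is that `enable_sequential_cpu_offload()` replaces the `pipe.to("cuda")` call:

```python
pipe = AllegroPipeline.from_pretrained(
    "rhymes-ai/Allegro", vae=vae, torch_dtype=torch.bfloat16
)
# Instead of pipe.to("cuda"): accelerate moves each submodule to the GPU
# only while it is executing, trading inference speed for GPU memory.
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()
```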

3. (Optional) Interpolate the video to 30 FPS.

   It is recommended to use [EMA-VFI](https://github.com/MCG-NJU/EMA-VFI) to interpolate the video from 15 FPS to 30 FPS.

   For better visual quality, please use imageio to save the video (see the sketch below).
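
A minimal sketch of saving the frames with imageio instead of `export_to_video`, assuming `video` is the list of PIL frames returned by the pipeline; `quality` (0-10) is an imageio-ffmpeg writer option and the value here is only an example:

```python
import imageio
import numpy as np

# Convert the PIL frames to arrays and encode at 15 FPS with a higher
# ffmpeg quality setting than the default.
frames = [np.asarray(frame) for frame in video]
imageio.mimsave("output.mp4", frames, fps=15, quality=9)
```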

4. For faster inference, such as Context Parallel and PAB, please refer to our [github repo](https://github.com/rhymes-ai/Allegro).

# License

This repo is released under the Apache 2.0 License.