LarryTsai committed on
Commit
0a1b8a5
1 Parent(s): 4be8199

Merge Allegro into diffusers

Files changed (1)
  1. README.md +38 -22
README.md CHANGED
@@ -78,38 +78,54 @@ pipeline_tag: text-to-video
 
  # Quick start
 
- 1. Download the [Allegro GitHub code](https://github.com/rhymes-ai/Allegro).
-
- 2. Install the necessary requirements.
 
- - Ensure Python >= 3.10, PyTorch >= 2.4, CUDA >= 12.4. For details, see [requirements.txt](https://github.com/rhymes-ai/Allegro/blob/main/requirements.txt).
 
- - It is recommended to use Anaconda to create a new environment (Python >= 3.10) to run the following example.
-
- 3. Download the [Allegro model weights](https://huggingface.co/rhymes-ai/Allegro). Before diffusers integration, use git lfs or snapshot_download.
 
- 4. Run inference.
-
  ```python
- python single_inference.py \
- --user_prompt 'A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats.' \
- --save_path ./output_videos/test_video.mp4 \
- --vae your/path/to/vae \
- --dit your/path/to/transformer \
- --text_encoder your/path/to/text_encoder \
- --tokenizer your/path/to/tokenizer \
- --guidance_scale 7.5 \
- --num_sampling_steps 100 \
- --seed 42
  ```
-
- Use '--enable_cpu_offload' to offload the model to the CPU for lower GPU memory cost (about 9.3 GB, compared to 27.5 GB without CPU offload), but inference time will increase significantly.
 
- 5. (Optional) Interpolate the video to 30 FPS.
 
  It is recommended to use [EMA-VFI](https://github.com/MCG-NJU/EMA-VFI) to interpolate the video from 15 FPS to 30 FPS.
 
  For better visual quality, please use imageio to save the video.
 
  # License
  This repo is released under the Apache 2.0 License.
 
 
  # Quick start
 
+ 1. Install the necessary requirements.
 
+ - Ensure Python >= 3.10, PyTorch >= 2.4, CUDA >= 12.4.
 
+ - It is recommended to use Anaconda to create a new environment (Python >= 3.10), e.g. `conda create -n rllegro python=3.10 -y`, to run the following example.
+
+ - Run `pip install git+https://github.com/huggingface/diffusers.git@9214f4a3782a74e510eff7e09b59457fe8b63511 torch==2.4.1 transformers==4.40.1 accelerate sentencepiece imageio imageio-ffmpeg beautifulsoup4`.
 
+ 2. Run inference.
 
  ```python
+ import torch
+ from diffusers import AutoencoderKLAllegro, AllegroPipeline
+ from diffusers.utils import export_to_video
+
+ vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32)
+ pipe = AllegroPipeline.from_pretrained(
+     "rhymes-ai/Allegro", vae=vae, torch_dtype=torch.bfloat16
+ )
+ pipe.to("cuda")
+ pipe.vae.enable_tiling()
+
+ prompt = "A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this location might be a popular spot for docking fishing boats."
+
+ positive_prompt = """
+ (masterpiece), (best quality), (ultra-detailed), (unwatermarked),
+ {}
+ emotional, harmonious, vignette, 4k epic detailed, shot on kodak, 35mm photo,
+ sharp focus, high budget, cinemascope, moody, epic, gorgeous
+ """
+
+ negative_prompt = """
+ nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality,
+ low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry.
+ """
+
+ prompt = positive_prompt.format(prompt.lower().strip())
+
+ video = pipe(prompt, negative_prompt=negative_prompt, guidance_scale=7.5, max_sequence_length=512, num_inference_steps=100, generator=torch.Generator(device="cuda:0").manual_seed(42)).frames[0]
+ export_to_video(video, "output.mp4", fps=15)
  ```
+
+ Use `pipe.enable_sequential_cpu_offload()` to offload the model to the CPU and reduce GPU memory usage (about 9.3 GB, compared to 27.5 GB without offloading), at the cost of significantly longer inference time.
 
+ 3. (Optional) Interpolate the video to 30 FPS.
 
  It is recommended to use [EMA-VFI](https://github.com/MCG-NJU/EMA-VFI) to interpolate the video from 15 FPS to 30 FPS.
 
  For better visual quality, please use imageio to save the video.
 
+ 4. For faster inference techniques such as Context Parallel and PAB, please refer to our [GitHub repo](https://github.com/rhymes-ai/Allegro).
+
  # License
  This repo is released under the Apache 2.0 License.