yaml config included
README.md
CHANGED
@@ -25,8 +25,69 @@ This project demonstrates the fine-tuning of the **Mochi Text-to-Video** model u

 - **Model Base**: [genmo/mochi-1-preview](https://huggingface.co/genmo/mochi-1-preview)
 - **Fine-Tuning Dataset**: 23 short video clips of infinite zoom art style, and .txt descriptions
-- **Training Settings**: 37 frames
 - **Training Hardware**: H100 GPU
 - **Training Duration**: 2h

-<Gallery />
+<Gallery />
+
+## lora.yaml:
+```
+init_checkpoint_path: /weights/dit.safetensors
+checkpoint_dir: /finetunes/my_mochi_lora
+train_data_dir: /videos_prepared
+attention_mode: sdpa
+single_video_mode: false # Useful for debugging whether your model can learn a single video
+
+# You only need this if you're using wandb
+wandb:
+  # project: mochi_1_lora
+  # name: ${checkpoint_dir}
+  # group: null
+
+optimizer:
+  lr: 2e-4
+  weight_decay: 0.01
+
+model:
+  type: lora
+  kwargs:
+    # Apply LoRA to the QKV projection and the output projection of the attention block.
+    qkv_proj_lora_rank: 16
+    qkv_proj_lora_alpha: 16
+    qkv_proj_lora_dropout: 0.
+    out_proj_lora_rank: 16
+    out_proj_lora_alpha: 16
+    out_proj_lora_dropout: 0.
+
+training:
+  model_dtype: bf16
+  warmup_steps: 200
+  num_qkv_checkpoint: 48
+  num_ff_checkpoint: 48
+  num_post_attn_checkpoint: 48
+  num_steps: 2000
+  save_interval: 200
+  caption_dropout: 0.1
+  grad_clip: 0.0
+  save_safetensors: true
+
+# Used for generating samples during training to monitor progress ...
+sample:
+  interval: 200
+  output_dir: ${checkpoint_dir}/samples
+  decoder_path: /weights/decoder.safetensors
+  prompts:
+    - Human fingers pinching to zoom on an infinite zoom canvas, a vast desert landscape stretches into the horizon. At the center, a giant hourglass sits, its glass exterior glinting in the sunlight. The zoom begins within the hourglass, revealing cascading grains of sand, each grain transitioning into a crystalline snowflake, leading to a frozen tundra as the scene deepens further.
+    - Human fingers pinching to zoom on an infinite zoom canvas, a colossal tree rises from a lush forest, its bark covered with intricate carvings of stories. The zoom focuses on one carving, which transforms into a vibrant painting of a village. Zooming further, the village reveals bustling streets, where a single doorway becomes the entry to a glowing cosmos.
+    - Human fingers pinching to zoom on an infinite zoom canvas, a tranquil ocean surface reflects the twilight sky. The zoom begins within a whirlpool, diving into vibrant coral reefs teeming with marine life. A single pearl on the ocean floor becomes the focus, transitioning into a marble palace with intricate golden inlays as the zoom continues seamlessly.
+    - Human fingers pinching to zoom on an infinite zoom canvas, a glowing campfire crackles in a dense, dark forest. The zoom begins in the heart of the fire, revealing swirling embers that transition into galaxies of stars. The zoom then centers on a lone star, which transforms into a lantern hanging in a cozy mountain cabin, seamlessly revealing new layers.
+    - Human fingers pinching to zoom on an infinite zoom canvas, a detailed cityscape at night, illuminated by neon lights and bustling with activity. The zoom focuses on a lit billboard advertising a soda can, transitioning into the sparkling surface of the liquid. As the zoom deepens, microscopic bubbles transform into entire ecosystems of floating islands within the soda.
+  seed: 12345
+  kwargs:
+    height: 480
+    width: 848
+    num_frames: 37
+    num_inference_steps: 64
+    sigma_schedule_python_code: "linear_quadratic_schedule(64, 0.025)"
+    cfg_schedule_python_code: "[6.0] * 64"
+```
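
A few values in the lora.yaml above depend on one another: the save and sample intervals relative to `num_steps`, the length of the CFG schedule relative to `num_inference_steps`, and the `${checkpoint_dir}` reference in `output_dir`. The following is a minimal Python sketch that checks these relationships. The dict layout and the `summarize` helper are hypothetical and not part of the fine-tuner. The `alpha / rank` scaling is the usual LoRA convention and is assumed here, not confirmed by this commit. Resolving `${checkpoint_dir}` with `string.Template` is a stand-in for whatever interpolation the real config loader performs.

```python
from string import Template

# Values copied from the lora.yaml above. The dict layout is only an
# illustration, not the trainer's real config object.
config = {
    "checkpoint_dir": "/finetunes/my_mochi_lora",
    "model": {"qkv_proj_lora_rank": 16, "qkv_proj_lora_alpha": 16},
    "training": {"num_steps": 2000, "save_interval": 200, "warmup_steps": 200},
    "sample": {
        "interval": 200,
        "output_dir": "${checkpoint_dir}/samples",
        "num_inference_steps": 64,
        "cfg_schedule_python_code": "[6.0] * 64",
    },
}


def summarize(cfg):
    """Derive quantities that are easy to get wrong when editing the YAML."""
    training, sample, model = cfg["training"], cfg["sample"], cfg["model"]

    # Resolve ${checkpoint_dir}. The real loader may use a different
    # interpolation mechanism (assumption).
    output_dir = Template(sample["output_dir"]).substitute(
        checkpoint_dir=cfg["checkpoint_dir"]
    )

    # The CFG schedule is a Python expression stored as a string. Evaluate it
    # with builtins disabled, and only for config files you trust.
    cfg_schedule = eval(sample["cfg_schedule_python_code"], {"__builtins__": {}})

    return {
        "output_dir": output_dir,
        # Interval counts; whether step 0 or the last step is included
        # depends on the trainer (assumption).
        "save_points": training["num_steps"] // training["save_interval"],
        "sample_rounds": training["num_steps"] // sample["interval"],
        # Conventional LoRA scaling factor alpha / rank (assumed convention).
        "lora_scale": model["qkv_proj_lora_alpha"] / model["qkv_proj_lora_rank"],
        # One guidance value is expected per inference step.
        "cfg_schedule_ok": len(cfg_schedule) == sample["num_inference_steps"],
        "warmup_fraction": training["warmup_steps"] / training["num_steps"],
    }


summary = summarize(config)
```

If `num_inference_steps` is changed without updating both `*_python_code` strings, `cfg_schedule_ok` becomes false under this sketch, which is the kind of mismatch worth catching before a two-hour run.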