nielsr (HF Staff) committed · verified
Commit b09af1b · 1 Parent(s): db34893

Add comprehensive model card for Many-for-Many unified generation model


This PR adds a comprehensive model card for the Many-for-Many model.

It links the model to its paper: [Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks](https://huggingface.co/papers/2506.01758).

It also adds essential metadata, including:
* `pipeline_tag: any-to-any`, reflecting its capability across various image and video generation and manipulation tasks.
* `library_name: diffusers`, as the model is built upon the Diffusers framework.
* `license: apache-2.0`.

Additionally, the PR provides links to the project page and the GitHub repository, along with a basic Python usage example to help users get started.
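For reference, these fields are added as YAML front matter at the top of `README.md`:

```yaml
---
pipeline_tag: any-to-any
library_name: diffusers
license: apache-2.0
---
```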

Files changed (1):
  1. README.md (+111, -0)

README.md (new file):
---
pipeline_tag: any-to-any
library_name: diffusers
license: apache-2.0
---

# Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks

<div align="center">
  <img src="https://huggingface.co/LetsThink/MfM-Pipeline-8B/resolve/main/assets/MfM_logo.jpeg" alt="MfM-logo" width="50%">
</div>

**Many-for-Many (MfM)** is a unified framework introduced in the paper [Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks](https://huggingface.co/papers/2506.01758). The framework leverages the training data available for many different visual generation and manipulation tasks to train a single model for all of them.

MfM uses a lightweight adapter to unify the diverse conditions of the different tasks and a joint image-video learning strategy to train the model progressively from scratch, yielding a unified visual generation and manipulation model with improved video generation performance. In addition, depth maps are incorporated as a condition to strengthen the model's perception of 3D space.

Two versions of the model (8B and 2B parameters) are available, each capable of performing more than 10 tasks, including text-to-video (T2V), image-to-video (I2V), video-to-video (V2V), and various image and video manipulation tasks. The 8B model delivers highly competitive video generation performance.

* **Paper:** [Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks](https://huggingface.co/papers/2506.01758)
* **Project Page:** [https://leeruibin.github.io/MfMPage/](https://leeruibin.github.io/MfMPage/)
* **Code:** [https://github.com/SandAI-org/MAGI-1](https://github.com/SandAI-org/MAGI-1)

## Visual Results

<img src="https://huggingface.co/LetsThink/MfM-Pipeline-8B/resolve/main/assets/visual_result.png" alt="MfM visual results">

## Demo Video

<div align="center">
  <video src="https://github.com/user-attachments/assets/f1ddd1fd-1c2b-44e7-94dc-9f62963ab147" width="70%" controls></video>
</div>

## Architecture

<img src="https://huggingface.co/LetsThink/MfM-Pipeline-8B/resolve/main/assets/arch.png" alt="MfM architecture">

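To give a rough intuition for the adapter-based design described above, here is a purely illustrative PyTorch sketch of a lightweight condition adapter that maps heterogeneous, task-specific condition latents into a common token space for a shared diffusion backbone. The module name, shapes, and layer choices are assumptions made for illustration, not the actual MfM implementation.

```python
import torch
import torch.nn as nn


class ConditionAdapter(nn.Module):
    """Illustrative only: projects task-specific condition latents (e.g. a first
    frame for I2V, a depth map, or a masked video) into the token space of a
    shared diffusion backbone. Shapes and layers are assumptions, not MfM's code."""

    def __init__(self, cond_channels: int, hidden_dim: int):
        super().__init__()
        self.proj_in = nn.Conv3d(cond_channels, hidden_dim, kernel_size=1)
        self.norm = nn.LayerNorm(hidden_dim)
        self.proj_out = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, cond_latents: torch.Tensor) -> torch.Tensor:
        # cond_latents: (batch, channels, frames, height, width)
        x = self.proj_in(cond_latents)       # unify the channel dimension
        x = x.flatten(2).transpose(1, 2)     # -> (batch, tokens, hidden_dim)
        return self.proj_out(self.norm(x))   # condition tokens for the backbone


# Usage sketch: the resulting tokens would be injected into the backbone
# alongside the noisy video latents during denoising.
adapter = ConditionAdapter(cond_channels=16, hidden_dim=1024)
tokens = adapter(torch.randn(1, 16, 8, 32, 32))   # -> (1, 8 * 32 * 32, 1024)
```
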
## Usage

You can load the model with the `diffusers` library and run the various generation and manipulation tasks.

First, clone the official repository (linked above) and install its requirements:

```bash
pip install -r requirements.txt
```

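Alternatively, if you only want to run the snippets on this page without cloning the repository, a minimal environment can be set up as follows; this package list is an assumption based on the imports used below, not the official requirements file:

```bash
pip install torch diffusers huggingface_hub imageio imageio-ffmpeg
```
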
Then, download the pipeline from the Hugging Face Hub and use it for inference:

```python
from huggingface_hub import snapshot_download
from diffusers import DiffusionPipeline
import torch
import os

# Define a local directory to download the model into
local_dir = "./MfM-Pipeline-8B"

# Download the pipeline from the Hugging Face Hub
# (use "LetsThink/MfM-Pipeline-2B" for the 2B version)
snapshot_download(repo_id="LetsThink/MfM-Pipeline-8B", local_dir=local_dir)

# Load the pipeline. MfMPipeline is a custom class, so trust_remote_code=True is required.
pipe = DiffusionPipeline.from_pretrained(local_dir, torch_dtype=torch.float16, trust_remote_code=True)
pipe.to("cuda")  # or your preferred device, e.g. "cpu"

# Example: text-to-video generation (task="t2v")
prompt = "A majestic eagle flying over snow-capped mountains."
output_dir = "outputs"
task = "t2v"  # the model supports multiple tasks such as "t2v", "i2v", "i2i", etc.

# Create the output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# Run inference.
# Parameters such as num_frames, num_inference_steps, guidance_scale and motion_score
# are crucial and may vary per task. Refer to the official GitHub repository for
# recommended values and detailed usage of the different tasks.
video_frames = pipe(
    prompt=prompt,
    task=task,
    crop_type="keep_res",
    num_inference_steps=30,
    guidance_scale=9,
    motion_score=5,
    num_samples=1,
    upscale=4,
    noise_aug_strength=0.0,
    # t2v_inputs expects a path to a file with prompts; here we pass the prompt directly.
    # For full functionality as in infer_mfm_pipeline.py, you may need to adapt this call.
).images[0]  # the pipeline returns a list of generated results; take the first one
```

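The generated frames can be saved as a GIF or MP4 with a library such as imageio or moviepy. Below is a runnable sketch using imageio (install with `pip install imageio imageio-ffmpeg`); it assumes `video_frames` is a sequence of PIL images or HxWx3 uint8 arrays, which may differ from the custom pipeline's actual return type:

```python
import os

import imageio
import numpy as np

# Continues from the snippet above: `video_frames` and `output_dir` are defined there.
frames = [np.asarray(frame) for frame in video_frames]
output_video_path = os.path.join(output_dir, "generated_video.mp4")
imageio.mimsave(output_video_path, frames, fps=8)
print(f"Generated video saved to {output_video_path}")
```
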
## Citation

If you find our code or model useful in your research, please cite:

```bibtex
@article{yang2025MfM,
  title={Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks},
  author={Tao Yang and Ruibin Li and Yangming Shi and Yuqi Zhang and Qide Dong and Haoran Cheng and Weiguo Feng and Shilei Wen and Bingyue Peng and Lei Zhang},
  journal={arXiv preprint arXiv:2506.01758},
  year={2025},
}
```