---
license: apache-2.0
---
# MMEdit

[![arXiv](https://img.shields.io/badge/arXiv-25xx.xxxxx-brightgreen.svg?style=flat-square)](https://arxiv.org/abs/25xx.xxxxx)
[![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/CocoBro/MMEdit)
[![License](https://img.shields.io/badge/License-Apache%202.0-yellow)](./LICENSE)

## Introduction
🟣 **MMEdit** is a state-of-the-art audio editing model built upon the powerful [Qwen2-Audio 7B](https://huggingface.co/Qwen/Qwen2-Audio-7B). It leverages the robust audio understanding and instruction-following capabilities of the large language model to achieve precise and high-fidelity audio editing.

---
## Model Download
| Model | 🤗 Hugging Face |
|-------|-----------------|
| MMEdit | [MMEdit](https://huggingface.co/CocoBro/MMEdit) |

Download the pretrained checkpoint into `./ckpt/mmedit/` (the `huggingface-cli` commands are shown in the installation section below).
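
If you prefer Python to the CLI, the same download can be done with `huggingface_hub` (a minimal sketch, equivalent to the `huggingface-cli` commands in the installation section below):

```python
from huggingface_hub import snapshot_download

# Fetch the MMEdit checkpoint into the directory the scripts expect.
snapshot_download(repo_id="CocoBro/MMEdit", local_dir="./ckpt/mmedit")

# The Qwen2-Audio base model is fetched the same way.
snapshot_download(
    repo_id="Qwen/Qwen2-Audio-7B-Instruct",
    local_dir="./ckpt/qwen2-audio-7B-instruct",
)
```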

---

## Model Usage
### 🔧 Dependencies and Installation
- Python >= 3.10
- [PyTorch >= 2.5.0](https://pytorch.org/)
- [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)
- Dependent models:
  - [Qwen/Qwen2-Audio-7B-Instruct](https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct), downloaded into `./ckpt/qwen2-audio-7B-instruct/`

```bash
# 1. Clone the repository
git clone https://github.com/xycs6k8r2Anonymous/MMEdit.git
cd MMEdit

# 2. Create environment
conda create -n mmedit python=3.10 -y
conda activate mmedit

# 3. Install PyTorch and dependencies
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# 4. Download Qwen2-Audio-7B-Instruct
huggingface-cli download Qwen/Qwen2-Audio-7B-Instruct --local-dir ./ckpt/qwen2-audio-7B-instruct

# 5. Download MMEdit (our model)
huggingface-cli download CocoBro/MMEdit --local-dir ./ckpt/mmedit
```
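
After installation, a quick check (not part of the project's own scripts) can confirm that the expected PyTorch build and the GPU are visible:

```python
import torch

# The cu121 wheels above should report a 2.5.x build with CUDA available.
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```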

## 📂 Data Preparation

For detailed instructions on the data pipeline and the dataset structure used for training, please refer to our separate documentation:

👉 **[Data Pipeline & Preparation Guide](./datapipeline/datapipeline.md)**

## ⚡ Quick Start

### 1. Inference
You can quickly generate example audio with the following command:

```bash
bash bash_scripts/infer_single.sh
```

The output will be saved at `inference/example`.
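
To inspect the result programmatically, the generated file can be loaded with `torchaudio` (already installed above; the exact filename under `inference/example` depends on the script and is hypothetical here):

```python
import torchaudio

# Hypothetical filename; check inference/example for the actual output.
waveform, sample_rate = torchaudio.load("inference/example/output.wav")
print(waveform.shape, sample_rate)
```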

---

## 🚀 Usage

### 1. Configuration
Before running inference or training, please check `configs/config.yaml`. The project uses [Hydra](https://hydra.cc/) for configuration management, which allows easy overrides from the command line.
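
Hydra overrides use a dotted `key=value` syntax appended to the launch command. The same merge can be reproduced programmatically with Hydra's compose API, as in this minimal sketch (assumes Hydra >= 1.2; `train.batch_size` is a hypothetical key, use the names actually defined in `configs/config.yaml`):

```python
from hydra import compose, initialize
from omegaconf import OmegaConf

# config_path is resolved relative to this file; "config" refers to configs/config.yaml.
with initialize(version_base=None, config_path="configs"):
    cfg = compose(config_name="config", overrides=["train.batch_size=4"])

# Print the merged configuration to verify the override took effect.
print(OmegaConf.to_yaml(cfg))
```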
80
+
81
+ ### 2. Inference
82
+ To run batch inference using the provided scripts:
83
+
84
+ ```bash
85
+ cd src
86
+ bash bash_scripts/inference.sh
87
+ ```
88
+
89
+ ### 3. Training
90
+ Ensure you have downloaded the **Qwen2-Audio-7B-Instruct** checkpoint to `./ckpt/qwen2-audio-7B-instruct` and prepared your data according to the [Data Pipeline Guide](./docs/DATA_PIPELINE.md).
91
+
92
+ ```bash
93
+ cd src
94
+ # Launch distributed training
95
+ bash bash_scripts/train_dist.sh
96
+ ```

---

## 📝 Todo
- [ ] Release inference code and checkpoints.
- [ ] Release training scripts.
- [ ] Add Hugging Face Gradio demo.
- [ ] Release evaluation metrics and post-processing tools.

## 🤝 Acknowledgement
We thank the following open-source projects for their inspiration and code:
* [Qwen2-Audio](https://github.com/QwenLM/Qwen2-Audio)
* [UniFlow-Audio](https://github.com/wsntxxn/UniFlow-Audio)
* [AudioTime](https://github.com/wsntxxn/UniFlow-Audio)

## 🖊️ Citation
If you find this project useful, please cite our paper:

```bibtex
@article{mmedit2024,
  title={MMEdit: Audio Generation based on Qwen2-Audio 7B},
  author={Your Name and Collaborators},
  journal={arXiv preprint arXiv:25xx.xxxxx},
  year={2024}
}
```