---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- music
- art
---

<div align="center">
<img src="Yi_logo.svg" width="150px" style="display: inline-block;">
<img src="m-a-p.png" width="150px" style="display: inline-block;">
</div>

## MuPT: Symbolic Music Generative Pre-trained Transformer

MuPT is a series of pre-trained models for symbolic music generation. The models were trained on a large-scale dataset of symbolic music comprising millions of monophonic and polyphonic pieces across different genres and styles. They use the LLaMA 2 architecture and can be further used for downstream music generation tasks such as melody generation, accompaniment generation, and multi-track music generation.

- 29/01/2024: Intermediate checkpoints of the MuPT-v0-8192-1.3B model were released.
- 09/01/2024: A series of pre-trained MuPT models, ranging from 110M to 1.3B parameters, was released.

## Intermediate Checkpoints

We have uploaded all intermediate checkpoints of the MuPT-v0-8192-1.3B model for further research, continued training, and downstream tasks. Checkpoints are available every 1000 steps, up to step 23000.

Training parameters:

| Name | Parameters | Batch Size | Tokens/Step | Max Learning Rate | Seq Length | Hidden Size | Layers | Heads |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| MuPT-v0-8192-1.3B | 1.3B | 1024 | 8.4M | 3e-5 | 8192 | 1536 | 48 | 24 |
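
The Tokens/Step column is the batch size times the sequence length, and multiplying it by a step count gives a rough token budget for each intermediate checkpoint. The sketch below assumes every sequence is packed to the full 8192 tokens, which this card does not state, so treat the totals as upper-bound estimates.

```shell
#!/usr/bin/env bash
# Tokens per step = global batch size x sequence length (values from the table).
BATCH_SIZE=1024
SEQ_LEN=8192

tokens_per_step() {
  echo $(( BATCH_SIZE * SEQ_LEN ))
}

# Approximate tokens consumed up to a given training step, assuming every
# sequence in every batch is packed to the full sequence length.
tokens_seen() {
  local step="$1"
  echo $(( step * $(tokens_per_step) ))
}

echo "tokens/step: $(tokens_per_step)"            # 8388608 (~8.4M)
echo "tokens at step 23000: $(tokens_seen 23000)"  # 192937984000 (~193B)
```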

## Model Architecture

The architecture details of the MuPT-v0 models are listed below:

| Name | Parameters | Training Data (Music Pieces) | Seq Length | Hidden Size | Layers | Heads |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| MuPT-v0-8192-110M | 110M | 7M x 10 epochs | 8192 | 768 | 12 | 12 |
| MuPT-v0-8192-345M | 345M | 7M x 7.0 epochs | 8192 | 1024 | 24 | 16 |
| MuPT-v0-8192-770M | 770M | 7M x 5.3 epochs | 8192 | 1280 | 36 | 20 |
| MuPT-v0-8192-1.3B | 1.3B | 7M x 5.8 epochs | 8192 | 1536 | 48 | 24 |
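
All four configurations keep the same per-head dimension, which follows from dividing the hidden size by the number of heads. The sketch below prints it for each size, together with the FFN width under the assumption that FFN_HIDDEN_SIZE is 4 x the hidden size, as set in the example serving script in this README; the function name is illustrative.

```shell
#!/usr/bin/env bash
# Per-head dimension (hidden / heads) and assumed FFN width (4 x hidden)
# for each MuPT-v0 size listed in the architecture table.
arch_summary() {
  local name hidden heads
  while read -r name hidden heads; do
    printf '%s head_dim=%d ffn_hidden=%d\n' \
      "$name" $(( hidden / heads )) $(( 4 * hidden ))
  done <<'EOF'
110M 768 12
345M 1024 16
770M 1280 20
1.3B 1536 24
EOF
}

arch_summary
```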

## Weight Conversion

The released checkpoints are in [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) format, so they can be used directly in Megatron-LM for continued training or fine-tuning.
We also provide a script to convert the checkpoints to Hugging Face format:

```shell
export PYTHONPATH=/path/to/megatron-lm
HF_SAVE_ROOT=/path/to/save/huggingface/checkpoint

# training step of the checkpoint to convert (1000, 2000, ..., 23000)
ITER=23000

# Megatron-LM names checkpoint directories iter_XXXXXXX (zero-padded to 7 digits)
MEGATRON_PATH=/path/to/intermediate/checkpoint/$(printf 'iter_%07d' "${ITER}")
HF_SAVE_PATH=${HF_SAVE_ROOT}/MuPT-v0-1.3B-8192-iter${ITER}

python convert_llama_megatron_hf.py \
    --input-dir "${MEGATRON_PATH}" \
    --output-dir "${HF_SAVE_PATH}" \
    --vocab-size 50000
```
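
To convert every intermediate checkpoint rather than a single one, the same command can be looped over the step numbers. This is a sketch under stated assumptions: Megatron-LM's default `iter_XXXXXXX` (7-digit) directory names, `convert_llama_megatron_hf.py` in the current directory, and a hypothetical output naming scheme. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Convert the intermediate checkpoints at steps 1000, 2000, ..., 23000.
# Assumes Megatron-LM's iter_XXXXXXX directory naming; output names are hypothetical.
MEGATRON_ROOT=${MEGATRON_ROOT:-/path/to/intermediate/checkpoint}
HF_SAVE_ROOT=${HF_SAVE_ROOT:-/path/to/save/huggingface/checkpoint}

convert_all() {
  local iter dir
  local -a cmd
  for iter in $(seq 1000 1000 23000); do
    dir=$(printf 'iter_%07d' "$iter")
    cmd=(python convert_llama_megatron_hf.py
         --input-dir "${MEGATRON_ROOT}/${dir}"
         --output-dir "${HF_SAVE_ROOT}/MuPT-v0-1.3B-8192-iter${iter}"
         --vocab-size 50000)
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"          # preview only
    else
      "${cmd[@]}" || return 1   # stop at the first failed conversion
    fi
  done
}

# Example: preview with `DRY_RUN=1 convert_all`, then run `convert_all`.
```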

## Model Usage

Our pre-trained MuPT models can be used in several ways; here we describe usage based on [Megatron-LM](https://github.com/NVIDIA/Megatron-LM/tree/main).

Before starting, make sure you have set up the relevant environment and codebase.

```shell
# pull Megatron-LM codebase
mkdir -p /path/to/workspace && cd /path/to/workspace
git clone https://github.com/NVIDIA/Megatron-LM.git

# download the pre-trained MuPT model checkpoint and vocabulary files from Hugging Face
mkdir -p /models/MuPT_v0_8192_1.3B && cd /models/MuPT_v0_8192_1.3B
wget -O model_optim_rng.pt "https://huggingface.co/m-a-p/MuPT_v0_8192_1.3B/resolve/main/model_optim_rng.pt?download=true"
wget -O newline.vocab "https://huggingface.co/m-a-p/MuPT_v0_8192_1.3B/resolve/main/newline.vocab?download=true"
wget -O newline.txt "https://huggingface.co/m-a-p/MuPT_v0_8192_1.3B/resolve/main/newline.txt?download=true"
```
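
Megatron-LM's `--load` normally looks for a `latest_checkpointed_iteration.txt` tracker file and a per-rank subdirectory such as `release/mp_rank_00/`, whereas the download above places `model_optim_rng.pt` directly in the model directory. If the server cannot find the checkpoint, a minimal sketch like the one below may help. It assumes the released file is a single-rank (tensor and pipeline parallel size 1) checkpoint, consistent with the serving script's parallel settings; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a downloaded model_optim_rng.pt into the layout
# Megatron-LM's --load expects for a single-rank (TP=1, PP=1) checkpoint:
#   <dir>/latest_checkpointed_iteration.txt   containing "release"
#   <dir>/release/mp_rank_00/model_optim_rng.pt
to_megatron_layout() {
  local dir="$1"
  if [ ! -f "${dir}/model_optim_rng.pt" ]; then
    echo "no model_optim_rng.pt found in ${dir}" >&2
    return 1
  fi
  mkdir -p "${dir}/release/mp_rank_00"
  mv "${dir}/model_optim_rng.pt" "${dir}/release/mp_rank_00/"
  echo release > "${dir}/latest_checkpointed_iteration.txt"
}

# Example (after the downloads above):
#   to_megatron_layout /models/MuPT_v0_8192_1.3B
# then set CHECKPOINT=/models/MuPT_v0_8192_1.3B in the serving script.
```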

We recommend using the latest version of [NGC's PyTorch container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) for MuPT inference. See [Megatron-LM](https://github.com/NVIDIA/Megatron-LM/tree/main) for more details.

```shell
# pull NGC's PyTorch container (23.11 here), mount the workspace and model directories, and enter the container
docker run --gpus all -it --name megatron --shm-size=16g -v /path/to/workspace:/workspace -v /models:/models -p 5000:5000 nvcr.io/nvidia/pytorch:23.11-py3 /bin/bash
```
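
Once inside the container, you can start a REST server for inference. Run the script below from the root of the Megatron-LM repository (for example, `/workspace/Megatron-LM`), since it calls `tools/run_text_generation_server.py` by relative path.
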
<details>
<summary>Click to expand the example script</summary>

```shell
#!/bin/bash
# This example will start serving the 1.3B model.
export CUDA_DEVICE_MAX_CONNECTIONS=1

DISTRIBUTED_ARGS="--nproc_per_node 1 \
                  --nnodes 1 \
                  --node_rank 0 \
                  --master_addr localhost \
                  --master_port 6000"

# Megatron-LM checkpoint directory passed to --load
CHECKPOINT=/path/to/model/checkpoint/folder
# GPT-2 style BPE vocabulary and merge files (presumably newline.vocab and newline.txt from the download step)
VOCAB_FILE=/path/to/vocab/file
MERGE_FILE=/path/to/merge/file

MODEL_SIZE="1.3B"
if [[ ${MODEL_SIZE} == "110M" ]]; then HIDDEN_SIZE=768; NUM_HEAD=12; NUM_QUERY_GROUP=12; NUM_LAYERS=12; FFN_HIDDEN_SIZE=3072; NORM_EPS=1e-5;
elif [[ ${MODEL_SIZE} == "345M" ]]; then HIDDEN_SIZE=1024; NUM_HEAD=16; NUM_QUERY_GROUP=16; NUM_LAYERS=24; FFN_HIDDEN_SIZE=4096; NORM_EPS=1e-5;
elif [[ ${MODEL_SIZE} == "770M" ]]; then HIDDEN_SIZE=1280; NUM_HEAD=20; NUM_QUERY_GROUP=20; NUM_LAYERS=36; FFN_HIDDEN_SIZE=5120; NORM_EPS=1e-5;
elif [[ ${MODEL_SIZE} == "1.3B" ]]; then HIDDEN_SIZE=1536; NUM_HEAD=24; NUM_QUERY_GROUP=24; NUM_LAYERS=48; FFN_HIDDEN_SIZE=6144; NORM_EPS=1e-5;
else echo "invalid MODEL_SIZE: ${MODEL_SIZE}"; exit 1
fi
MAX_SEQ_LEN=8192
MAX_POSITION_EMBEDDINGS=8192

pip install flask-restful

torchrun $DISTRIBUTED_ARGS tools/run_text_generation_server.py \
    --tensor-model-parallel-size 1 \
    --pipeline-model-parallel-size 1 \
    --num-layers ${NUM_LAYERS} \
    --hidden-size ${HIDDEN_SIZE} \
    --ffn-hidden-size ${FFN_HIDDEN_SIZE} \
    --load ${CHECKPOINT} \
    --group-query-attention \
    --num-query-groups ${NUM_QUERY_GROUP} \
    --position-embedding-type rope \
    --num-attention-heads ${NUM_HEAD} \
    --max-position-embeddings ${MAX_POSITION_EMBEDDINGS} \
    --tokenizer-type GPT2BPETokenizer \
    --normalization RMSNorm \
    --norm-epsilon ${NORM_EPS} \
    --make-vocab-size-divisible-by 1 \
    --swiglu \
    --use-flash-attn \
    --bf16 \
    --micro-batch-size 1 \
    --disable-bias-linear \
    --no-bias-gelu-fusion \
    --untie-embeddings-and-output-weights \
    --seq-length ${MAX_SEQ_LEN} \
    --vocab-file $VOCAB_FILE \
    --merge-file $MERGE_FILE \
    --attention-dropout 0.0 \
    --hidden-dropout 0.0 \
    --weight-decay 1e-1 \
    --clip-grad 1.0 \
    --adam-beta1 0.9 \
    --adam-beta2 0.95 \
    --adam-eps 1e-8 \
    --seed 42
```

</details>

Use `curl` to query the server directly. Note that the newline character `\n` is represented by the `<n>` token in the vocabulary, so newlines must be replaced with `<n>` in the prompt, and `<n>` must be converted back to newlines in the generated text.

```shell
curl 'http://localhost:5000/api' -X 'PUT' -H 'Content-Type: application/json; charset=UTF-8' -d '{"prompts":["X:1<n>L:1/8<n>Q:1/8=200<n>M:4/4<n>K:Gmin<n>|:\"Gm\" BGdB"], "tokens_to_generate":4096}'
```
Processed output (after replacing `<n>` with newlines):
```text
X:1
L:1/8
Q:1/8=200
M:4/4
K:Gmin
|:"Gm" BGdB fdBG |"F" AFcF dFcF |"Gm" BGdG gFBF |"F" AFAG AF F2 |"Gm" BGBd fffd |"F" cdcB cdeg |
"Gm" fdcB"Eb" AFcA |1 BGFG"F" AFGc :|2 BGFG"F" AF F2 ||
```
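
The `<n>` substitution in both directions can be scripted. The helpers below are a hypothetical sketch (the function names are illustrative, not part of MuPT or Megatron-LM); they also escape double quotes so ABC chord symbols such as `"Gm"` survive inside the JSON request body. The commented example at the end additionally assumes `jq` is installed and that the response carries the generated text in a `text` list, as Megatron-LM's text generation server returns it.

```shell
#!/usr/bin/env bash
# Join ABC lines with the <n> token instead of real newlines.
abc_to_prompt() {
  awk 'NR > 1 { printf "<n>" } { printf "%s", $0 }'
}

# Turn <n> tokens in generated text back into real newlines.
prompt_to_abc() {
  awk '{ gsub(/<n>/, "\n"); print }'
}

# Escape backslashes and double quotes for embedding in a JSON string.
json_escape() {
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Read ABC text on stdin and print a request body for the server's /api endpoint.
build_request() {
  local tokens="$1" prompt
  prompt=$(abc_to_prompt | json_escape)
  printf '{"prompts":["%s"], "tokens_to_generate":%s}\n' "$prompt" "$tokens"
}

# Example (needs a running server on Megatron-LM's default port 5000, plus jq):
#   build_request 4096 < prompt.abc \
#     | curl -s 'http://localhost:5000/api' -X PUT \
#         -H 'Content-Type: application/json; charset=UTF-8' --data-binary @- \
#     | jq -r '.text[0]' | prompt_to_abc > tune.abc
```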

Rendered to audio, the generated tune sounds like this:

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/640701cb4dc5f2846c91d4eb/gnBULaFjcUyXYzzIwXLZq.mpga"></audio>
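
This README does not say how the clip above was produced from the ABC output. One common route, shown here only as an assumption, is `abc2midi` (from the abcMIDI package) to get MIDI, followed by FluidSynth with a General MIDI SoundFont to get a WAV file; `tune.abc` and the SoundFont path are placeholders. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# One possible ABC -> MIDI -> WAV pipeline (not necessarily the one used for the clip).

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Render <name>.abc to <name>.mid and <name>.wav next to it.
render_abc() {
  local abc="$1" soundfont="$2"
  local base="${abc%.abc}"
  run abc2midi "$abc" -o "${base}.mid" || return 1
  run fluidsynth -ni "$soundfont" "${base}.mid" -F "${base}.wav" -r 44100
}

# Example: render_abc tune.abc /path/to/soundfont.sf2
```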