a43992899 committed on
Commit 8087657 • 1 Parent(s): dc08d7b

Update README.md

Files changed (1):
  1. README.md +172 -13
README.md CHANGED
@@ -6,12 +6,13 @@ metrics:
  - accuracy
  pipeline_tag: text-generation
  ---
- # 🎼 ChatMusician: Fostering Intrinsic Musical Abilities Into LLM

- [**🌐 DemoPage**](https://ezmonyi.github.io/ChatMusician/) | [**🤗 Dataset**](https://huggingface.co/datasets/m-a-p/MusicPile) | [**🤗 Benchmark**](https://huggingface.co/datasets/m-a-p/MusicTheoryBench) | [**📖 arXiv**](http://arxiv.org/abs/2402.16153) | [**Code**](https://github.com/hf-lin/ChatMusician) | [**Model**](https://huggingface.co/m-a-p/ChatMusician)

  ## 🔔News
- - **🔥[2023-12-10]: The release of ChatMusician's demo, code, model, data, and benchmark. 😆**
  - [2023-11-30]: Check out another awesome project [MMMU](https://huggingface.co/datasets/MMMU/MMMU/) that includes multimodal music reasoning.

  ## Introduction
@@ -24,30 +25,188 @@ It is based on continual pre-training and finetuning LLaMA2 on a text-compatible
  margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. Code, data, model, and benchmark are open-sourced.

  <!-- <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/5fd6f670053c8345eddc1b68/8NSONUjIF7KGUCfwzPCd9.mpga"></audio> -->

  [![Demo Video](chatmusician_demo.png)](https://youtu.be/zt3l49K55Io)
 
- **ChatMusician-Base is a pretrained model. [ChatMusician](https://huggingface.co/m-a-p/ChatMusician) is recommended for producing symbolic music.**

- ## Training Data

- ChatMusician-Base is pretrained on the 🤗 [MusicPile](https://huggingface.co/datasets/m-a-p/MusicPile), which is the first pretraining corpus for **developing musical abilities** in large language models. Check out the dataset card for more details.

- ## Training Procedure

- We initialized a fp16-precision ChatMusician-Base from the LLaMA2-7B-Base weights, and applied a continual pre-training plus fine-tuning pipeline. LoRA adapters were integrated into the attention and MLP layers, with additional training on embeddings and all linear layers. The maximum sequence length
- was 2048. We utilized 16 80GB-A800 GPUs for one epoch pre-training. DeepSpeed was employed for memory efficiency, and the AdamW optimizer was used with a 1e-4 learning rate and a 5% warmup cosine scheduler. Gradient clipping was set at 1.0. The LoRA parameters dimension, alpha, and dropout were set to 64, 16, and 0.1, with a batch size of 8.

  ## Evaluation

  1. Music understanding abilities are evaluated on the [MusicTheoryBench](https://huggingface.co/datasets/m-a-p/MusicTheoryBench). The following figure shows zero-shot accuracy on MusicTheoryBench.
  We included GPT-3.5, GPT-4, LLaMA2-7B-Base, ChatMusician-Base, and ChatMusician. The blue bar represents the performance on the music knowledge metric, and the red bar represents the music reasoning metric. The dashed line corresponds to a random baseline, with a score of 25%.![MusicTheoryBench_result](./MusicTheoryBench_result_plt.png)
- 2. General language abilities of ChatMusician are evaluated on the [Massive Multitask Language Understanding (MMLU) dataset](https://huggingface.co/datasets/lukaemon/mmlu).

- ## Limitations

- The current iteration of ChatMusician predominantly generates music in the style of Irish music, attributable to a significant portion of the dataset being sourced from this genre.
- The model exhibits hallucinations and faces limitations in supporting open-ended music generation tasks due to the lack of diversity in handcrafted music instructions.

  ## Citation
  If you find our work helpful, feel free to cite us.
 
  - accuracy
  pipeline_tag: text-generation
  ---
+ # 🎼 ChatMusician: Understanding and Generating Music Intrinsically with LLM

+ [**🌐 DemoPage**](https://ezmonyi.github.io/ChatMusician/) | [**🤗 Dataset**](https://huggingface.co/datasets/m-a-p/MusicPile) | [**🤗 Benchmark**](https://huggingface.co/datasets/m-a-p/MusicTheoryBench) | [**📖 arXiv**](http://arxiv.org/abs/2402.16153) | [💻 **Code**](https://github.com/hf-lin/ChatMusician) | [**🤖 Chat Model**](https://huggingface.co/m-a-p/ChatMusician)

  ## 🔔News
+ - **🔥[2024-2-28]: The release of ChatMusician's demo, code, model, data, and benchmark. 😆**
+ - [2024-2-28]: ChatMusician uses `symusic`, a fast symbolic music processing and rendering library developed by Yikai-Liao, lzqlzzq, and Natooz. Find the project on GitHub: https://github.com/Yikai-Liao/symusic
  - [2023-11-30]: Check out another awesome project [MMMU](https://huggingface.co/datasets/MMMU/MMMU/) that includes multimodal music reasoning.

  ## Introduction
 
  margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. Code, data, model, and benchmark are open-sourced.

  <!-- <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/5fd6f670053c8345eddc1b68/8NSONUjIF7KGUCfwzPCd9.mpga"></audio> -->
+
  [![Demo Video](chatmusician_demo.png)](https://youtu.be/zt3l49K55Io)
+ <!-- [![ChatMusician Introduction](http://img.youtube.com/vi/zt3l49K55Io/0.jpg)](http://www.youtube.com/watch?v=zt3l49K55Io "ChatMusician Introduction") -->
+ <!-- <iframe width="787" height="528" src="https://www.youtube.com/embed/zt3l49K55Io" title="ChatMusician: Fostering Intrinsic Musical Abilities Into LLM" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe> -->
+
+ ## Usage
+
+ You can use the models through Hugging Face's Transformers library. Check our GitHub repo for more advanced usage: [https://github.com/hf-lin/ChatMusician](https://github.com/hf-lin/ChatMusician)
+
+ ## CLI demo
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
+ import torch
+ import re
+ from string import Template
+
+ prompt_template = Template("Human: ${inst} </s> Assistant: ")
+
+ tokenizer = AutoTokenizer.from_pretrained("m-a-p/ChatMusician", trust_remote_code=True)
+ # you may replace "m-a-p/ChatMusician-Base" with "m-a-p/ChatMusician" below, since the base model may not follow instructions
+ model = AutoModelForCausalLM.from_pretrained("m-a-p/ChatMusician-Base", torch_dtype=torch.float16, device_map="cuda", resume_download=True).eval()
+
+ generation_config = GenerationConfig(
+     temperature=0.2,
+     top_k=40,
+     top_p=0.9,
+     do_sample=True,
+     num_beams=1,
+     repetition_penalty=1.1,
+     min_new_tokens=10,
+     max_new_tokens=1536,
+ )
+
+ instruction = """Develop a musical piece using the given chord progression.
+ 'Dm', 'C', 'Dm', 'Dm', 'C', 'Dm', 'C', 'Dm'
+ """
+
+ prompt = prompt_template.safe_substitute({"inst": instruction})
+ inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
+ response = model.generate(
+     input_ids=inputs["input_ids"].to(model.device),
+     attention_mask=inputs["attention_mask"].to(model.device),
+     eos_token_id=tokenizer.eos_token_id,
+     generation_config=generation_config,
+ )
+ # decode only the newly generated tokens, skipping the prompt
+ response = tokenizer.decode(response[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+ print(response)
+
+ # to render ABC notation, you need to install symusic
+ # pip install symusic
+ from symusic import Score, Synthesizer, BuiltInSF3, dump_wav
+
+ # extract the first ABC notation block (starting at "X:<n>") from the response
+ abc_pattern = r'(X:\d+\n(?:[^\n]*\n)+)'
+ abc_notation = re.findall(abc_pattern, response + '\n')[0]
+ s = Score.from_abc(abc_notation)
+ audio = Synthesizer().render(s, stereo=True)
+ dump_wav('cm_music_piece.wav', audio, sample_rate=44100, use_int16=True)
+ ```
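+
+ The `BuiltInSF3` import above is only needed if you want to pick one of symusic's bundled SoundFonts explicitly; the bare `Synthesizer()` call uses a default. A minimal sketch of that variant, plus dumping the symbolic score to a MIDI file instead of rendering audio (assuming symusic's documented `Synthesizer(sf_path=..., sample_rate=...)` constructor and `Score.dump_midi` method):
+
+ ```python
+ from symusic import Score, Synthesizer, BuiltInSF3, dump_wav
+
+ s = Score.from_abc(abc_notation)  # abc_notation extracted as above
+
+ # render with an explicitly chosen bundled SoundFont
+ synth = Synthesizer(sf_path=BuiltInSF3.MuseScoreGeneral().path(), sample_rate=44100)
+ audio = synth.render(s, stereo=True)
+ dump_wav('cm_music_piece_musescore.wav', audio, sample_rate=44100, use_int16=True)
+
+ # or skip audio synthesis and keep the score as a MIDI file
+ s.dump_midi('cm_music_piece.mid')
+ ```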

+ ## Chat demo
+ ChatMusician supports a Gradio web demo and multi-turn dialogue; please visit our [GitHub](https://github.com/hf-lin/ChatMusician) for more details.
+ Our web demo also supports rendering ABC scores into images.
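+
+ If you just want a quick local web UI for single-turn prompts, a minimal Gradio sketch along these lines should work (it reuses `model`, `tokenizer`, `generation_config`, and `prompt_template` from the CLI demo above; this is an illustrative sketch, not the official demo from the repo):
+
+ ```python
+ import gradio as gr
+
+ def chat(instruction: str) -> str:
+     # wrap the raw instruction in the Human/Assistant template and generate
+     prompt = prompt_template.safe_substitute({"inst": instruction})
+     inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
+     out = model.generate(
+         input_ids=inputs["input_ids"].to(model.device),
+         attention_mask=inputs["attention_mask"].to(model.device),
+         eos_token_id=tokenizer.eos_token_id,
+         generation_config=generation_config,
+     )
+     return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+
+ gr.Interface(fn=chat, inputs=gr.Textbox(lines=4), outputs=gr.Textbox(lines=16)).launch()
+ ```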

+ ## Limitations
+ - The model currently only supports strictly formatted, closed-ended instructions for the music tasks. Given more funding, we plan to create more diverse multi-turn music instruction chat data for better generalization.
+ - The model suffers from hallucinations and shouldn't be used for music education. It could be improved by training on more music textbooks, blogs, etc.; RLHF may help, too.
+ - A large portion of the training data is in the style of Irish music. If possible, the community should develop a converter between performance MIDI and ABC scores, so that we can include more established MIDI datasets.
+ - The MusicTheoryBench results reported in the paper are obtained in perplexity mode (see the sketch after this list). Direct generation may result in worse performance.
+ - We observe that with the current version of the training data, ChatMusician shows weak in-context learning and chain-of-thought abilities. The community should work on improving the quality of the music data.
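+
+ "Perplexity mode" here means scoring each multiple-choice answer by the model's log-likelihood and picking the most probable one, rather than asking the model to generate the answer. A minimal sketch of that scoring scheme (the `question`/`choices` variables are hypothetical, and this is not the paper's exact evaluation harness):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def choice_logprob(question: str, choice: str) -> float:
+     # log-likelihood of the choice tokens conditioned on the question;
+     # note retokenization can shift token boundaries slightly
+     q_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
+     full = tokenizer(question + " " + choice, return_tensors="pt").input_ids.to(model.device)
+     with torch.no_grad():
+         logits = model(full).logits
+     logprobs = F.log_softmax(logits[0, :-1], dim=-1)  # position i predicts token i+1
+     targets = full[0, 1:]
+     start = q_len - 1  # first predicted position that belongs to the choice
+     return logprobs[start:].gather(1, targets[start:, None]).sum().item()
+
+ question = "Which interval spans six semitones?"  # hypothetical item
+ choices = ["Perfect fourth", "Tritone", "Major third", "Perfect fifth"]
+ best = max(choices, key=lambda c: choice_logprob(question, c))
+ ```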
 
+ ## Example Stable Prompts
+ We provide some prompts that have been tested to be stable. For more prompts, please check 🤗 [MusicPile](https://huggingface.co/datasets/m-a-p/MusicPile).
+
+ ### Function: Chord Conditioned Music Generation
+ ```
+ Develop a musical piece using the given chord progression.
+ 'Dm', 'C', 'Dm', 'Dm', 'C', 'Dm', 'C', 'Dm'
+ ```
+
+ ### Function: Text2music
+ ```
+ Develop a tune influenced by Bach's compositions.
+ ```
+ ```
+ Using ABC notation, recreate the given text as a musical score.
+ Meter C
+ Notes The parts are commonly interchanged.
+ Transcription 1997 by John Chambers
+ Key D
+ Note Length 1/8
+ Rhythm reel
+ ```
+
+ ### Function: Melody Harmonization
+
+ ```
+ Construct smooth-flowing chord progressions for the supplied music.
+
+ |: BA | G2 g2"^(C)" edeg | B2 BA"^(D7)" BcBA | G2 g2 edeg | dBAG A2 BA |
+ G2 g2"^(C)" edeg | B2 BA B2 d2 | e2 ef e2 (3def | gedB A2 :: BA | G2 BG dGBe |
+ dBBA"^(D7)" B3 A | G2 BG dGBe | dBAG A4 | G2 BG dGBe | dBBA B3 d |
+ e2 ef e2 (3def | gedB A2 :|
+ ```
+ ```
+ Develop a series of chord pairings that amplify the harmonious elements in the given music piece.
+
+ E |: EAA ABc | Bee e2 d | cBA ABc | BEE E2 D | EAA ABc | Bee e2 d |
+ cBA ^GAB |1 A2 A A2 E :|2 A2 A GAB || c3 cdc | Bgg g2 ^g | aed cBA |
+ ^GAB E^F^G | A^GA BAB | cde fed | cBA ^GAB |1 A2 A GAB :|2 A3 A2 ||
+ ```
+
+ ### Function: Musical Form Conditioned Music Generation
+
+ ```
+ Develop a composition by incorporating elements from the given melodic structure.
+
+ Ternary, Sectional: Verse/Chorus/Bridge
+ ```
+
+ ### Function: Motif and Form Conditioned Music Generation
+
+ ```
+ Create music by incorporating the assigned motif into the predetermined musical arrangement.
+
+ Musical Form Input: Only One Section
+
+ ABC Notation Music Input:
+ X:1
+ L:1/8
+ M:9/8
+ K:Emin
+ vB2 E E2 F G2 A
+ ```
+
+ ### Function: Music Understanding
+
+ ```
+ Investigate the aspects of this musical work and convey its structural organization using suitable musical words.
+
+ X:1
+ L:1/8
+ M:2/2
+ K:G
+ G2 dG BGdG | G2 dc BAGB | A2 eA cAeA | A2 ed cAFA |
+ G2 dG BGdG | G2 dc BAGB | ABcd efge |1 aged cAFA :|2
+ aged ^cdef |: g3 f g2 ef | gedc BA G2 | eaag agea |
+ aged ^cdef | g3 f g2 ef | gedc BAGB | ABcd efge |1
+ aged ^cdef :|2 aged cAFA |:"^variations:" G2 BG dGBA |
+ G2 dG BAGB | A2 cA eAcA | A2 ed cAFA | G2 BG dGBA |
+ G2 dc BAGB | ABcd efge |1 aged cAFA :|2 aged ^cdef |:
+ g2 af g2 ef | gedc BAGB | Aaag ageg | aged ^cdef |
+ gbaf g2 ef | gedc BAGB | ABcd efge |1
+ aged ^cdef :|2 aged cAFA ||
+ ```
+
+ ```
+ Analyze the musical work and pinpoint the consistent melodic element in every section.
+
+ X:1
+ L:1/8
+ M:4/4
+ K:G
+ ge | d2 G2 cBAG | d2 G2 cBAG | e2 A2 ABcd | edcB A2 Bc |
+ d2 cB g2 fe | edcB cBAG | BAGE DEGA | B2 G2 G2 :: ga |
+ b2 gb a2 fa | g2 eg edcB | e2 A2 ABcd | edcB A2 ga |
+ b2 gb a2 fa | g2 eg edcB | cBAG DEGA | B2 G2 G2 :|
+ ```
+
+ ## Training Data
+
+ ChatMusician is pretrained on 🤗 [MusicPile](https://huggingface.co/datasets/m-a-p/MusicPile), the first pretraining corpus for **developing musical abilities** in large language models; check out the dataset card for more details.
+ It is then supervised fine-tuned on 1.1M samples (a 2:1 ratio between music scores and music knowledge & music summary data) from MusicPile. Check our [paper](http://arxiv.org/abs/2402.16153) for more details.
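+
+ The corpus is a standard Hugging Face dataset, so it can be inspected directly with the `datasets` library (a quick sketch; the split name and record fields are whatever the dataset card defines, not guaranteed here):
+
+ ```python
+ from datasets import load_dataset
+
+ # stream to avoid downloading the full corpus up front
+ ds = load_dataset("m-a-p/MusicPile", split="train", streaming=True)
+ print(next(iter(ds)))  # inspect one record's fields
+ ```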

  ## Evaluation

  1. Music understanding abilities are evaluated on the [MusicTheoryBench](https://huggingface.co/datasets/m-a-p/MusicTheoryBench). The following figure shows zero-shot accuracy on MusicTheoryBench.
  We included GPT-3.5, GPT-4, LLaMA2-7B-Base, ChatMusician-Base, and ChatMusician. The blue bar represents the performance on the music knowledge metric, and the red bar represents the music reasoning metric. The dashed line corresponds to a random baseline, with a score of 25%.![MusicTheoryBench_result](./MusicTheoryBench_result_plt.png)
+ 2. General language abilities of ChatMusician are evaluated on the [Massive Multitask Language Understanding (MMLU) dataset](https://huggingface.co/datasets/lukaemon/mmlu).

  ## Citation
  If you find our work helpful, feel free to cite us.