---
license: mit
datasets:
- sander-wood/wikimusictext
language:
- en
pipeline_tag: feature-extraction
tags:
- music
---

# CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval

## Model description

CLaMP is a pre-training approach for cross-modal symbolic music information retrieval that uses contrastive learning to learn joint representations of natural language and symbolic music. The model consists of a music encoder and a text encoder that are jointly trained with a contrastive loss on a collected dataset of 1.4 million music-text pairs. CLaMP employs text dropout as a data augmentation technique and bar patching to represent music data efficiently, reducing sequence length to less than 10% of its original length.

<br>
<center><img src="clamp.png" alt="clamp" style="zoom:80%"></center>
<br>

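As a rough illustration of the contrastive objective described above, below is a minimal sketch of a CLIP-style symmetric contrastive loss between music and text embeddings. The encoder internals, batching, and the `temperature` value are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(music_emb, text_emb, temperature=0.07):
    # Normalize both modalities so dot products are cosine similarities.
    music_emb = F.normalize(music_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix between every music/text pair in the batch.
    logits = music_emb @ text_emb.t() / temperature

    # The i-th music piece is the positive match for the i-th text.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy over both retrieval directions.
    loss_m2t = F.cross_entropy(logits, targets)
    loss_t2m = F.cross_entropy(logits.t(), targets)
    return (loss_m2t + loss_t2m) / 2
```
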
To enhance the music encoder's comprehension of musical context and structure, the model uses a masked music model pre-training objective. CLaMP also integrates textual information to enable semantic search and zero-shot classification for symbolic music, surpassing the capabilities of previous models.

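To make bar patching concrete, here is a minimal sketch that splits an ABC melody into bar-level patches and masks a random subset, in the spirit of the masked music model objective. The patch granularity, mask rate, and `[MASK]` token are illustrative assumptions rather than the exact pre-training recipe.

```python
import random

def mask_bar_patches(abc_melody, mask_rate=0.15, seed=0):
    # Treat each bar of the tune as one patch, splitting on barlines.
    patches = [bar.strip() for bar in abc_melody.split("|") if bar.strip()]

    # Randomly mask a fraction of the patches; the pre-training
    # objective is to reconstruct the masked bars from context.
    rng = random.Random(seed)
    masked = [p if rng.random() > mask_rate else "[MASK]" for p in patches]
    return patches, masked

patches, masked = mask_bar_patches("C D E F | G A B c | c B A G | c4 |")
```
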
To evaluate semantic search and music classification, CLaMP is assessed on the publicly released [WikiMusicText](https://huggingface.co/datasets/sander-wood/wikimusictext) (WikiMT) dataset, which consists of 1010 lead sheets in ABC notation, each accompanied by a title, artist, genre, and description. In comparison to state-of-the-art models that require fine-tuning, zero-shot CLaMP demonstrated comparable or superior performance on score-oriented datasets.

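To inspect WikiMT yourself, it should be loadable with the `datasets` library; the split and field names implied below are assumptions based on the description above, so verify them against the dataset card.

```python
from datasets import load_dataset

# Load the WikiMusicText evaluation set from the Hugging Face Hub.
wikimt = load_dataset("sander-wood/wikimusictext", split="train")

# Each entry pairs an ABC lead sheet with textual metadata
# (title, artist, genre, and a description).
print(wikimt[0])
```
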
## Cross-Modal Symbolic MIR

CLaMP aligns symbolic music and natural language, which makes it applicable to various cross-modal retrieval tasks, including semantic search and zero-shot classification for symbolic music.

Semantic search retrieves music from open-domain queries, unlike traditional keyword-based search, which depends on exact matches or meta-information. It involves two steps: 1) extracting music features from all scores in the library, and 2) transforming the query into a text feature. By calculating the similarities between the text feature and the music features, the system can efficiently locate the score in the library that best matches the user's query.

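In code, these two steps reduce to a nearest-neighbor lookup in the shared embedding space. The sketch below assumes a hypothetical `encode_text` function and a pre-computed matrix of music features, not the actual CLaMP API.

```python
import numpy as np

def semantic_search(query_text, library_features, encode_text, top_n=5):
    # Step 1 (done offline): library_features is an (N, d) matrix of
    # L2-normalized music features, one row per score in the library.
    # Step 2: embed the open-domain text query into the same space.
    q = encode_text(query_text)
    q = q / np.linalg.norm(q)

    # Cosine similarity against every score, best matches first.
    scores = library_features @ q
    return np.argsort(scores)[::-1][:top_n]
```
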
Zero-shot classification refers to classifying new items into any set of candidate labels without task-specific training data. It uses a prompt template to provide context for the text encoder. For example, a prompt such as "<i>This piece of music is composed by {composer}.</i>" is used to form input texts from the names of candidate composers. The text encoder outputs a text feature for each input text, while the music encoder extracts the music feature from the unlabelled target piece. The label whose text feature is most similar to the music feature is chosen as the prediction.

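A minimal sketch of that procedure, again assuming hypothetical `encode_music` and `encode_text` helpers in place of the real CLaMP interface:

```python
import numpy as np

def zero_shot_classify(music_file, composers, encode_music, encode_text):
    # Build one prompt per candidate label using the template above.
    prompts = [f"This piece of music is composed by {c}." for c in composers]

    # Embed the unlabelled piece and every candidate prompt.
    m = encode_music(music_file)
    t = np.stack([encode_text(p) for p in prompts])

    # Normalize, then pick the label with the highest cosine similarity.
    m = m / np.linalg.norm(m)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    return composers[int(np.argmax(t @ m))]
```
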
## Intended uses

1. Semantic search and zero-shot classification for score-oriented symbolic music datasets.
2. Cross-modal representation learning between natural language and symbolic music.
3. Enabling research in music analysis, retrieval, and generation.
4. Building innovative systems and applications that integrate music and language.

## Limitations

1. The current version of CLaMP has limited comprehension of performance MIDI.
2. The model may not perform well on tasks outside its pre-training scope.
3. It may require fine-tuning for some specific tasks.

### How to use

To use CLaMP, you can follow these steps:

1. Clone the CLaMP repository by running the following command in your terminal:
```
git clone https://github.com/microsoft/muzic.git
```
This will create a local copy of the repository on your computer.

2. Navigate to the CLaMP directory by running the following command:
```
cd muzic/clamp
```

3. Install the required dependencies by running the following command:
```
pip install -r requirements.txt
```

4. If you are performing a music query, save your query as `inference/music_query.mxl`. For music keys, ensure that all the music files are in MusicXML (.mxl) format and are saved in the `inference/music_keys` folder.

5. If you are performing a text query, save your query as `inference/text_query.txt`. For text keys, save all the keys in the `inference/text_keys.txt` file, where each line corresponds to one key. You can optionally sanity-check this layout with the snippet below.

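Here is a small, optional Python check of the layout; the paths mirror the steps above, and the assertions shown assume a text-to-music search:

```python
from pathlib import Path

inference = Path("inference")

# A text-to-music search needs a text query plus MusicXML keys.
assert (inference / "text_query.txt").exists(), "missing text query"
mxl_keys = list((inference / "music_keys").glob("*.mxl"))
assert mxl_keys, "put candidate .mxl scores in inference/music_keys/"
print(f"Found {len(mxl_keys)} music keys.")
```
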
6. Run the following command to perform the query:
```
python clamp.py -clamp_model_name [MODEL NAME] -query_modal [QUERY MODAL] -key_modal [KEY MODAL] -top_n [NUMBER OF RESULTS]
```
Replace [MODEL NAME] with the name of the CLaMP model you want to use (either `sander-wood/clamp-small-512` or `sander-wood/clamp-small-1024`), [QUERY MODAL] with either `music` or `text` to indicate the type of query you want to perform, [KEY MODAL] with either `music` or `text` to indicate the type of keys to search over, and [NUMBER OF RESULTS] with the number of top results you want returned.

For example, to perform semantic music search with the `sander-wood/clamp-small-512` model and return the top 5 results, run:
```
python clamp.py -clamp_model_name sander-wood/clamp-small-512 -query_modal text -key_modal music -top_n 5
```
Note that the first time you run the CLaMP script, it will automatically download the model checkpoint from Hugging Face. This may take a few minutes, depending on your internet speed.

7. After running the command, the script will generate a list of the top results for the given query. Each result corresponds to a music file in the `music_keys` folder or a line in the `text_keys.txt` file, depending on the type of keys you used.