---
license: ms-pl
---

###### [Overview](#CLAP) | [Setup](#Setup) | [CLAP weights](#CLAP-weights) | [Usage](#Usage) | [Citation](#Citation)

# CLAP

CLAP (Contrastive Language-Audio Pretraining) is a model that learns acoustic concepts from natural language supervision and enables zero-shot inference. The model has been extensively evaluated on 26 downstream audio tasks, achieving state-of-the-art results in several of them, including classification, retrieval, and captioning.

<img width="832" alt="clap_diagrams" src="docs/clap2_diagram.png">

## Setup

First, install Python 3.8 or higher (3.11 recommended). Then, install CLAP using either of the following:

```shell
# Install the PyPI package
pip install msclap

# Or install the latest (unstable) source from GitHub
pip install git+https://github.com/microsoft/CLAP.git
```
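
If the installation succeeded, the class used throughout this README should import cleanly. A minimal sanity check (this only verifies the import; it does not load any weights):

```python
# Minimal post-install check: confirms the msclap package and its CLAP class
# are importable. No model weights are loaded here.
from msclap import CLAP

print("msclap is installed and CLAP is importable")
```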

## CLAP weights

CLAP weights are available in three versions: _2022_, _2023_, and _clapcap_.

_clapcap_ is the audio captioning model that uses the 2023 encoders.

## Usage

The CLAP source code is available at https://github.com/microsoft/CLAP.

- Zero-Shot Classification and Retrieval (a short sketch of turning the similarities into predictions follows this snippet)

```python
from msclap import CLAP

# Load model (choose between versions '2022' or '2023')
clap_model = CLAP("<PATH TO WEIGHTS>", version='2023', use_cuda=False)

# Extract text embeddings for a list of class labels (replace with your own)
class_labels = ["dog barking", "siren", "acoustic guitar"]
text_embeddings = clap_model.get_text_embeddings(class_labels)

# Extract audio embeddings for a list of audio file paths (replace with your own)
file_paths = ["audio_1.wav", "audio_2.wav"]
audio_embeddings = clap_model.get_audio_embeddings(file_paths)

# Compute similarity between audio and text embeddings
similarities = clap_model.compute_similarity(audio_embeddings, text_embeddings)
```
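
For zero-shot classification, the similarity matrix can be turned into per-file label predictions. The following is a minimal sketch, not part of the msclap API: it assumes `similarities` behaves like a torch tensor of shape `(len(file_paths), len(class_labels))`, with rows indexed by audio files and columns by class labels.

```python
import torch

# Hypothetical post-processing of the snippet above (not part of msclap itself).
# Assumes `similarities` is a (num_files x num_labels) torch tensor.
probs = torch.softmax(similarities, dim=1)   # per-file distribution over class labels
top_prob, top_idx = probs.max(dim=1)         # most likely label for each audio file

for path, idx, p in zip(file_paths, top_idx.tolist(), top_prob.tolist()):
    print(f"{path}: {class_labels[idx]} (p={p:.2f})")
```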

- Audio Captioning

```python
from msclap import CLAP

# Load model (version 'clapcap' for audio captioning)
clap_model = CLAP("<PATH TO WEIGHTS>", version='clapcap', use_cuda=False)

# Generate captions for a list of audio file paths (replace with your own)
file_paths = ["audio_1.wav", "audio_2.wav"]
captions = clap_model.generate_caption(file_paths)
```
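
A small usage sketch for inspecting the output, assuming `generate_caption` returns one caption string per input file in the same order as `file_paths`:

```python
# Pair each audio file with its generated caption
# (assumption: one caption per file, returned in input order).
for path, caption in zip(file_paths, captions):
    print(f"{path}: {caption}")
```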

## Citation

Kindly cite our work if you find it useful.

[CLAP: Learning Audio Concepts from Natural Language Supervision](https://ieeexplore.ieee.org/abstract/document/10095889)
```bibtex
@inproceedings{CLAP2022,
  title={Clap learning audio concepts from natural language supervision},
  author={Elizalde, Benjamin and Deshmukh, Soham and Al Ismail, Mahmoud and Wang, Huaming},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
```

[Natural Language Supervision for General-Purpose Audio Representations](https://arxiv.org/abs/2309.05767)
```bibtex
@misc{CLAP2023,
  title={Natural Language Supervision for General-Purpose Audio Representations},
  author={Benjamin Elizalde and Soham Deshmukh and Huaming Wang},
  year={2023},
  eprint={2309.05767},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2309.05767}
}
```

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.