---
license: cc-by-nc-sa-4.0
pipeline_tag: audio-classification
tags:
- music
- audio
---

# Model Card: Pre-trained Audio Representation Models on AudioSet

## Overview

This model card describes the pre-trained audio representation models released by ALM. These models are pre-trained on the full AudioSet dataset and are intended for general-purpose Audio Representation Learning (ARL) tasks.

## Models

### 1. [ALM/hubert-base-audioset](https://huggingface.co/ALM/hubert-base-audioset)

- **Architecture**: HuBERT (HuBERT-Base) transformer-based model
- **Description**: This model is based on the HuBERT architecture and pre-trained on the full AudioSet dataset.

### 2. [ALM/hubert-large-audioset](https://huggingface.co/ALM/hubert-large-audioset)

- **Architecture**: HuBERT (HuBERT-Large) transformer-based model
- **Description**: Similar to hubert-base-audioset, but larger, providing increased capacity for capturing audio representations from the full AudioSet dataset.

### 3. [ALM/wav2vec2-base-audioset](https://huggingface.co/ALM/wav2vec2-base-audioset)

- **Architecture**: Wav2Vec 2.0 (Wav2Vec2-Base) transformer-based model
- **Description**: This model is based on the Wav2Vec 2.0 architecture and pre-trained on the full AudioSet dataset using self-supervised learning (SSL) with a contrastive predictive coding (CPC) objective. It offers a different approach to audio representation learning compared to the HuBERT models.

### 4. [ALM/wav2vec2-large-audioset](https://huggingface.co/ALM/wav2vec2-large-audioset)

- **Architecture**: Wav2Vec 2.0 (Wav2Vec2-Large) transformer-based model
- **Description**: Similar to wav2vec2-base-audioset, but larger, providing enhanced capacity for learning audio representations from the full AudioSet dataset.

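## Usage

All four checkpoints can serve as drop-in feature extractors. The following is a minimal sketch of clip-level embedding extraction with the `transformers` library; it assumes the checkpoints load through the standard `AutoFeatureExtractor`/`AutoModel` classes and expect 16 kHz mono input, and the random waveform is a placeholder for real audio.

```python
import torch
from transformers import AutoFeatureExtractor, AutoModel

# Any of the four checkpoint IDs above can be substituted here.
model_id = "ALM/hubert-base-audioset"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

# Placeholder: one second of random 16 kHz mono audio.
# Replace with a real waveform loaded via e.g. torchaudio or librosa.
waveform = torch.randn(16000).numpy()

inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Frame-level representations: (batch, frames, hidden_size).
frame_embeddings = outputs.last_hidden_state
# A single clip-level embedding via mean pooling over time.
clip_embedding = frame_embeddings.mean(dim=1)
```
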
## Intended Use

These pre-trained models are intended for a wide range of ARL tasks, including but not limited to speech recognition, music classification, and acoustic event detection. They serve as powerful feature extractors and can be fine-tuned on task-specific datasets for downstream applications, as sketched below.

Note that while these models offer versatility across audio domains, their performance on speech-related tasks may be lower than that of specialized models such as the original Wav2Vec 2.0 and HuBERT models, because the AudioSet data used for pre-training spans a wide range of audio sources beyond speech.

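As a sketch of such fine-tuning, the example below assumes the checkpoints work with `AutoModelForAudioClassification` (which attaches a freshly initialized classification head); the batch is dummy data standing in for a real labeled dataset, and the 10 classes are hypothetical.

```python
import torch
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

model_id = "ALM/wav2vec2-base-audioset"
num_labels = 10  # hypothetical number of target classes

feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModelForAudioClassification.from_pretrained(model_id, num_labels=num_labels)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Dummy batch: four one-second 16 kHz clips with placeholder labels.
batch_waveforms = [torch.randn(16000).numpy() for _ in range(4)]
batch_labels = torch.tensor([0, 1, 2, 3])

inputs = feature_extractor(batch_waveforms, sampling_rate=16000,
                           padding=True, return_tensors="pt")

# One illustrative training step; passing labels makes the model return a loss.
outputs = model(**inputs, labels=batch_labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```
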
## Limitations and Considerations

- The models are pre-trained on the full AudioSet dataset, which may not cover all possible audio domains comprehensively.
- Fine-tuning on domain-specific data may be necessary to achieve optimal performance for certain tasks.
- Substantial computational resources may be required for deploying and fine-tuning these models, especially the larger variants.

## Citation

If you use these pre-trained models in your work, please cite the following:

```bibtex
@inproceedings{ARCH,
  title={Benchmarking Representations for Speech, Music, and Acoustic Events},
  author={La Quatra, Moreno and Koudounas, Alkis and Vaiani, Lorenzo and Baralis, Elena and Garza, Paolo and Cagliero, Luca and Siniscalchi, Sabato Marco},
  year={2024},
  booktitle={2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)},
}
```