topel committed on
Commit a0b8f57
1 Parent(s): 9040c27

Update Readme

Files changed (1)
  1. README.md +11 -14
README.md CHANGED
@@ -6,13 +6,9 @@ tags:
- audio embeddings
- convnext-audio
- audioset
- inference: false
- extra_gated_prompt: "The collected information will help acquire a better knowledge of who is using our audio event tools. If relevant, please cite our Interspeech 2023 paper."
- extra_gated_fields:
- Company/university: text
- Website: text
---
- ConvNeXt-Tiny-AT is an audio tagging CNN model, trained on AudioSet (balanced+unbalanced subsets). It reached 0.471 mAP on the test set.
+
+ **ConvNeXt-Tiny-AT** is an audio tagging CNN model, trained on **AudioSet** (balanced+unbalanced subsets). It reached 0.471 mAP on the test set.

The model expects as input audio files of duration 10 seconds, and sample rate 32kHz.
It provides logits and probabilities for the 527 audio event tags of AudioSet (see http://research.google.com/audioset/index.html).
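The README above pins the input format: 10-second clips sampled at 32 kHz, scored against the 527 AudioSet classes. A rough sketch of shaping an arbitrary audio file into that format, assuming torchaudio is available; the file name, the mono downmix and the zero-padding policy are illustrative choices, not requirements stated in the card:

```python
import torch
import torchaudio

TARGET_SR = 32000                 # sample rate the model expects
TARGET_LEN = 10 * TARGET_SR       # 10-second clips

# "example.wav" is a placeholder; any file readable by torchaudio works.
waveform, sr = torchaudio.load("example.wav")        # [channels, num_samples]
waveform = waveform.mean(dim=0, keepdim=True)        # downmix to mono (assumed policy)
if sr != TARGET_SR:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=TARGET_SR)

# Zero-pad or crop to exactly 10 s (policy assumed, not specified in the card).
n = waveform.shape[1]
if n < TARGET_LEN:
    waveform = torch.nn.functional.pad(waveform, (0, TARGET_LEN - n))
else:
    waveform = waveform[:, :TARGET_LEN]

print(waveform.shape)  # torch.Size([1, 320000])
```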
@@ -23,8 +19,6 @@ The scene embedding is obtained from the frame-level embeddings, on which mean p

This code is based on our repo: https://github.com/topel/audioset-convnext-inf

- Note that the checkpoint is also available on Zenodo: https://zenodo.org/record/8020843/files/convnext_tiny_471mAP.pth?download=1
-

```bash
pip install git+https://github.com/topel/audioset-convnext-inf@pip-install
@@ -35,10 +29,6 @@ pip install git+https://github.com/topel/audioset-convnext-inf@pip-install
Below is an example of how to instantiate our model convnext_tiny_471mAP.pth

```python
- # 1. visit hf.co/topel/ConvNeXt-Tiny-AT and accept user conditions
- # 2. visit hf.co/settings/tokens to create an access token
- # 3. instantiate pretrained model
-
import os
import numpy as np
import torch
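# --- Illustration only, not part of the original README snippet above. ---
# The checkpoint named in this section, convnext_tiny_471mAP.pth, is also hosted
# on Zenodo (the URL quoted in this README). One way to fetch it and peek at its
# contents with plain PyTorch; the local filename is arbitrary and the key layout
# of the checkpoint is not documented in this diff, so we only list it.
CKPT_URL = "https://zenodo.org/record/8020843/files/convnext_tiny_471mAP.pth?download=1"
CKPT_PATH = "convnext_tiny_471mAP.pth"

if not os.path.exists(CKPT_PATH):
    torch.hub.download_url_to_file(CKPT_URL, CKPT_PATH)

checkpoint = torch.load(CKPT_PATH, map_location="cpu")
print(type(checkpoint))
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys())[:5])  # inspect the first few top-level keys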
@@ -69,7 +59,6 @@ Output:
## Inference: get logits and probabilities

```python
-
sample_rate = 32000
audio_target_length = 10 * sample_rate # 10 s
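# --- Illustration only, not part of the original README snippet above. ---
# Assuming `model` is the ConvNeXt instantiated earlier, `waveform` is a tensor of
# shape [1, audio_target_length] at 32 kHz, and a plain forward pass returns raw
# logits of shape [1, 527]; the released code may expose a different interface, so
# treat the call below as a placeholder.
with torch.no_grad():
    logits = model(waveform)

# AudioSet tagging is multi-label, so independent sigmoids are one common way to
# turn logits into per-class probabilities (the model card says both are provided).
probs = torch.sigmoid(logits)

top = torch.topk(probs[0], k=5)
for p, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"class index {idx}: probability {p:.3f}")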
 
@@ -140,8 +129,16 @@ Output:
Frame-level embeddings, shape: torch.Size([1, 768, 31, 7])
```

+ # Zenodo
+
+ The checkpoint is also available on Zenodo: https://zenodo.org/record/8020843/files/convnext_tiny_471mAP.pth?download=1
+
+ Together with a second checkpoint: convnext_tiny_465mAP_BL_AC_70kit.pth
+
+ The second model is useful to perform audio captioning on the AudioCaps dataset without training data biases. It was trained the same way as the current model, for audio tagging on AudioSet, but the files from AudioCaps were removed from the AudioSet development set.
+

- ## Citation
+ # Citation

Cite as: Pellegrini, T., Khalfaoui-Hassani, I., Labbé, E., Masquelier, T. (2023) Adapting a ConvNeXt Model to Audio Classification on AudioSet. Proc. INTERSPEECH 2023, 4169-4173, doi: 10.21437/Interspeech.2023-1564
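As noted in a hunk header above, the scene embedding is obtained by mean pooling the frame-level embeddings, whose shape is reported as torch.Size([1, 768, 31, 7]). A small sketch of that pooling step, assuming the tensor is laid out as [batch, channels, time, frequency] (the axis order is an assumption, not stated in the diff):

```python
import torch

# Dummy frame-level embeddings with the shape reported in the README output.
frame_embeddings = torch.randn(1, 768, 31, 7)   # [batch, channels, time, frequency] (assumed)

# Mean pooling over the two spatial axes collapses the frames into one clip-level vector.
scene_embedding = frame_embeddings.mean(dim=(2, 3))

print(scene_embedding.shape)  # torch.Size([1, 768])
```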
 
 
144