topel committed on
Commit
1e97b39
1 Parent(s): 40c6e9b

Update readme

Files changed (1): README.md (+8 -1)
README.md CHANGED
@@ -14,13 +14,14 @@ extra_gated_fields:
   I plan to use this model for (task, type of audio data, etc): text
   ---
 
-**ConvNeXt-Tiny-AT** is an audio tagging CNN model, trained on **AudioSet** (balanced+unbalanced subsets). It reached 0.471 mAP on the test set.
+**ConvNeXt-Tiny-AT** is an audio tagging CNN model, trained on **AudioSet** (balanced+unbalanced subsets). It reached 0.471 mAP on the test set [(Paper)](https://www.isca-speech.org/archive/interspeech_2023/pellegrini23_interspeech.html).
 
 The model expects as input audio files of duration 10 seconds, and sample rate 32kHz.
 It provides logits and probabilities for the 527 audio event tags of AudioSet (see http://research.google.com/audioset/index.html).
 Two methods can also be used to get scene embeddings (a single vector per file) and frame-level embeddings, see below.
 The scene embedding is obtained from the frame-level embeddings, on which mean pooling is applied onto the frequency dim, followed by mean pooling + max pooling onto the time dim.
 
+
 # Install
 
 This code is based on our repo: https://github.com/topel/audioset-convnext-inf
@@ -35,6 +36,10 @@ pip install git+https://github.com/topel/audioset-convnext-inf@pip-install
 Below is an example of how to instantiate our model convnext_tiny_471mAP.pth
 
 ```python
+# 1. visit hf.co/topel/ConvNeXt-Tiny-AT and accept user conditions
+# 2. visit hf.co/settings/tokens to create an access token
+# 3. instantiate pretrained model
+
 import os
 import numpy as np
 import torch
@@ -146,6 +151,8 @@ The second model is useful to perform audio captioning on the AudioCaps dataset
 
 # Citation
 
+[Paper available](https://www.isca-speech.org/archive/interspeech_2023/pellegrini23_interspeech.html)
+
 Cite as: Pellegrini, T., Khalfaoui-Hassani, I., Labbé, E., Masquelier, T. (2023) Adapting a ConvNeXt Model to Audio Classification on AudioSet. Proc. INTERSPEECH 2023, 4169-4173, doi: 10.21437/Interspeech.2023-1564
 
 ```bibtex
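The README text in the diff says the model takes 10-second clips at 32 kHz and outputs logits and probabilities for AudioSet's 527 event tags. A minimal sketch of the output side: since AudioSet tagging is multi-label, probabilities are typically an element-wise sigmoid of the per-class logits (the sigmoid choice and the random logits here are assumptions for illustration, not taken from the model code):

```python
import numpy as np

# A 10-second clip at 32 kHz corresponds to 32000 * 10 = 320000 samples.
SAMPLE_RATE, DURATION_S = 32000, 10
num_samples = SAMPLE_RATE * DURATION_S

# Hypothetical per-clip logits for the 527 AudioSet classes.
rng = np.random.default_rng(0)
logits = rng.standard_normal(527)

# Multi-label tagging: an independent sigmoid per class (assumed here),
# so the probabilities need not sum to 1.
probs = 1.0 / (1.0 + np.exp(-logits))

# Indices of the 5 highest-probability tags.
top5 = np.argsort(probs)[::-1][:5]
```

Each probability can then be thresholded independently, which is why a per-class sigmoid, not a softmax over the 527 tags, is the usual choice for this task.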
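The scene-embedding recipe described in the diff (mean pooling over the frequency dim of the frame-level embeddings, then mean + max pooling over the time dim) can be sketched as follows; the (channels, time, freq) layout, the sizes, and the element-wise sum of the two time-pooled vectors are assumptions for illustration, and the real model's embedding shapes may differ:

```python
import numpy as np

# Hypothetical frame-level embeddings: (channels, time, freq).
rng = np.random.default_rng(0)
frames = rng.standard_normal((768, 31, 4))

x = frames.mean(axis=-1)                  # mean pooling over the frequency dim -> (768, 31)
scene = x.mean(axis=-1) + x.max(axis=-1)  # mean + max pooling over the time dim -> (768,)
```

Combining mean and max pooling over time keeps both the average activity of a tag and its strongest single frame, which helps for short, transient events.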