---
language: en
license: mit
tags:
- audio
- captioning
- text
- audio-captioning
- automated-audio-captioning
task_categories:
- audio-captioning
---
# CoNeTTE (ConvNext-Transformer with Task Embedding) for Automated Audio Captioning
<font color='red'>This model is currently in development, and not all of the required files are available yet.</font>
This model generates a short textual description of any audio file.
## Installation
```bash
pip install conette
```
## Usage
```py
from conette import CoNeTTEConfig, CoNeTTEModel
config = CoNeTTEConfig.from_pretrained("Labbeti/conette")
model = CoNeTTEModel.from_pretrained("Labbeti/conette", config=config)
path = "/my/path/to/audio.wav"
outputs = model(path)
cands = outputs["cands"][0]
print(cands)
```
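When captioning several files, it can help to wrap the call pattern above in a small helper that fails loudly if no candidate comes back. The sketch below is only an illustration: `first_caption` and `fake_model` are hypothetical names that are not part of the `conette` package, and the only thing assumed about the model is what the usage example shows, namely that calling it with a path returns a mapping whose `"cands"` entry is a list of captions.

```python
from typing import Any, Callable, Mapping


def first_caption(model: Callable[[str], Mapping[str, Any]], path: str) -> str:
    """Run `model` on one audio file and return its first candidate caption.

    Hypothetical helper: it assumes only that `model(path)` returns a mapping
    with a "cands" list, as in the usage example above.
    """
    outputs = model(path)
    cands = outputs.get("cands")
    if not cands:
        raise ValueError(f"no caption candidates returned for {path!r}")
    return cands[0]


def fake_model(path: str) -> dict:
    """Stand-in for a loaded CoNeTTEModel so the sketch runs without weights."""
    return {"cands": [f"a caption for {path}"]}


print(first_caption(fake_model, "/my/path/to/audio.wav"))
```

With the real package installed, the `model` object created in the usage example would be passed in place of `fake_model`.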
## Single model performance
| Dataset | SPIDEr (%) | SPIDEr-FL (%) | FENSE (%) |
| ------------- | ------------- | ------------- | ------------- |
| AudioCaps | 44.14 | 43.98 | 60.81 |
| Clotho | 30.97 | 30.87 | 51.72 |
## Citation
The preprint of the paper describing CoNeTTE is available on arXiv: https://arxiv.org/pdf/2309.00454.pdf
```
@misc{labbé2023conette,
title = {CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding},
author = {Étienne Labbé and Thomas Pellegrini and Julien Pinquier},
year = 2023,
journal = {arXiv preprint arXiv:2309.00454},
url = {https://arxiv.org/pdf/2309.00454.pdf},
eprint = {2309.00454},
archiveprefix = {arXiv},
primaryclass = {cs.SD}
}
```
## Additional information
The encoder part of the architecture is based on a ConvNeXt model for audio classification, available here: https://huggingface.co/topel/ConvNeXt-Tiny-AT.
The encoder weights used are named "convnext_tiny_465mAP_BL_AC_70kit.pth", available on Zenodo: https://zenodo.org/record/8020843.
It was created by [@Labbeti](https://hf.co/Labbeti).