Upload README.md with huggingface_hub
README.md CHANGED
@@ -12,14 +12,14 @@ task_categories:
 - audio-captioning
 ---
 
+<div align="center">
+
+# CoNeTTE model source
+
 <a href="https://www.python.org/"><img alt="Python" src="https://img.shields.io/badge/-Python 3.10+-blue?style=for-the-badge&logo=python&logoColor=white"></a><a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/-PyTorch 1.10.1+-ee4c2c?style=for-the-badge&logo=pytorch&logoColor=white"></a><a href="https://black.readthedocs.io/en/stable/"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-black.svg?style=for-the-badge&labelColor=gray"></a>
 <a href="https://github.com/Labbeti/conette-audio-captioning/actions">
 <img alt="Build" src="https://img.shields.io/github/actions/workflow/status/Labbeti/conette-audio-captioning/python-package-pip.yaml?branch=main&style=for-the-badge&logo=github">
 </a>
-
-<div align="center">
-
-# CoNeTTE model source
 <!-- <a href='https://aac-metrics.readthedocs.io/en/stable/?badge=stable'>
 <img src='https://readthedocs.org/projects/aac-metrics/badge/?version=stable&style=for-the-badge' alt='Documentation Status' />
 </a> -->
@@ -88,11 +88,14 @@ conette-predict --audio "/your/path/to/audio.wav"
 
 | Test data | SPIDEr (%) | SPIDEr-FL (%) | FENSE (%) | Vocab | Outputs | Scores |
 | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
-| AC-test | 44.14 | 43.98 | 60.81 | 309 | [
-| CL-eval | 30.97 | 30.87 | 51.72 | 636 | [
+| AC-test | 44.14 | 43.98 | 60.81 | 309 | [Link](https://github.com/Labbeti/conette-audio-captioning/blob/main/results/conette/outputs_audiocaps_test.csv) | [Link](https://github.com/Labbeti/conette-audio-captioning/blob/main/results/conette/scores_audiocaps_test.yaml) |
+| CL-eval | 30.97 | 30.87 | 51.72 | 636 | [Link](https://github.com/Labbeti/conette-audio-captioning/blob/main/results/conette/outputs_clotho_eval.csv) | [Link](https://github.com/Labbeti/conette-audio-captioning/blob/main/results/conette/scores_clotho_eval.yaml) |
 
 This model checkpoint has been trained on the Clotho dataset, but it can also reach good performance on AudioCaps with the "audiocaps" task.
 
+## Limitations
+The model has been trained on audio sampled at 32 kHz and lasting from 1 to 30 seconds. It can handle longer audio files, but it might give worse results.
+
 ## Citation
 The preprint version of the paper describing CoNeTTE is available on arXiv: https://arxiv.org/pdf/2309.00454.pdf
 
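For context on the table above, SPIDEr, SPIDEr-FL and FENSE are corpus-level captioning metrics, and the linked YAML files hold the per-dataset scores. If one wanted to recompute such numbers from the linked output CSVs, the author's aac-metrics package (the one behind the commented-out documentation badge) exposes an evaluate helper along the lines sketched below; the exact signature and the set of returned metrics are assumptions to check against that package's own documentation.

```python
# Hedged sketch: scoring candidate captions against multiple references with
# aac-metrics. The evaluate() helper and its (corpus, per-sentence) return pair
# are assumed from that package's documentation, not from this model card.
from aac_metrics import evaluate

candidates = ["a man speaks while birds chirp in the background"]
mult_references = [[
    "a man is talking and birds are chirping",
    "someone speaks outdoors with birdsong nearby",
]]

corpus_scores, sentence_scores = evaluate(candidates, mult_references)
print(corpus_scores)  # dict mapping metric names (e.g. "spider") to scores
```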
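The "audiocaps" task mentioned after the table is selected at inference time. As a rough illustration, the sketch below loads the checkpoint with the conette package and passes the task keyword; the CoNeTTEConfig/CoNeTTEModel names, the "Labbeti/conette" repository id and the task argument are taken from that package's documentation and should be treated as assumptions here rather than a verified API.

```python
# Sketch only: class names, repo id and the `task` keyword are assumed from the
# conette package documentation.
from conette import CoNeTTEConfig, CoNeTTEModel

config = CoNeTTEConfig.from_pretrained("Labbeti/conette")
model = CoNeTTEModel.from_pretrained("Labbeti/conette", config=config)

path = "/your/path/to/audio.wav"

# Default behaviour of this checkpoint: Clotho-style captions.
outputs = model(path)
print(outputs["cands"][0])

# Condition the decoder on the AudioCaps task embedding instead,
# as referred to in the sentence about the "audiocaps" task.
outputs = model(path, task="audiocaps")
print(outputs["cands"][0])
```

The CLI shown in the second hunk header, `conette-predict --audio "/your/path/to/audio.wav"`, is the command-line counterpart of the same call.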
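Since the Limitations note says the checkpoint was trained on 32 kHz audio of 1 to 30 seconds, longer or differently sampled recordings may be worth resampling and chunking first. The following is a minimal sketch using torchaudio; the 32 kHz target comes from the Limitations text, while the 30-second chunking is just one reasonable choice, not something the model card prescribes.

```python
# Illustrative preprocessing: resample to the training sample rate and split
# long files into chunks no longer than the training clips.
import torchaudio
import torchaudio.functional as F

TARGET_SR = 32_000   # sample rate used during training (from the Limitations note)
MAX_SECONDS = 30     # upper bound of the training clip durations

waveform, sr = torchaudio.load("/your/path/to/long_audio.wav")
if sr != TARGET_SR:
    waveform = F.resample(waveform, orig_freq=sr, new_freq=TARGET_SR)

# Each chunk now matches the duration range the model saw during training;
# captions can be generated per chunk and combined downstream.
chunks = waveform.split(TARGET_SR * MAX_SECONDS, dim=-1)
```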