Update README.md
README.md
CHANGED
@@ -4,7 +4,7 @@ license: apache-2.0
 # Positive Transfer Of The Whisper Speech Transformer To Human And Animal Voice Activity Detection
 We proposed WhisperSeg, utilizing the Whisper Transformer pre-trained for Automatic Speech Recognition (ASR) for both human and animal Voice Activity Detection (VAD). For more details, please refer to our paper
 
-
+
 > [Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection](https://doi.org/10.1101/2023.09.30.560270)
 >
 > Nianlong Gu, Kanghwi Lee, Maris Basha, Sumit Kumar Ram, Guanghao You, Richard H. R. Hahnloser <br>
@@ -57,5 +57,22 @@ spec_viewer.visualize( audio = audio, sr = sr, min_frequency= min_frequency, pre
 Run it in Google Colab: <a href="https://colab.research.google.com/github/nianlonggu/WhisperSeg/blob/master/docs/WhisperSeg_Voice_Activity_Detection_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
 For more details, please refer to the GitHub repository: https://github.com/nianlonggu/WhisperSeg
 
+## Citation
+When using our code or models for your work, please cite the following paper:
+```
+@article {Gu2023.09.30.560270,
+author = {Nianlong Gu and Kanghwi Lee and Maris Basha and Sumit Kumar Ram and Guanghao You and Richard Hahnloser},
+title = {Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection},
+elocation-id = {2023.09.30.560270},
+year = {2023},
+doi = {10.1101/2023.09.30.560270},
+publisher = {Cold Spring Harbor Laboratory},
+abstract = {This paper introduces WhisperSeg, utilizing the Whisper Transformer pre-trained for Automatic Speech Recognition (ASR) for human and animal Voice Activity Detection (VAD). Contrary to traditional methods that detect human voice or animal vocalizations from a short audio frame and rely on careful threshold selection, WhisperSeg processes entire spectrograms of long audio and generates plain text representations of onset, offset, and type of voice activity. Processing a longer audio context with a larger network greatly improves detection accuracy from few labeled examples. We further demonstrate a positive transfer of detection performance to new animal species, making our approach viable in the data-scarce multi-species setting.Competing Interest StatementThe authors have declared no competing interest.},
+URL = {https://www.biorxiv.org/content/early/2023/10/02/2023.09.30.560270},
+eprint = {https://www.biorxiv.org/content/early/2023/10/02/2023.09.30.560270.full.pdf},
+journal = {bioRxiv}
+}
+```
+
 ## Contact
 nianlong.gu@uzh.ch