Audio Classification
speechbrain
English
lijialudew committed on
Commit 354afc5
1 Parent(s): 738d959

Update README.md

Files changed (1)
  1. README.md +19 -10
README.md CHANGED
@@ -16,7 +16,7 @@ pipeline_tag: audio-classification
 We build a CTC-based phoneme recognition model using wav2vec 2.0 (W2V2) for children under 4 years old. We use three-level fine-tuning to gradually reduce the age mismatch between adult phonetics and child phonetics.

 - **W2V2-Libri100h**: We first fine-tune W2V2-Base, pretrained on the unlabeled 960-hour LibriSpeech adult speech corpus, on 100 hours of LibriSpeech with IPA phone sequences.
-- **W2V2-MyST**: We then fine-tune W2V2-Libri100h using the [My Science Tutor](https://boulderlearning.com/products/myst/) corpus (conversational speech of students in the third through fifth grades with a virtual tutor).
+- **W2V2-MyST**: We then fine-tune W2V2-Libri100h using the [My Science Tutor](https://catalog.ldc.upenn.edu/LDC2021S05) corpus (conversational speech of students in the third through fifth grades with a virtual tutor).
 - **W2V2-Libri100h-Pro (two-level fine-tuning)**: We fine-tune W2V2-Libri100h using the [Providence](https://phonbank.talkbank.org/access/Eng-NA/Providence.html) corpus (longitudinal audio of 6 English-speaking children aged 1-4 years interacting with their mothers at home) on phoneme sequences.
 - **W2V2-MyST-Pro (three-level fine-tuning)**: Similar to W2V2-Libri100h-Pro, we fine-tune W2V2-MyST using Providence on phoneme sequences.

@@ -24,7 +24,8 @@ We show W2V2-MyST-Pro is helpful for improving children's vocalization classification

 ## Model Sources
 For more information regarding this model, please check out our papers:
-- **Paper:** https://arxiv.org/pdf/2309.07287.pdf
+- **[Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis](https://arxiv.org/abs/2309.07287)**
+- **[Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations](https://arxiv.org/abs/2402.06888)**

 ## Model Description

@@ -37,27 +38,35 @@ Folder contains the best checkpoint of the following setting

 ## Uses
 **We develop our complete fine-tuning recipes using the SpeechBrain toolkit, available at**
-
+TO DO
+<!--
 - **https://github.com/jialuli3/speechbrain/tree/infant-voc-classification/recipes/RABC** (used for the Rapid-ABC corpus)
 - **https://github.com/jialuli3/speechbrain/tree/infant-voc-classification/recipes/Babblecor** (used for the BabbleCor corpus)
-
+-->
 # Paper/BibTeX Citation

 <!-- If there is a paper or blog post introducing the model, the APA and BibTeX information for that should go in this section. -->
 If you find this model helpful, please cite us as
 <pre><code>
 @article{li2023enhancing,
-title={Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features},
+title={Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis},
 author={Li, Jialu and Hasegawa-Johnson, Mark and Karahalios, Karrie},
-journal={arXiv preprint arXiv:2309.07287},
-year={2023}
+booktitle={Interspeech},
+year={2024}
+}
+</code></pre>
+or
+<pre><code>
+@inproceedings{li2024analysis,
+title={Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations},
+author={Li, Jialu and Hasegawa-Johnson, Mark and McElwain, Nancy L},
+booktitle={IEEE Workshop on Self-Supervision in Audio, Speech and Beyond (SASB)},
+year={2024}
 }
 </code></pre>

 # Model Card Contact
-Jialu Li (she, her, hers)
-
-Ph.D. candidate @ Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
+Jialu Li, Ph.D. (she, her, hers)

 E-mail: jialuli3@illinois.edu
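The updated Uses section still marks the SpeechBrain fine-tuning recipes as TO DO, so the commit itself does not show how to run the released checkpoint. Below is a minimal inference sketch, assuming the fine-tuned W2V2 CTC checkpoint can be loaded through the Hugging Face `transformers` Wav2Vec2 interface; the repository ID `lijialudew/wav2vec2-myst-pro`, the audio file name, and the direct `transformers` loading path are placeholders and assumptions, not confirmed by this model card. The released weights may instead require the SpeechBrain recipes linked (commented out) above.

```python
# Hypothetical usage sketch -- not the official recipe. Assumes the W2V2-MyST-Pro
# checkpoint is exported in Hugging Face `transformers` format; the repo ID and the
# input file below are placeholders.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForCTC

MODEL_ID = "lijialudew/wav2vec2-myst-pro"  # placeholder repo ID

# The base W2V2 feature extractor (raw 16 kHz waveform input) is sufficient here.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).eval()

# Load a child-speech recording, downmix to mono, and resample to 16 kHz.
waveform, sample_rate = torchaudio.load("child_utterance.wav")  # placeholder file
waveform = waveform.mean(dim=0)
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits  # shape: (1, frames, n_phone_labels)

# Greedy CTC decoding; mapping IDs back to IPA phones requires the phone vocabulary
# used during fine-tuning, which ships with the SpeechBrain recipe.
phone_ids = torch.argmax(logits, dim=-1)
print(phone_ids.shape, phone_ids[0, :20])
```

If the checkpoint is only distributed as SpeechBrain weights, the same W2V2 encoder can be wrapped through SpeechBrain's wav2vec2 interface instead; the RABC and BabbleCor recipes referenced above cover the full vocalization-classification pipeline once they are published.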