Audio Classification
speechbrain
English
lijialudew committed on
Commit 354afc5
1 Parent(s): 738d959

Update README.md

Files changed (1)
  1. README.md +19 -10
README.md CHANGED
@@ -16,7 +16,7 @@ pipeline_tag: audio-classification
 We build a CTC-based phoneme recognition model using wav2vec 2.0 (W2V2) for children under 4 years old. We use three-level fine-tuning to gradually reduce the age mismatch between adult phonetics and child phonetics.

 - **W2V2-Libri100h**: We first fine-tune W2V2-Base, pretrained on the unlabeled 960-hour LibriSpeech adult speech corpus, on 100 hours of LibriSpeech with IPA phone sequences.
-- **W2V2-MyST**: We then fine-tune W2V2-Libri100h using the [My Science Tutor](https://boulderlearning.com/products/myst/) corpus (conversational speech of students in the third through fifth grades with a virtual tutor).
+- **W2V2-MyST**: We then fine-tune W2V2-Libri100h using the [My Science Tutor](https://catalog.ldc.upenn.edu/LDC2021S05) corpus (conversational speech of students in the third through fifth grades with a virtual tutor).
 - **W2V2-Libri100h-Pro (two-level fine-tuning)**: We fine-tune W2V2-Libri100h using the [Providence](https://phonbank.talkbank.org/access/Eng-NA/Providence.html) corpus (longitudinal audio of 6 English-speaking children aged 1-4 years interacting with their mothers at home) on phoneme sequences.
 - **W2V2-MyST-Pro (three-level fine-tuning)**: Similar to W2V2-Libri100h-Pro, we fine-tune W2V2-MyST using Providence on phoneme sequences.

@@ -24,7 +24,8 @@ We show W2V2-MyST-Pro is helpful for improving children's vocalization classification

 ## Model Sources
 For more information regarding this model, please check out our papers:
-- **Paper:** https://arxiv.org/pdf/2309.07287.pdf
+- **[Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis](https://arxiv.org/abs/2309.07287)**
+- **[Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations](https://arxiv.org/abs/2402.06888)**

 ## Model Description

@@ -37,27 +38,35 @@ Folder contains the best checkpoint of the following setting

 ## Uses
 **We develop our complete fine-tuning recipes using the SpeechBrain toolkit, available at**
-
+TO DO
+<!--
 - **https://github.com/jialuli3/speechbrain/tree/infant-voc-classification/recipes/RABC** (used for the Rapid-ABC corpus)
 - **https://github.com/jialuli3/speechbrain/tree/infant-voc-classification/recipes/Babblecor** (used for the BabbleCor corpus)
-
+-->
 # Paper/BibTeX Citation

 <!-- If there is a paper or blog post introducing the model, the APA and BibTeX information for that should go in this section. -->
 If you find this model helpful, please cite us as
 <pre><code>
 @article{li2023enhancing,
-title={Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features},
+title={Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis},
 author={Li, Jialu and Hasegawa-Johnson, Mark and Karahalios, Karrie},
-journal={arXiv preprint arXiv:2309.07287},
-year={2023}
+booktitle={Interspeech},
+year={2024}
+}
+</code></pre>
+or
+<pre><code>
+@inproceedings{li2024analysis,
+title={Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations},
+author={Li, Jialu and Hasegawa-Johnson, Mark and McElwain, Nancy L},
+booktitle={IEEE Workshop on Self-Supervision in Audio, Speech and Beyond (SASB)},
+year={2024}
 }
 </code></pre>

 # Model Card Contact
-Jialu Li (she, her, hers)
-
-Ph.D. candidate @ Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
+Jialu Li, Ph.D. (she, her, hers)

 E-mail: jialuli3@illinois.edu
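The updated Uses section still marks the SpeechBrain fine-tuning recipes as TO DO, so the commit itself does not show how to run the released checkpoint. Below is a minimal inference sketch, assuming the fine-tuned W2V2 CTC checkpoint can be loaded through the Hugging Face `transformers` Wav2Vec2 interface; the repository ID `lijialudew/wav2vec2-myst-pro`, the audio file name, and the direct `transformers` loading path are placeholders and assumptions, not confirmed by this model card. The released weights may instead require the SpeechBrain recipes linked (commented out) above.

```python
# Hypothetical usage sketch -- not the official recipe. Assumes the W2V2-MyST-Pro
# checkpoint is exported in Hugging Face `transformers` format; the repo ID and the
# input file below are placeholders.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForCTC

MODEL_ID = "lijialudew/wav2vec2-myst-pro"  # placeholder repo ID

# The base W2V2 feature extractor (raw 16 kHz waveform input) is sufficient here.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).eval()

# Load a child-speech recording, downmix to mono, and resample to 16 kHz.
waveform, sample_rate = torchaudio.load("child_utterance.wav")  # placeholder file
waveform = waveform.mean(dim=0)
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits  # shape: (1, frames, n_phone_labels)

# Greedy CTC decoding; mapping IDs back to IPA phones requires the phone vocabulary
# used during fine-tuning, which ships with the SpeechBrain recipe.
phone_ids = torch.argmax(logits, dim=-1)
print(phone_ids.shape, phone_ids[0, :20])
```

If the checkpoint is only distributed as SpeechBrain weights, the same W2V2 encoder can be wrapped through SpeechBrain's wav2vec2 interface instead; the RABC and BabbleCor recipes referenced above cover the full vocalization-classification pipeline once they are published.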