---
language:
- en
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
Model trained in int8 with LoRA

Usage:

Prepare the pipeline, passing any custom `generate_kwargs` supported by https://huggingface.co/docs/transformers/v4.40.0/en/main_classes/text_generation#transformers.GenerationConfig

```
asr_model = prepare_pipeline(
    model_dir='.',  # wherever you save the model
    generate_kwargs={
        'max_new_tokens': 112,
        'num_beams': 1,
        'repetition_penalty': 1,
        'do_sample': False,
    },
)
```
Run ASR on a single audio file:
```
asr_model(audio_path)
```

Run ASR on a full directory of audio files in `audio_dir`.
If `generate_kwargs` is not specified, decoding is greedy (deterministic), with up to 112 new tokens generated and no repetition penalty.

```
ASRdirWhisat(
        audio_dir, 
        out_dir = '../whisat_results/',
        model_dir=".",
)
```


Training information:
- Training script: tune_hf_whisper.py  
- Training hyperparameters: hparams.yaml
- Training data manifest: PUBLIC_KIDS_TRAIN_v4_deduped.csv 

Note: to recreate this training you will need to acquire the following public datasets:
- MyST (myst-v0.4.2) 
- CuKids
- CSLU

and ensure they are stored at paths consistent with those in the data manifest above.
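
Before launching training, it may help to confirm that every audio file listed in the manifest is present on disk. The sketch below assumes the manifest is a CSV with a header row and a column holding the audio file path; the column name `audio_path` is a placeholder assumption, so substitute whatever column the manifest actually uses.

```python
import csv
from pathlib import Path


def find_missing_audio(manifest_path, path_column="audio_path"):
    """Return the manifest paths that do not exist on disk.

    `path_column` is an assumed column name, not taken from the real manifest.
    Relative paths are resolved against the current working directory.
    """
    missing = []
    with open(manifest_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            audio_path = row[path_column]
            if not Path(audio_path).is_file():
                missing.append(audio_path)
    return missing
```

For example, `find_missing_audio("PUBLIC_KIDS_TRAIN_v4_deduped.csv")` would return an empty list when all listed files are in place, under the column-name assumption above.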

Reference:
```
@inproceedings{southwell2024,
  title={Automatic speech recognition tuned for child speech in the classroom},
  author={Southwell, Rosy and Ward, Wayne and Trinh, Viet Anh and Clevenger, Charis and Clevenger, Clay and Watts, Emily and Reitman, Jason and D’Mello, Sidney and Whitehill, Jacob},
  booktitle={{IEEE} International Conference on Acoustics, Speech and Signal Processing {ICASSP} 2024, Seoul, South Korea, April 14-19, 2024},
  year={2024},
}
```