---
language:
- "zh"
thumbnail: "Mandarin-wav2vec2.0 fine-tuned on AISHELL-1 dataset"
tags:
- automatic-speech-recognition
- speech
- wav2vec2.0
- audio
datasets:
- AISHELL-1
metrics:
- cer
---

The Mandarin-wav2vec2.0 model is pre-trained on 1000 hours of the AISHELL-2 dataset. Pre-training details can be found at https://github.com/kehanlu/mandarin-wav2vec2. This model is fine-tuned on 178 hours of the AISHELL-1 dataset and is the baseline model in the paper "A context-aware knowledge transferring strategy for CTC-based ASR" ([preprint](https://arxiv.org/abs/2210.06244)).

## Results on AISHELL-1

| Model            | dev CER (%) | test CER (%) |
| ---------------- | ----------- | ------------ |
| vanilla w2v2-CTC | 4.85        | 5.13         |

## Usage

**Note:** the model was fine-tuned with the ESPnet toolkit and then converted to a Hugging Face model for easy use.

```python
import torch
import soundfile as sf
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor


class ExtendedWav2Vec2ForCTC(Wav2Vec2ForCTC):
    """
    In ESPnet there is a LayerNorm layer between the encoder output and the CTC classification head.
    """
    def __init__(self, config):
        super().__init__(config)
        self.lm_head = torch.nn.Sequential(
            torch.nn.LayerNorm(config.hidden_size),
            self.lm_head
        )


model = ExtendedWav2Vec2ForCTC.from_pretrained("kehanlu/mandarin-wav2vec2-aishell1")
processor = Wav2Vec2Processor.from_pretrained("kehanlu/mandarin-wav2vec2-aishell1")

# Load a 16 kHz AISHELL-1 utterance and run greedy CTC decoding.
audio_input, sample_rate = sf.read("/path/to/data_aishell/wav/dev/S0724/BAC009S0724W0121.wav")
inputs = processor(audio_input, sampling_rate=sample_rate, return_tensors="pt")

model.eval()
with torch.no_grad():
    logits = model(**inputs).logits
    predicted_ids = torch.argmax(logits, dim=-1)

transcription = processor.batch_decode(predicted_ids)
print(transcription[0])
# 广州市房地产中介协会分析
```

## Licence

The pre-training corpus, AISHELL-2, is supported by the AISHELL Foundation. This model follows the AISHELL-2 licence as well: it is free to use for academic purposes and must not be used for any commercial purpose without permission from the AISHELL Foundation (https://www.aishelltech.com/aishell_2).

```
@ARTICLE{aishell2,
  author = {{Du}, J. and {Na}, X. and {Liu}, X. and {Bu}, H.},
  title = "{AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale}",
  journal = {ArXiv},
  eprint = {1808.10583},
  primaryClass = "cs.CL",
  year = 2018,
  month = Aug,
}
```

If you find this model useful, please cite:

```
@article{lu2022context,
  title={A context-aware knowledge transferring strategy for CTC-based ASR},
  author={Lu, Ke-Han and Chen, Kuan-Yu},
  journal={arXiv preprint arXiv:2210.06244},
  year={2022}
}
```
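
## Computing CER (illustrative)

The numbers in the results table above are character error rates (CER) in percent. For reference, below is a minimal sketch of how CER can be computed for a reference/hypothesis pair; the `cer` helper and the example strings are illustrative only and are not part of the original evaluation pipeline.

```python
# Minimal CER sketch: character-level Levenshtein distance divided by reference length.
# The reference/hypothesis pair below is a hypothetical example, not AISHELL-1 scoring output.

def cer(reference: str, hypothesis: str) -> float:
    """Return the character error rate between two strings (as a fraction)."""
    ref, hyp = list(reference), list(hypothesis)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


reference = "广州市房地产中介协会分析"
hypothesis = "广州市房地产中介协会分析"
print(f"CER: {cer(reference, hypothesis) * 100:.2f}%")  # 0.00% for an exact match
```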