---
language:
- "zh"
thumbnail: "Mandarin-wav2vec2.0 fine-tuned on AISHELL-1 dataset"
tags:
- automatic-speech-recognition
- speech
- wav2vec2.0
- audio
datasets:
- AISHELL-1
metrics:
- cer
---

The Mandarin-wav2vec2.0 model is pre-trained on 1,000 hours of the AISHELL-2 dataset. Pre-training details can be found at https://github.com/kehanlu/mandarin-wav2vec2. This model is fine-tuned on the 178-hour AISHELL-1 dataset and is the baseline model in the paper "A context-aware knowledge transferring strategy for CTC-based ASR" ([preprint](https://arxiv.org/abs/2210.06244)).

| Model | dev CER (%) | test CER (%) |
| --- | --- | --- |
| vanilla w2v2-CTC | 4.85 | 5.13 |
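
For reference, CER (character error rate) is the character-level Levenshtein edit distance between the hypothesis and the reference transcript, divided by the reference length. A minimal illustrative sketch of the metric (this `cer` helper is not part of the original repository, and the paper's scores may come from a different scoring tool):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance / reference length."""
    r, h = list(reference), list(hypothesis)
    # One-row dynamic-programming Levenshtein distance.
    dp = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(h) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                      # deletion
                        dp[j - 1] + 1,                  # insertion
                        prev + (r[i - 1] != h[j - 1]))  # substitution
            prev = cur
    return dp[len(h)] / len(r)

# One substitution over a 12-character reference: CER = 1/12 ≈ 0.083
print(cer("广州市房地产中介协会分析", "广州市房地产中价协会分析"))
```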

## Usage

**Note:** the model was fine-tuned with the ESPnet toolkit and then converted to a Hugging Face model for easy use.

```python
import torch
import soundfile as sf
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor


class ExtendedWav2Vec2ForCTC(Wav2Vec2ForCTC):
    """
    In ESPnet there is a LayerNorm layer between the encoder output and the
    CTC classification head; wrap lm_head so the converted weights match.
    """
    def __init__(self, config):
        super().__init__(config)
        self.lm_head = torch.nn.Sequential(
            torch.nn.LayerNorm(config.hidden_size),
            self.lm_head,
        )


model = ExtendedWav2Vec2ForCTC.from_pretrained("kehanlu/wav2vec2-mandarin-aishell1")
processor = Wav2Vec2Processor.from_pretrained("kehanlu/wav2vec2-mandarin-aishell1")

# AISHELL-1 recordings are 16 kHz, which matches the model's expected input.
audio_input, sample_rate = sf.read("/path/to/data_aishell/wav/dev/S0724/BAC009S0724W0121.wav")
inputs = processor(audio_input, sampling_rate=sample_rate, return_tensors="pt")

model.eval()
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: pick the most likely token per frame; batch_decode
# collapses repeats and removes blank tokens.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription[0])

# 广州市房地产中介协会分析
```
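
The model expects 16 kHz input. If your audio is sampled at a different rate, resample it before calling the processor; a minimal sketch using torchaudio (assumed installed; the path is a placeholder):

```python
import torchaudio

waveform, sr = torchaudio.load("/path/to/your_audio.wav")  # (channels, samples)
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16000)
audio_input = waveform[0].numpy()  # first channel as a 1-D array
inputs = processor(audio_input, sampling_rate=16000, return_tensors="pt")
```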

## Licence

The pre-training corpus, AISHELL-2, is supported by the AISHELL Foundation. The resulting model also follows the licence of AISHELL-2: it is free to use for academic purposes and must not be used for any commercial purpose without permission from the AISHELL Foundation (https://www.aishelltech.com/aishell_2).

```bibtex
@article{aishell2,
  title={AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale},
  author={Du, J. and Na, X. and Liu, X. and Bu, H.},
  journal={arXiv preprint arXiv:1808.10583},
  year={2018}
}
```

If you find this work useful, please cite:

```bibtex
@article{lu2022context,
  title={A context-aware knowledge transferring strategy for CTC-based ASR},
  author={Lu, Ke-Han and Chen, Kuan-Yu},
  journal={arXiv preprint arXiv:2210.06244},
  year={2022}
}
```