---
language: en
datasets:
- librispeech
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- transcription
- audio
- speech
- chunkformer
- asr
- automatic-speech-recognition
- long-form transcription
- librispeech
license: cc-by-nc-4.0
model-index:
- name: ChunkFormer-Large-En-Libri-960h
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: test-clean
      type: librispeech
      args: en
    metrics:
    - name: Test WER
      type: wer
      value: 2.69
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: test-other
      type: librispeech
      args: en
    metrics:
    - name: Test WER
      type: wer
      value: 6.89
---
# **ChunkFormer-Large-En-Libri-960h: Pretrained ChunkFormer-Large on 960 hours of LibriSpeech dataset**
[License: CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
[GitHub](https://github.com/khanld/chunkformer)
[Paper](paper.pdf)
---
## Table of contents
1. [Model Description](#description)
2. [Documentation and Implementation](#implementation)
3. [Benchmark Results](#benchmark)
4. [Quick Usage](#usage)
5. [Citation](#citation)
6. [Contact](#contact)
---
<a name = "description" ></a>
## Model Description
**ChunkFormer-Large-En-Libri-960h** is an English Automatic Speech Recognition (ASR) model based on the **ChunkFormer** architecture, introduced at **ICASSP 2025**. The model was trained on the 960 hours of LibriSpeech, a widely used dataset for ASR research.
---
<a name = "implementation" ></a>
## Documentation and Implementation
The [Documentation]() and [Implementation](https://github.com/khanld/chunkformer) of ChunkFormer are publicly available.
---
<a name = "benchmark" ></a>
## Benchmark Results
We evaluate the models using **Word Error Rate (WER)**, reported in percent (lower is better). To ensure a fair comparison, all models are trained exclusively with the [**WeNet**](https://github.com/wenet-e2e/wenet) framework.
| No. | Model | Test-Clean | Test-Other | Avg. |
|-----|-----------------------|------------|------------|------|
| 1 | **ChunkFormer** | 2.69 | 6.89 | 4.79 |
| 2 | **Efficient Conformer** | 2.71 | 6.95 | 4.83 |
| 3 | **Conformer** | 2.77 | 6.93 | 4.85 |
| 4 | **Squeezeformer** | 2.87 | 7.16 | 5.02 |
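
For readers unfamiliar with the metric, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is illustrative and is not part of the ChunkFormer or WeNet code. The sketch splits on whitespace only, and the text normalization behind the table above is not specified here. Scoring tools also typically sum errors over a whole test set instead of averaging per utterance. For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` has one deleted word out of six reference words, so it returns 1/6 (about 16.7%).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i reference words vs. an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```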
---
<a name = "usage" ></a>
## Quick Usage
To use the ChunkFormer model for English Automatic Speech Recognition, follow these steps:
1. **Download the ChunkFormer Repository**
```bash
git clone https://github.com/khanld/chunkformer.git
cd chunkformer
pip install -r requirements.txt
```
2. **Download the Model Checkpoint from Hugging Face**
```bash
pip install huggingface_hub
huggingface-cli download khanhld/chunkformer-large-en-libri-960h --local-dir "./chunkformer-large-en-libri-960h"
```
or
```bash
git lfs install
git clone https://huggingface.co/khanhld/chunkformer-large-en-libri-960h
```
Either command downloads the model checkpoint into a `chunkformer-large-en-libri-960h` folder in the current directory (your chunkformer directory, if you are continuing from step 1).
3. **Run the model**
```bash
# --max_duration is in seconds (default: 1800)
python decode.py \
--model_checkpoint path/to/local/chunkformer-large-en-libri-960h \
--long_form_audio path/to/audio.wav \
--max_duration 14400 \
--chunk_size 64 \
--left_context_size 128 \
--right_context_size 128
```
**Advanced Usage** can be found [HERE](https://github.com/khanld/chunkformer/tree/main?tab=readme-ov-file#usage)
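
If you prefer to drive steps 2 and 3 from Python, the following is a minimal sketch and not part of the ChunkFormer repository. The helper names `download_checkpoint`, `build_decode_command`, and `run_decode` are hypothetical. The only external call is `snapshot_download` from `huggingface_hub`, and the only script flags used are those shown in step 3. It assumes `decode.py` sits at the root of the cloned repository, and the defaults are copied from the example command, not from the script itself. A typical call would be `run_decode("chunkformer", download_checkpoint(), "path/to/audio.wav")`.

```python
import subprocess
import sys
from typing import List

REPO_ID = "khanhld/chunkformer-large-en-libri-960h"  # repository ID from step 2


def download_checkpoint(local_dir: str = "./chunkformer-large-en-libri-960h") -> str:
    """Fetch the checkpoint files and return the local folder path."""
    # Imported lazily so the rest of this sketch works without huggingface_hub.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


def build_decode_command(
    checkpoint_dir: str,
    audio_path: str,
    max_duration: int = 14400,  # seconds; value from the example command
    chunk_size: int = 64,
    left_context_size: int = 128,
    right_context_size: int = 128,
) -> List[str]:
    """Assemble the decode.py call from step 3 as an argument list."""
    return [
        sys.executable, "decode.py",
        "--model_checkpoint", str(checkpoint_dir),
        "--long_form_audio", str(audio_path),
        "--max_duration", str(max_duration),
        "--chunk_size", str(chunk_size),
        "--left_context_size", str(left_context_size),
        "--right_context_size", str(right_context_size),
    ]


def run_decode(repo_dir: str, checkpoint_dir: str, audio_path: str) -> None:
    """Run decode.py from inside the cloned chunkformer repository."""
    # Relative paths are resolved against repo_dir because of cwd=repo_dir.
    cmd = build_decode_command(checkpoint_dir, audio_path)
    subprocess.run(cmd, cwd=repo_dir, check=True)
```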
---
<a name = "citation" ></a>
## Citation
If you use this work in your research, please cite:
```bibtex
@inproceedings{chunkformer,
title={ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription},
author={Khanh Le and Tuan Vu Ho and Dung Tran and Duc Thanh Chau},
booktitle={ICASSP},
year={2025}
}
```
---
<a name = "contact"></a>
## Contact
- khanhld218@gmail.com
- [GitHub](https://github.com/khanld)
- [LinkedIn](https://www.linkedin.com/in/khanhld257/)