---
language: en
datasets:
- librispeech
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- transcription
- audio
- speech
- chunkformer
- asr
- automatic-speech-recognition
- long-form transcription
- librispeech
license: cc-by-nc-4.0
model-index:
- name: ChunkFormer-Large-En-Libri-960h
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: test-clean
      type: librispeech
      args: en
    metrics:
    - name: Test WER
      type: wer
      value: 2.69
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: test-other
      type: librispeech
      args: en
    metrics:
    - name: Test WER
      type: wer
      value: 6.91
---

# **ChunkFormer-Large-En-Libri-960h: ChunkFormer-Large Pretrained on 960 Hours of LibriSpeech**

[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
[![GitHub](https://img.shields.io/badge/GitHub-ChunkFormer-blue)](https://github.com/khanld/chunkformer)
[![Paper](https://img.shields.io/badge/Paper-ICASSP%202025-green)](https://arxiv.org/abs/2502.14673)
[![Model size](https://img.shields.io/badge/Params-110M-lightgrey#model-badge)](#description)

**!!! ATTENTION: Input audio must be MONO (1 channel) at a 16,000 Hz sample rate !!!**

---

## Table of Contents

1. [Model Description](#description)
2. [Documentation and Implementation](#implementation)
3. [Benchmark Results](#benchmark)
4. [Usage](#usage)
5. [Citation](#citation)
6. [Contact](#contact)

---

## Model Description

**ChunkFormer-Large-En-Libri-960h** is an English Automatic Speech Recognition (ASR) model based on the **ChunkFormer** architecture, introduced at **ICASSP 2025**. The model has been fine-tuned on 960 hours of LibriSpeech, a widely used dataset for ASR research.

---

## Documentation and Implementation

The [Documentation]() and [Implementation](https://github.com/khanld/chunkformer) of ChunkFormer are publicly available.
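Because the model expects mono audio at 16,000 Hz, input files often need converting first. Below is a minimal NumPy sketch (downmix plus linear-interpolation resampling); for production use, a dedicated resampler such as `torchaudio.transforms.Resample` or `librosa.resample` is preferable. The function name and shapes here are illustrative, not part of the ChunkFormer API.

```python
import numpy as np

def to_mono_16k(samples: np.ndarray, sample_rate: int, target_rate: int = 16_000) -> np.ndarray:
    """Downmix multi-channel audio to mono and resample to target_rate.

    Uses naive linear interpolation for brevity; a real pipeline should
    use a proper anti-aliased resampler.
    """
    if samples.ndim == 2:  # shape (num_samples, num_channels)
        samples = samples.mean(axis=1)
    if sample_rate == target_rate:
        return samples
    n_out = int(round(samples.shape[0] / sample_rate * target_rate))
    old_t = np.arange(samples.shape[0]) / sample_rate
    new_t = np.arange(n_out) / target_rate
    return np.interp(new_t, old_t, samples)

# One second of 44.1 kHz stereo noise -> one second of 16 kHz mono
stereo = np.random.randn(44_100, 2).astype(np.float32)
mono = to_mono_16k(stereo, 44_100)
print(mono.shape)  # (16000,)
```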
---

## Benchmark Results

We evaluate the models using **Word Error Rate (WER)**. To ensure a fair comparison, all models were trained exclusively with the [**WeNet**](https://github.com/wenet-e2e/wenet) framework.

| No. | Model                   | Test-Clean | Test-Other | Avg. |
|-----|-------------------------|------------|------------|------|
| 1   | **ChunkFormer**         | 2.69       | 6.91       | 4.80 |
| 2   | **Efficient Conformer** | 2.71       | 6.95       | 4.83 |
| 3   | **Conformer**           | 2.77       | 6.93       | 4.85 |
| 4   | **Squeezeformer**       | 2.87       | 7.16       | 5.02 |

---

## Quick Usage

To use the ChunkFormer model for English Automatic Speech Recognition, follow these steps:

1. **Download the ChunkFormer repository**
```bash
git clone https://github.com/khanld/chunkformer.git
cd chunkformer
pip install -r requirements.txt
```

2. **Download the model checkpoint from Hugging Face**
```bash
pip install huggingface_hub
huggingface-cli download khanhld/chunkformer-large-en-libri-960h --local-dir "./chunkformer-large-en-libri-960h"
```
or
```bash
git lfs install
git clone https://huggingface.co/khanhld/chunkformer-large-en-libri-960h
```
This will download the model checkpoint into your chunkformer directory.

3. 
**Run the model**
```bash
python decode.py \
    --model_checkpoint path/to/local/chunkformer-large-en-libri-960h \
    --long_form_audio path/to/audio.wav \
    --total_batch_duration 14400 \
    --chunk_size 64 \
    --left_context_size 128 \
    --right_context_size 128
```
`--total_batch_duration` is given in seconds (default: 1800).

Example output:
```
[00:00:01.200] - [00:00:02.400]: this is a transcription example
[00:00:02.500] - [00:00:03.700]: testing the long-form audio
```

**Advanced usage** can be found [HERE](https://github.com/khanld/chunkformer/tree/main?tab=readme-ov-file#usage).

---

## Citation

If you use this work in your research, please cite:

```bibtex
@inproceedings{chunkformer,
  title={ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription},
  author={Khanh Le and Tuan Vu Ho and Dung Tran and Duc Thanh Chau},
  booktitle={ICASSP},
  year={2025}
}
```

---

## Contact

- khanhld218@gmail.com
- [![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/khanld)
- [![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/khanhld257/)
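For downstream processing, the timestamped lines that `decode.py` prints (as in the example output above) can be parsed into segments. This is a minimal sketch assuming the `[HH:MM:SS.mmm] - [HH:MM:SS.mmm]: text` line format shown in the example; the helper name is illustrative, not part of the repository.

```python
import re

# Matches lines like "[00:00:01.200] - [00:00:02.400]: some transcribed text"
LINE_RE = re.compile(
    r"\[(\d{2}):(\d{2}):(\d{2})\.(\d{3})\]\s*-\s*"
    r"\[(\d{2}):(\d{2}):(\d{2})\.(\d{3})\]:\s*(.*)"
)

def parse_segments(output: str):
    """Parse timestamped transcript lines into (start_s, end_s, text) tuples."""
    segments = []
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not transcript segments
        h1, m1, s1, ms1, h2, m2, s2, ms2, text = m.groups()
        start = int(h1) * 3600 + int(m1) * 60 + int(s1) + int(ms1) / 1000
        end = int(h2) * 3600 + int(m2) * 60 + int(s2) + int(ms2) / 1000
        segments.append((start, end, text))
    return segments

example = """[00:00:01.200] - [00:00:02.400]: this is a transcription example
[00:00:02.500] - [00:00:03.700]: testing the long-form audio"""

for start, end, text in parse_segments(example):
    print(f"{start:.3f}-{end:.3f}: {text}")
```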