khanhld/chunkformer-large-vie

@@ -65,13 +65,64 @@ model-index:
          value: x
 ---
-# **ChunkFormer: Masked Chunking Conformer for Long-Form Speech Transcription**
 [![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
 [![Hugging Face](https://img.shields.io/badge/HuggingFace-ChunkFormer-orange)](https://huggingface.co/your-username/chunkformer)
 [![Paper](https://img.shields.io/badge/Paper-ICASSP%202025-green)](https://your-paper-link)
-## **Introduction**
-## **Installation**

          value: x
 ---
+# **ChunkFormer-Large-Vie: Large-Scale Pretrained ChunkFormer for Vietnamese Automatic Speech Recognition**
 [![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
 [![Hugging Face](https://img.shields.io/badge/HuggingFace-ChunkFormer-orange)](https://huggingface.co/your-username/chunkformer)
 [![Paper](https://img.shields.io/badge/Paper-ICASSP%202025-green)](https://your-paper-link)
+<!-- ### Table of contents
+1. [Model Description](#description)
+2. [Implementation](#implementation)
+3. [Benchmark Result](#benchmark)
+4. [Example Usage](#usage)
+5. [Evaluation](#evaluation)
+6. [Citation](#citation)
+7. [Contact](#contact) -->
+<a name = "description" ></a>
+ChunkFormer-Large-Vie is a large-scale Vietnamese Automatic Speech Recognition (ASR) model based on the ChunkFormer architecture, introduced at ICASSP 2025. The model was fine-tuned on approximately 2,000 hours of Vietnamese speech data drawn from diverse datasets.
+<a name = "implementation" ></a>
+### Documentation and Implementation
+The documentation and implementation of ChunkFormer are available [HERE]().
+<a name = "benchmark" ></a>
+### Benchmark WER Result
+| STT | Model        | Vios | Common Voice | VLSP - Task 1 | Avg. |
+|-----|--------------|------|--------------|---------------|------|
+| 1   | ChunkFormer  | x    | x            | x             | x    |
+| 2   | PhoWhisper   | x    | x            | x             | x    |
+| 3   | X            | x    | x            | x             | x    |
+| 4   | Y            | x    | x            | x             | x    |
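
The table above reports word error rate (WER). As a reminder of what the metric measures, here is a minimal sketch of the standard definition, WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the reference into the hypothesis and N is the number of reference words. The function name `word_error_rate` and the plain whitespace tokenization are illustrative assumptions, not the evaluation script behind these results; real evaluations usually also normalize the text first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using word-level edit distance.

    Illustrative helper (assumption): splits on whitespace and applies
    no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a 4-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.
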
+<a name = "usage" ></a>
+### Usage
+To use the ChunkFormer model for Vietnamese Automatic Speech Recognition, follow these steps:
+1. **Download the ChunkFormer Repository**
+Clone the ChunkFormer repository to your local machine and install its dependencies:
+```bash
+git clone https://github.com/khanld/chunkformer.git
+cd chunkformer
+pip install -r requirements.txt
+```
+2. **Download the Model Checkpoint from Hugging Face**
+Download the model checkpoint from Hugging Face using the following git lfs command:
+```bash
+git lfs install
+git clone https://huggingface.co/khanhld/chunkformer-large-vietnamese
+```
+This downloads the model checkpoint into a `chunkformer-large-vietnamese` folder in the current directory; pass that folder's path as `--model_checkpoint` in the next step.
+3. **Run the model**
+Use the following command to transcribe long audio files:
+```bash
+python decode.py \
+    --model_checkpoint path/to/chunkformer-large-vietnamese \
+    --long_form_audio path/to/long_audio.wav \
+    --chunk_size 64 \
+    --left_context_size 128 \
+    --right_context_size 128
+```
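
Step 3 can also be scripted when several recordings need transcribing. The sketch below assumes only the flags shown in the command above and builds the same `decode.py` invocation for each audio file. `build_decode_command` and the example file names are hypothetical and not part of the ChunkFormer repository. The commands are printed rather than executed, so nothing runs until you opt in.

```python
import shlex
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_decode_command(
    model_checkpoint: PathLike,
    long_form_audio: PathLike,
    chunk_size: int = 64,
    left_context_size: int = 128,
    right_context_size: int = 128,
) -> List[str]:
    """Build the argument list for the decode.py command shown in step 3.

    Hypothetical helper (assumption): it only mirrors the flags from the
    usage example and does not validate them against decode.py.
    """
    return [
        "python", "decode.py",
        "--model_checkpoint", str(model_checkpoint),
        "--long_form_audio", str(long_form_audio),
        "--chunk_size", str(chunk_size),
        "--left_context_size", str(left_context_size),
        "--right_context_size", str(right_context_size),
    ]


if __name__ == "__main__":
    checkpoint = "path/to/chunkformer-large-vietnamese"
    # Placeholder file names (assumption); replace with real recordings.
    audio_files = ["path/to/long_audio_1.wav", "path/to/long_audio_2.wav"]
    for audio in audio_files:
        cmd = build_decode_command(checkpoint, audio)
        # Print a copy-pasteable shell command for inspection.
        print(shlex.join(cmd))
        # To run it for real, from inside the chunkformer directory:
        #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems when a path contains spaces.
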