khanhld3 committed on
Commit
7e37373
·
1 Parent(s): c300245

[test] init

  1. README.md +54 -3
README.md CHANGED
@@ -65,13 +65,64 @@ model-index:
65
  value: x
66
  ---
67
 
68
- # **ChunkFormer: Masked Chunking Conformer for Long-Form Speech Transcription**
69
  [![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
70
  [![Hugging Face](https://img.shields.io/badge/HuggingFace-ChunkFormer-orange)](https://huggingface.co/your-username/chunkformer)
71
  [![Paper](https://img.shields.io/badge/Paper-ICASSP%202025-green)](https://your-paper-link)
72
 
73
- ## **Introduction**
74
 
75
 
76
- ## **Installation**
77
 
 
65
  value: x
66
  ---
67
 
68
+ # **ChunkFormer-Large-Vie: Large-Scale Pretrained ChunkFormer for Vietnamese Automatic Speech Recognition**
69
  [![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
70
  [![Hugging Face](https://img.shields.io/badge/HuggingFace-ChunkFormer-orange)](https://huggingface.co/your-username/chunkformer)
71
  [![Paper](https://img.shields.io/badge/Paper-ICASSP%202025-green)](https://your-paper-link)
72
 
73
+ <!-- ### Table of contents
74
+ 1. [Model Description](#description)
75
+ 2. [Implementation](#implementation)
76
+ 3. [Benchmark Result](#benchmark)
77
+ 4. [Example Usage](#usage)
78
+ 5. [Evaluation](#evaluation)
79
+ 6. [Citation](#citation)
80
+ 7. [Contact](#contact) -->
81
 
82
+ <a name = "description" ></a>
83
+ ChunkFormer-Large-Vie is a large-scale Vietnamese Automatic Speech Recognition (ASR) model based on the ChunkFormer architecture, introduced at ICASSP 2025. The model was fine-tuned on approximately 2,000 hours of Vietnamese speech drawn from diverse datasets.
84
+ <a name = "implementation" ></a>
85
+ ### Documentation and Implementation
86
+ The documentation and implementation of ChunkFormer are available [HERE]().
87
+
88
+ <a name = "benchmark" ></a>
89
+ ### Benchmark WER Results
90
+ | No. | Model | Vios | Common Voice | VLSP - Task 1 | Avg. |
91
+ |-----|--------------|------|--------------|---------------|------|
92
+ | 1 | ChunkFormer | x | x | x | x |
93
+ | 2 | PhoWhisper | x | x | x | x |
94
+ | 3 | X | x | x | x | x |
95
+ | 4 | Y | x | x | x | x |
96
+
97
+ <a name = "usage" ></a>
98
+ ### Usage
99
+
100
+ To use the ChunkFormer model for Vietnamese Automatic Speech Recognition, follow these steps:
101
+
102
+ 1. **Download the ChunkFormer Repository**
103
+ Clone the ChunkFormer repository to your local machine and install its dependencies:
104
+ ```bash
105
+ git clone https://github.com/khanld/chunkformer.git
106
+ cd chunkformer
107
+ pip install -r requirements.txt
108
+ ```
109
+ 2. **Download the Model Checkpoint from Hugging Face**
110
+ Download the model checkpoint from Hugging Face using the following git lfs command:
111
+ ```bash
112
+ git lfs install
113
+ git clone https://huggingface.co/khanhld/chunkformer-large-vietnamese
114
+ ```
115
+ This downloads the model checkpoint into a `chunkformer-large-vietnamese` folder in the current directory; pass the path to that folder as `--model_checkpoint` in the next step.
116
+
117
+ 3. **Run the model**
118
+ Use the following command to transcribe long audio files:
119
+ ```bash
120
+ python decode.py \
121
+ --model_checkpoint path/to/chunkformer-large-vietnamese \
122
+ --long_form_audio path/to/long_audio.wav \
123
+ --chunk_size 64 \
124
+ --left_context_size 128 \
125
+ --right_context_size 128
126
+ ```
127
 
 
128
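
The command in step 3 of the new README transcribes a single file. Below is a minimal sketch of looping it over a directory of recordings, assuming `decode.py` accepts exactly the flags shown in that step and is run from the repository root. The function names `transcribe_one` and `transcribe_dir`, and the `PYTHON`, `CHUNK_SIZE`, `LEFT_CONTEXT_SIZE`, and `RIGHT_CONTEXT_SIZE` environment variables, are hypothetical names introduced here for illustration; they are not part of the ChunkFormer repository.

```bash
# Sketch only: wraps the README's decode.py command; not part of the ChunkFormer repo.
# PYTHON can be overridden (e.g. PYTHON=echo) to preview commands without running them.
PYTHON="${PYTHON:-python}"

transcribe_one() {
  # $1: checkpoint directory, $2: audio file
  # Flag names and default values are copied from the README command.
  "$PYTHON" decode.py \
    --model_checkpoint "$1" \
    --long_form_audio "$2" \
    --chunk_size "${CHUNK_SIZE:-64}" \
    --left_context_size "${LEFT_CONTEXT_SIZE:-128}" \
    --right_context_size "${RIGHT_CONTEXT_SIZE:-128}"
}

transcribe_dir() {
  # $1: checkpoint directory, $2: directory containing .wav files
  for wav in "$2"/*.wav; do
    [ -e "$wav" ] || continue   # skip the literal pattern when nothing matches
    transcribe_one "$1" "$wav"
  done
}
```

For example, `PYTHON=echo transcribe_dir path/to/chunkformer-large-vietnamese path/to/audio_dir` prints the argument list each file would be decoded with, which is a cheap way to check paths before starting a long run.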