Commit 7e37373 by khanhld3
Parent(s): c300245

[test] init

README.md (changed)
@@ -65,13 +65,64 @@ model-index:
Lines removed from the previous version: the old title line `# **ChunkFormer:`, one empty line, and the `## **Installation**` heading.
value: x
---

# **ChunkFormer-Large-Vie: Large-Scale Pretrained ChunkFormer for Vietnamese Automatic Speech Recognition**
[data:image/s3,"s3://crabby-images/4a585/4a585bb65226edb1a9f303a86b44de1d385f4b03" alt="License: CC BY-NC 4.0"](https://creativecommons.org/licenses/by-nc/4.0/)
[data:image/s3,"s3://crabby-images/919f6/919f64a65cbde9ad7997f46ad3609321da9ed180" alt="Hugging Face"](https://huggingface.co/your-username/chunkformer)
[data:image/s3,"s3://crabby-images/03061/030614ac3798ba6951a35d8585c91bb9296b2daa" alt="Paper"](https://your-paper-link)

<!-- ### Table of contents
1. [Model Description](#description)
2. [Implementation](#implementation)
3. [Benchmark Result](#benchmark)
4. [Example Usage](#example)
5. [Evaluation](#evaluation)
6. [Citation](#citation)
7. [Contact](#contact) -->

<a name = "description" ></a>
ChunkFormer-Large-Vie is a large-scale Vietnamese Automatic Speech Recognition (ASR) model based on the ChunkFormer architecture, which was introduced at ICASSP 2025. The model was fine-tuned on approximately 2,000 hours of Vietnamese speech drawn from diverse datasets.
<a name = "implementation" ></a>
### Documentation and Implementation
The documentation and implementation of ChunkFormer are available [HERE]().

<a name = "benchmark" ></a>
### Benchmark WER Results
| No. | Model       | VIVOS | Common Voice | VLSP - Task 1 | Avg. |
|-----|-------------|-------|--------------|---------------|------|
| 1   | ChunkFormer | x     | x            | x             | x    |
| 2   | PhoWhisper  | x     | x            | x             | x    |
| 3   | X           | x     | x            | x             | x    |
| 4   | Y           | x     | x            | x             | x    |

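The table reports word error rate (WER), where lower is better. For reference, WER is conventionally computed from a minimum-edit-distance alignment of the recognized words against the reference transcript:

$$
\mathrm{WER} = \frac{S + D + I}{N}
$$

Here $S$, $D$, and $I$ are the numbers of substituted, deleted, and inserted words, and $N$ is the number of words in the reference. For example, a 10-word reference transcribed with 1 substitution and 1 deletion gives $\mathrm{WER} = (1 + 1 + 0)/10 = 0.2$, i.e. 20%. Because insertions are counted, WER can exceed 100%.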
<a name = "usage" ></a>
### Usage

To use the ChunkFormer model for Vietnamese Automatic Speech Recognition, follow these steps:

1. **Download the ChunkFormer Repository**
Clone the ChunkFormer repository to your local machine and install its Python dependencies:
```bash
git clone https://github.com/khanld/chunkformer.git
cd chunkformer
pip install -r requirements.txt
```
2. **Download the Model Checkpoint from Hugging Face**
Download the model checkpoint from Hugging Face with Git LFS:
```bash
git lfs install
git clone https://huggingface.co/khanhld/chunkformer-large-vietnamese
```
This clones the checkpoint into a `chunkformer-large-vietnamese` folder in your current working directory (inside `chunkformer` if you run it right after step 1); pass that folder's path to `--model_checkpoint` in step 3.

3. **Run the Model**
Run the following command from the `chunkformer` directory to transcribe a long audio file:
```bash
python decode.py \
    --model_checkpoint path/to/chunkformer-large-vietnamese \
    --long_form_audio path/to/long_audio.wav \
    --chunk_size 64 \
    --left_context_size 128 \
    --right_context_size 128
```
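If Git LFS is not active when the checkpoint in step 2 is cloned, the large weight files can end up as small text *pointer* files rather than the actual weights. The sketch below is a hypothetical helper, `find_lfs_pointers` (not part of ChunkFormer), that flags such files by checking for the standard Git LFS pointer header line; running `git lfs pull` inside the checkpoint folder should then fetch the missing content.

```bash
# Hypothetical helper (not part of ChunkFormer): print files under a directory
# that are still Git LFS pointer stubs instead of downloaded content.
# Returns 1 if any pointer stubs were found, 0 otherwise.
find_lfs_pointers() {
  local dir="$1"
  # First line of every Git LFS pointer file (exactly 42 bytes long).
  local spec="version https://git-lfs.github.com/spec/v1"
  local pointers
  pointers=$(
    find "$dir" -type f ! -path '*/.git/*' | while IFS= read -r f; do
      # Compare only the first 42 bytes; strip NUL bytes from binary files.
      if [ "$(head -c 42 "$f" | tr -d '\000')" = "$spec" ]; then
        printf '%s\n' "$f"
      fi
    done
  )
  if [ -n "$pointers" ]; then
    printf '%s\n' "$pointers"
    return 1
  fi
  return 0
}

# Example: check the folder cloned in step 2 and fetch real files if needed.
# find_lfs_pointers path/to/chunkformer-large-vietnamese || git -C path/to/chunkformer-large-vietnamese lfs pull
```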
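The three size flags in step 3 control how much of the input each chunk can see. As rough arithmetic only, assuming `--chunk_size`, `--left_context_size`, and `--right_context_size` are measured in the same encoder-frame unit and that each chunk sees its own frames plus both context windows, the visible span per chunk and the number of chunks work out as below. `chunk_stats` is a hypothetical helper, not ChunkFormer code, and the 1000-frame input is a made-up example.

```bash
# Hypothetical back-of-the-envelope helper (not ChunkFormer code). Assumes the
# chunk and context sizes passed to decode.py share one frame unit and that
# each chunk sees its own frames plus the left and right context.
chunk_stats() {
  local total_frames="$1" chunk="$2" left="$3" right="$4"
  # Frames visible to one chunk: left context + the chunk itself + right context.
  local window=$((left + chunk + right))
  # Chunks needed to cover the input, rounding up for a partial final chunk.
  local num_chunks=$(( (total_frames + chunk - 1) / chunk ))
  echo "window=$window chunks=$num_chunks"
}

# Step 3 settings applied to a made-up input of 1000 frames.
chunk_stats 1000 64 128 128   # prints: window=320 chunks=16
```

Under the same assumptions, chunks at the very start and end of a recording have less real context on one side, so 320 frames is an upper bound rather than a fixed span.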
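The command in step 3 transcribes a single file. To process a folder of recordings, a plain loop over the same documented flags is enough. The sketch below is a hypothetical wrapper: `transcribe_dir`, the `*.wav` glob, and the `DRY_RUN` switch are illustrative assumptions, not part of ChunkFormer. With `DRY_RUN=1` it only prints the commands it would run.

```bash
# Hypothetical wrapper (not part of ChunkFormer): run the step 3 decode.py
# command on every .wav file in a directory with the same chunk settings.
# Run it from the chunkformer directory so that decode.py is found.
transcribe_dir() {
  local checkpoint="$1" audio_dir="$2" runner="" wav
  # With DRY_RUN=1, prefix each command with echo so it is printed, not run.
  if [ "${DRY_RUN:-0}" = 1 ]; then runner=echo; fi
  for wav in "$audio_dir"/*.wav; do
    # An unmatched glob stays literal; skip it.
    [ -e "$wav" ] || continue
    $runner python decode.py \
      --model_checkpoint "$checkpoint" \
      --long_form_audio "$wav" \
      --chunk_size 64 \
      --left_context_size 128 \
      --right_context_size 128
  done
}

# Example: preview the commands first, then drop DRY_RUN=1 to run them.
# DRY_RUN=1 transcribe_dir path/to/chunkformer-large-vietnamese path/to/audio_dir
```

Files are processed one at a time, so a failure on one recording does not stop the rest of the loop.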