khanhld/chunkformer-ctc-large-vie

@@ -133,39 +133,69 @@ We evaluate the models using **Word Error Rate (WER)**. To ensure consistency an
 ## Quick Usage
 To use the ChunkFormer model for Vietnamese Automatic Speech Recognition, follow these steps:
-1. **Download the ChunkFormer Repository**
 ```bash
-git clone https://github.com/khanld/chunkformer.git
-cd chunkformer
-pip install -r requirements.txt
 ```
-2. **Download the Model Checkpoint from Hugging Face**
 ```bash
-pip install huggingface_hub
-huggingface-cli download khanhld/chunkformer-large-vie --local-dir "./chunkformer-large-vie"
 ```
-or
-```bash
-git lfs install
-git clone https://huggingface.co/khanhld/chunkformer-large-vie
 ```
-This will download the model checkpoint to the checkpoints folder inside your chunkformer directory.
-3. **Run the model**
 ```bash
-python decode.py \
-    --model_checkpoint path/to/local/chunkformer-large-vie \
     --long_form_audio path/to/audio.wav \
-    --total_batch_duration 14400 \ #in second, default is 1800
     --chunk_size 64 \
     --left_context_size 128 \
     --right_context_size 128
 ```
 Example Output:
 ```
 [00:00:01.200] - [00:00:02.400]: this is a transcription example
 [00:00:02.500] - [00:00:03.700]: testing the long-form audio
 ```
 **Advanced Usage** can be found [HERE](https://github.com/khanld/chunkformer/tree/main?tab=readme-ov-file#usage)
 ---

 ## Quick Usage
 To use the ChunkFormer model for Vietnamese Automatic Speech Recognition, follow these steps:
+### Option 1: Install from PyPI (Recommended)
 ```bash
+pip install chunkformer
 ```
+### Option 2: Install from source
 ```bash
+git clone https://github.com/khanld/chunkformer.git
+cd chunkformer
+pip install -e .
 ```
+### Python API Usage
+```python
+from chunkformer import ChunkFormerModel
+# Load the Vietnamese model from Hugging Face
+model = ChunkFormerModel.from_pretrained("khanhld/chunkformer-large-vie")
+# For single long-form audio transcription
+transcription = model.endless_decode(
+    audio_path="path/to/long_audio.wav",
+    chunk_size=64,
+    left_context_size=128,
+    right_context_size=128,
+    total_batch_duration=14400,  # in seconds
+    return_timestamps=True
+)
+print(transcription)
+# For batch processing of multiple audio files
+audio_files = ["audio1.wav", "audio2.wav", "audio3.wav"]
+transcriptions = model.batch_decode(
+    audio_paths=audio_files,
+    chunk_size=64,
+    left_context_size=128,
+    right_context_size=128,
+    total_batch_duration=1800  # Total batch duration in seconds
+)
+for i, transcription in enumerate(transcriptions):
+    print(f"Audio {i+1}: {transcription}")
 ```
+### Command Line Usage
+After installation, you can use the command line interface:
 ```bash
+chunkformer-decode \
+    --model_checkpoint khanhld/chunkformer-large-vie \
     --long_form_audio path/to/audio.wav \
+    --total_batch_duration 14400 \
     --chunk_size 64 \
     --left_context_size 128 \
     --right_context_size 128
 ```
 Example Output:
 ```
 [00:00:01.200] - [00:00:02.400]: this is a transcription example
 [00:00:02.500] - [00:00:03.700]: testing the long-form audio
 ```
 **Advanced Usage** can be found [HERE](https://github.com/khanld/chunkformer/tree/main?tab=readme-ov-file#usage)
 ---