# Khmer Automatic Speech Recognition
## Installation
#### Install from PyPI
```sh
pip install sdab
```
#### Install from source
```sh
# clone the repository
git clone https://github.com/MetythornPenn/sdab.git
cd sdab
# install the library from source in editable mode
pip install -e .
```
## Usage
#### Download sample audio
```bash
wget -O audio.wav https://github.com/MetythornPenn/sdab/raw/main/sample/audio.wav
```
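A GitHub `blob/` URL returns an HTML page rather than the audio itself, so it can be worth confirming that the downloaded file really is audio before passing it to the model. The following is a minimal sketch using only the Python standard library; `is_wav_file` is a hypothetical helper, not part of `sdab`, and the `wave` module only reads uncompressed PCM WAV files, so other encodings will be reported as invalid.

```python
import wave


def is_wav_file(path):
    """Return True if `path` opens as a PCM WAV file with at least one frame.

    Hypothetical helper for illustration; it is not part of the sdab package.
    """
    try:
        with wave.open(str(path), "rb") as wav:
            return wav.getnframes() > 0
    except (wave.Error, EOFError, OSError):
        # wave.Error: not a RIFF/WAVE file (for example, a saved HTML page)
        # EOFError:   file is too short to contain a WAV header
        # OSError:    file is missing or unreadable
        return False
```

Calling `is_wav_file("audio.wav")` after the download should return `True`; a `False` result usually means the URL pointed at a web page instead of the raw file.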
#### Python API
```python
from sdab import Sdab

file_path = "audio.wav"
model_name = "metythorn/khmer-asr-openslr"  # or a local directory path

sdab = Sdab(file_path=file_path, model_name=model_name)
print(sdab.result)  # prints the Khmer transcription of audio.wav
```
- `file_path`: path to the audio file
- `model_name`: pretrained model, given as a `huggingface` model ID or a `local` directory path
- `device`: `cpu` or `cuda`; defaults to `cpu`
- `tokenized`: whether to show `[PAD]` tokens in the output; `False` by default
- `return`: the Khmer text recognized by the ASR model, available as `sdab.result`
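
The optional parameters can be set explicitly. The sketch below assumes that `device` and `tokenized` are accepted as keyword arguments by the `Sdab` constructor, as the list above suggests; `build_sdab_kwargs` and `transcribe` are hypothetical helpers written for illustration and are not part of the library.

```python
ALLOWED_DEVICES = {"cpu", "cuda"}


def build_sdab_kwargs(file_path, model_name="metythorn/khmer-asr-openslr",
                      device="cpu", tokenized=False):
    """Collect and validate the documented options (hypothetical helper)."""
    if device not in ALLOWED_DEVICES:
        raise ValueError(f"device must be one of {sorted(ALLOWED_DEVICES)}, got {device!r}")
    return {
        "file_path": str(file_path),
        "model_name": model_name,
        "device": device,
        "tokenized": bool(tokenized),
    }


def transcribe(file_path, **options):
    """Run ASR on one file and return the Khmer text.

    Assumes Sdab accepts `device` and `tokenized` as keyword arguments.
    """
    from sdab import Sdab  # imported lazily so the helpers above work without sdab installed

    return Sdab(**build_sdab_kwargs(file_path, **options)).result
```

With `sdab` installed, `transcribe("audio.wav", device="cuda", tokenized=True)` would run the model on a GPU and keep the `[PAD]` tokens in the returned text.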
## Reference
- Inspired by [Techcast](https://www.youtube.com/watch?v=ekhFo-6JzLQ&t=28s)
- Khmer word segmentation from SeangHay [khmercut](https://github.com/seanghay/khmercut.git) | [khmersegment](https://github.com/seanghay/khmersegment)
- Wav2Vec2 from Facebook [Wav2Vec2](https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/README.md)
---
license: apache-2.0
datasets:
- openslr/openslr
language:
- km
tags:
- asr
- khmer asr
- khmer speech to text
- speech to text
---