# Khmer Automatic Speech Recognition


## Installation


#### Install from PyPI
```sh
pip install sdab
```

#### Install from source

```sh

# clone the repository and enter it
git clone https://github.com/MetythornPenn/sdab.git
cd sdab

# install the library from source in editable mode
pip install -e .
```

## Usage

#### Download sample audio

```bash
wget -O audio.wav https://github.com/MetythornPenn/sdab/raw/main/sample/audio.wav
```
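If the download goes wrong (for example, an HTML page is saved instead of the audio), the failure usually only shows up later, inside the model. The following is a minimal sketch of a sanity check using Python's standard `wave` module. The helper name `describe_wav` is hypothetical and not part of `sdab`, and the check assumes the sample is an uncompressed PCM WAV file, which is the only kind the `wave` module reads.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file.

    Raises wave.Error if the file is not a WAV file that the standard
    library can parse (for example, an HTML page saved as audio.wav).
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


# Example usage, after downloading the sample:
#     print(describe_wav("audio.wav"))
```

A `wave.Error` here is a hint to re-download the file before passing it to the model.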

#### Python API

```python
from sdab import Sdab

file_path = "audio.wav"
model_name = "metythorn/khmer-asr-openslr"  # or local directory path

sdab = Sdab(file_path=file_path, model_name=model_name)
print(sdab.result)

# result : ស្ពានកំពងចំលងអ្នកលើងនៅព្រីវែញជាស្ពានវេញជាងគេសក្នុងព្រសរាជាអាចកម្ពុជា
```

- `file_path`: path to the audio file
- `model_name`: pretrained model, given as a Hugging Face model name or a local directory path
- `device`: `cpu` or `cuda`; defaults to `cpu`
- `tokenized`: whether to show `[PAD]` tokens in the output; `False` by default
- `return`: the Khmer text produced by ASR (printed via `sdab.result` in the example above)
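
The optional parameters listed above can be sketched as follows. This is an illustration, not the library's documented behavior: it assumes `device` and `tokenized` are accepted as keyword arguments by `Sdab`, which the list suggests but the example call does not show. The helpers `build_sdab_kwargs` and `transcribe` are hypothetical names introduced here.

```python
from pathlib import Path

VALID_DEVICES = ("cpu", "cuda")


def build_sdab_kwargs(file_path, model_name, device="cpu", tokenized=False):
    """Validate the arguments and return them as keyword arguments.

    Hypothetical helper: it only checks the values described in the
    parameter list and does not call sdab itself.
    """
    if device not in VALID_DEVICES:
        raise ValueError(f"device must be one of {VALID_DEVICES}, got {device!r}")
    if not Path(file_path).is_file():
        raise FileNotFoundError(f"audio file not found: {file_path}")
    return {
        "file_path": str(file_path),
        "model_name": model_name,
        "device": device,
        "tokenized": bool(tokenized),
    }


def transcribe(**kwargs):
    """Run ASR and return the text from the `result` attribute.

    Assumption: Sdab accepts `device` and `tokenized` as keyword arguments.
    """
    # Imported lazily so build_sdab_kwargs works without sdab installed.
    from sdab import Sdab

    return Sdab(**kwargs).result


# Example usage (requires sdab and the model to be available):
#     kwargs = build_sdab_kwargs("audio.wav", "metythorn/khmer-asr-openslr",
#                                device="cuda", tokenized=True)
#     print(transcribe(**kwargs))
```

Validating the arguments up front means a typo in the device name or a missing audio file is reported before any model is loaded.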


## Reference 
- Inspired by [Techcast](https://www.youtube.com/watch?v=ekhFo-6JzLQ&t=28s)
- Khmer word segmentation from SeangHay [khmercut](https://github.com/seanghay/khmercut.git) | [khmersegment](https://github.com/seanghay/khmersegment)
- Wav2Vec2 from Facebook [Wav2Vec2](https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/README.md)


---
license: apache-2.0
datasets:
- openslr/openslr
language:
- km
tags:
- asr
- khmer asr
- khmer speech to text
- speech to text
---