---
license: apache-2.0
base_model: imvladikon/whisper-medium-he
tags:
- generated_from_trainer
metrics:
- wer
model-index:
- name: whisper-medium-he
  results: []
language:
- he
pipeline_tag: automatic-speech-recognition
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# whisper-medium-he [WIP]

This model is a fine-tuned version of [imvladikon/whisper-medium-he](https://huggingface.co/imvladikon/whisper-medium-he) on a dataset that is not specified in this card.
It achieves the following results on the evaluation set:
- Loss: 0.2061
- Wer: 13.4020
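The card reports word error rate (WER) but does not say exactly how it was computed. WER counts the substitutions S, deletions D and insertions I needed to turn the model's transcript into the reference, divided by the number of reference words N, i.e. WER = 100 · (S + D + I) / N. The sketch below assumes the reported value is a percentage and that transcripts are split on whitespace with no text normalization. The Hugging Face Whisper fine-tuning tutorial uses the `evaluate` library's `wer` metric (scaled by 100) instead, so treat `word_error_rate`, a hypothetical helper, as an illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: 100 * (S + D + I) / N, via word-level Levenshtein distance.

    Assumption: whitespace tokenization, no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` returns `50.0`: one substitution plus one deletion over four reference words.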

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 4000
- mixed_precision_training: Native AMP
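The batch-size and scheduler settings above can be sketched as follows. This is a minimal, stdlib-only re-implementation that assumes a single training device (so `total_train_batch_size = train_batch_size × gradient_accumulation_steps = 1 × 2 = 2`) and assumes the `linear` scheduler follows the usual warmup-then-decay rule used by `transformers`' `get_linear_schedule_with_warmup`. The function name `linear_warmup_lr` is hypothetical.

```python
# Assumption: a single training device, so there is no extra data-parallel factor.
TRAIN_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 2
NUM_DEVICES = 1
TOTAL_TRAIN_BATCH_SIZE = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES  # = 2


def linear_warmup_lr(step: int, base_lr: float = 1e-5,
                     warmup_steps: int = 500, total_steps: int = 4000) -> float:
    """Learning rate after `step` optimizer steps: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr over warmup_steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr at warmup_steps to 0 at total_steps.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))
```

For example, `linear_warmup_lr(250)` and `linear_warmup_lr(2250)` both give about `5e-06`: halfway through warmup and halfway through decay, respectively.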

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer     |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.0983        | 0.1   | 1000 | 0.3072          | 16.4362 |
| 0.1219        | 0.2   | 2000 | 0.2923          | 15.6642 |
| 0.134         | 0.3   | 3000 | 0.2345          | 13.7450 |
| 0.2113        | 0.39  | 4000 | 0.2061          | 13.4020 |
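The Epoch column allows a rough estimate of the size of the (unnamed) training set. This assumes a single device, so each optimizer step consumes $B_{\text{total}} = 2$ examples (`total_train_batch_size`). Keep in mind that the epoch values are rounded to two decimals. The last row gives:

$$
N_{\text{train}} \approx \frac{\text{step} \times B_{\text{total}}}{\text{epoch}} = \frac{4000 \times 2}{0.39} \approx 2.05 \times 10^{4}
$$

The first row ($1000 \times 2 / 0.1 = 20{,}000$) is consistent with this, so the training set is plausibly on the order of 20,000 examples. This is a rough estimate derived from the table, not a reported figure.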


### Inference

#### HF

```python
from transformers import pipeline

# device_map="auto" requires `pip install accelerate`; decoding an audio file path requires ffmpeg
pipe = pipeline("automatic-speech-recognition", model="imvladikon/whisper-medium-he", device_map="auto")
result = pipe("sample.mp3")
print(result["text"])
```

#### whisper.cpp

A pre-converted ggml model is available at https://huggingface.co/imvladikon/whisper-medium-he/blob/main/ggml-hebrew.bin.

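To convert the model yourself:
```bash
git clone https://github.com/openai/whisper
git clone https://github.com/ggerganov/whisper.cpp
# the model weights are stored with Git LFS, so `git lfs install` may be needed first
git clone https://huggingface.co/imvladikon/whisper-medium-he
# arguments: <HF model dir> <openai/whisper repo> <output dir>
python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium-he/ ./whisper .
```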

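Then run the converted model on a sample (assuming the script wrote `ggml-model.bin` to the current directory):
```bash
# build the `main` example first; whisper.cpp expects 16-bit, 16 kHz WAV input
cd whisper.cpp && make && ./main -m ../ggml-model.bin -f ../sample.wav -l he
```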
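whisper.cpp's example WAV reader only accepts 16-bit, 16 kHz WAV files (mono or stereo). An MP3 like the `sample.mp3` used above therefore has to be converted first, for example with `ffmpeg -i sample.mp3 -ar 16000 -ac 1 -c:a pcm_s16le sample.wav`. Below is a minimal stdlib-only sketch for checking a file before passing it to `./main`. `check_whisper_cpp_wav` is a hypothetical helper, and the accepted formats are an assumption based on whisper.cpp's example code at the time of writing.

```python
import wave


def check_whisper_cpp_wav(path: str) -> list[str]:
    """List format problems that whisper.cpp's example WAV reader would reject; empty means OK.

    Assumption: accepted input is 16 kHz, 16-bit PCM, mono or stereo.
    """
    problems = []
    # wave only parses uncompressed PCM; non-PCM or malformed files raise wave.Error here
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000 Hz")
        if wav.getsampwidth() != 2:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bits, expected 16 bits")
        if wav.getnchannels() not in (1, 2):
            problems.append(f"{wav.getnchannels()} channels, expected mono or stereo")
    return problems
```

Calling `check_whisper_cpp_wav("sample.wav")` on a correctly converted file should return an empty list.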

### Framework versions

- Transformers 4.36.0.dev0
- Pytorch 2.1.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0