---
language:
- hr
license: cc-by-sa-4.0
library_name: transformers
base_model: openai/whisper-large-v3
datasets:
- classla/Mici_Princ
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
widget:
- example_title: example 1
  src: https://huggingface.co/classla/whisper-large-v3-mici-princ/raw/main/MP_13_65.37-74.67.mp3.wav
- example_title: example 2
  src: https://huggingface.co/classla/whisper-large-v3-mici-princ/raw/main/MP_15_201.53-210.02.mp3.wav
- example_title: example 3
  src: https://huggingface.co/classla/whisper-large-v3-mici-princ/raw/main/MP_15_60.527-67.71.mp3.wav
- example_title: example 4
  src: https://huggingface.co/classla/whisper-large-v3-mici-princ/raw/main/MP_15_68.5-72.45.mp3.wav
---

# Model Card for whisper-large-v3-mici-princ

This model was finetuned on the [Mići Princ dataset](https://huggingface.co/datasets/classla/Mici_Princ), 
an audiobook of _Le Petit Prince_ translated into the Chakavian dialect of Croatian.

## Model Details

### Model Description

The base model, which already performs well on standard Croatian, was finetuned for 80 epochs with an effective batch size of 16. Performance was inspected every 4 epochs, and the latest checkpoint
is the one uploaded here. Finetuning brought the character error rate down from 11.54% to 3.95% and the word error rate down from 35.43% to 16.83%.

- **Developed by:** Nikola Ljubešić, Peter Rupnik, Tea Perinčić
- **Language(s) (NLP):** Croatian (hrv) - Chakavian dialect (ckm)
- **License:** Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)
- **Finetuned from model:** openai/whisper-large-v3

### Model Sources

- **Repository:** [GitHub](https://github.com/5roop/mici_princ_whisper)
- **Paper:** Coming soon
- **Dataset:** [Mići Princ](https://huggingface.co/datasets/classla/Mici_Princ)

## Example use

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from transformers.pipelines.pt_utils import KeyDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_id = "classla/whisper-large-v3-mici-princ"

# Load the finetuned model and its processor
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Load the test split of the Mići Princ dataset
ds = load_dataset("classla/Mici_Princ", split="test")

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    device=device,
)

# Transcribe the whole split, passing the target language explicitly
result = pipe(
    KeyDataset(ds, "audio"),
    generate_kwargs={"language": "croatian"},
)

for i in result:
    print(i)

# Output:
# {'text': ' Šesti planet je biv deset put veći. Na njin je bivav niki stari čovik ki je pisav vele knjige.', 'chunks': [{'timestamp': (0.0, 7.18), 'text': ' Šesti planet je biv deset put veći. Na njin je bivav niki stari čovik ki je pisav vele knjige.'}]}
# ...
```
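
The same pipeline also works on individual audio files. A minimal sketch, where `example.wav` is a placeholder path for a recording of your own:

```python
# Transcribe a single local file; "example.wav" is a placeholder path
single = pipe("example.wav", generate_kwargs={"language": "croatian"})
print(single["text"])
```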



## Training Details

### Preprocessing

The model was trained on the `normalized_text` attribute of the [Mići Princ dataset](https://huggingface.co/datasets/classla/Mici_Princ). This text
retains capital letters and punctuation, with the exception of bullet points, newlines, and quotation marks, which were removed. Special characters
that are present in the dialect but not in standard Croatian were substituted.

Only the `train` split was used in training.
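
A normalization step of this kind could look roughly like the sketch below. The removed characters follow the description above, but the substitution map is a hypothetical placeholder, not the mapping actually used to build the dataset.

```python
import re

# Hypothetical mapping from dialect-specific characters to standard Croatian
# ones; the actual substitutions used for the dataset may differ.
SUBSTITUTIONS = {"å": "a", "ẽ": "e", "õ": "o"}

def normalize(text: str) -> str:
    # Remove bullet points, newlines, and quotation marks
    text = re.sub(r'[•\n"“”„]', " ", text)
    # Substitute special characters not found in standard Croatian
    for src, tgt in SUBSTITUTIONS.items():
        text = text.replace(src, tgt)
    # Collapse the whitespace introduced by the removals
    return re.sub(r"\s+", " ", text).strip()
```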

### Training Hyperparameters

```
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size: 4 × 4 = 16
    learning_rate=1e-5,
    warmup_steps=100,
    max_steps=277 * 80,  # 277 steps per epoch for 80 epochs
    gradient_checkpointing=True,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=277,  # checkpoint once per epoch
```
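
These values correspond to keyword arguments of `transformers.Seq2SeqTrainingArguments`. A minimal sketch of how they could be assembled; the `output_dir` is a placeholder, and any settings not listed above (evaluation schedule, precision, etc.) are left at their defaults here:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the published configuration; output_dir is a placeholder
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-large-v3-mici-princ",  # placeholder
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=1e-5,
    warmup_steps=100,
    max_steps=277 * 80,
    gradient_checkpointing=True,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=277,
)
```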

## Evaluation

For evaluation, the `test` split of the [Mići Princ dataset](https://huggingface.co/datasets/classla/Mici_Princ) was used. It consists of two speakers also present in the training data, Autor and Mići Princ, and two speakers unseen during training, Geograf and Dilavac. Note that each speaker uses a different micro-dialect, so the test data is challenging: it includes two micro-dialects the model never saw in training.

### Metrics

| Speaker | WER vanilla | WER finetuned | WER reduction | CER vanilla | CER finetuned | CER reduction |
|---|---|---|---|---|---|---|
| all | 35.43% | 16.83% | 52.50% | 11.54% | 3.95% | 65.77% |
| Autor | 38.96% | 14.29% | 63.32% | 10.24% | 2.93% | 71.39% |
| Geograf | 20.94% | 11.57% | 44.75% | 4.99% | 2.19% | 56.11% |
| Mići Princ | 45.32% | 16.62% | 63.33% | 12.21% | 5.09% | 58.31% |
| Dilavac | 39.60% | 23.70% | 40.15% | 18.55% | 5.27% | 71.59% |
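
The WER and CER values can be computed with the Hugging Face `evaluate` library. A minimal sketch, assuming `predictions` and `references` are parallel lists of model outputs and gold transcripts for one speaker:

```python
import evaluate

# Word and character error rate metrics from the evaluate library
wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Parallel lists of model outputs and gold transcripts (illustrative)
predictions = ["Šesti planet je biv deset put veći"]
references = ["Šesti planet je biv deset put veći."]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```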

## Citation 

Coming soon.

## Model Card Authors

* Peter Rupnik
* Nikola Ljubešić

## Model Card Contact

[https://huggingface.co/5roop](https://huggingface.co/5roop)