Tags: Automatic Speech Recognition, Transformers, German, Eval Results, Inference Endpoints

cstr committed 037b1f2 (1 parent: f9794a5): "Update README.md"

Files changed (1): README.md (+127 -3)

---
license: apache-2.0
language:
- de
library_name: transformers
pipeline_tag: automatic-speech-recognition
model-index:
- name: whisper-large-v3-turbo-german by Florian Zimmermeister @primeLine
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: German ASR Data-Mix
      type: flozi00/asr-german-mixed
    metrics:
    - type: wer
      value: 4.77 %
      name: Test WER
datasets:
- flozi00/asr-german-mixed
- flozi00/asr-german-mixed-evals
base_model:
- primeline/whisper-large-v3-german
---
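
## Quant

This is only a GGML conversion of [primeline/whisper-large-v3-turbo-german](https://huggingface.co/primeline/whisper-large-v3-turbo-german), made with a minimally changed version of [convert-h5-to-ggml.py](https://github.com/ggerganov/whisper.cpp/blob/master/models/convert-h5-to-ggml.py). The resulting file is meant for use with [whisper.cpp](https://github.com/ggerganov/whisper.cpp), not with the Transformers library.
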
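
As a quick sanity check on a converted file, its header can be read back in Python. The following is a minimal sketch, assuming the layout written by `convert-h5-to-ggml.py` at the time of writing: an int32 magic number `0x67676d6c` ("ggml") followed by eleven int32 hyperparameters in the order listed below. The field names and their order are our reading of the converter script, not a documented format, and may differ between whisper.cpp versions.

```python
import struct
from typing import BinaryIO, Dict

# Magic number written first by convert-h5-to-ggml.py: the bytes of "ggml" as an int32
GGML_MAGIC = 0x67676D6C

# Assumed order of the int32 hyperparameters that follow the magic number
HEADER_FIELDS = (
    "n_vocab",        # vocabulary size
    "n_audio_ctx",    # encoder positions (max_source_positions)
    "n_audio_state",  # encoder width (d_model)
    "n_audio_head",   # encoder attention heads
    "n_audio_layer",  # encoder layers
    "n_text_ctx",     # decoder positions (max_length)
    "n_text_state",   # decoder width (d_model)
    "n_text_head",    # decoder attention heads
    "n_text_layer",   # decoder layers
    "n_mels",         # mel filterbank bins
    "ftype",          # weight type flag (1 = float16, 0 = float32 in the converter)
)


def read_ggml_header(f: BinaryIO) -> Dict[str, int]:
    """Read the magic number and hyperparameters at the start of a whisper GGML file."""
    # The converter packs native-order int32s; little-endian is assumed here
    fmt = "<" + "i" * (1 + len(HEADER_FIELDS))
    size = struct.calcsize(fmt)
    raw = f.read(size)
    if len(raw) != size:
        raise ValueError("file is too short to contain a GGML header")
    magic, *values = struct.unpack(fmt, raw)
    if magic != GGML_MAGIC:
        raise ValueError(f"unexpected magic 0x{magic:08x}; not a GGML whisper file")
    return dict(zip(HEADER_FIELDS, values))
```

To use it, open the downloaded `.bin` file in binary mode and pass the file object to `read_ggml_header`; a wrong magic number or a truncated file raises `ValueError` instead of returning bogus values.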

## Model card from primeline/whisper-large-v3-german

### Summary

This model card describes a model based on Whisper Large v3 that has been fine-tuned for German speech recognition. Whisper is a family of speech recognition models developed by OpenAI; this model has been optimized specifically for recognizing and transcribing German speech.

### Applications

This model can be used in various application areas, including:

- Transcription of spoken German
- Voice commands and voice control
- Automatic subtitling of German videos
- Voice-based search queries in German
- Dictation in word-processing programs

## Model family

| Model                          | Parameters | Link |
|--------------------------------|------------|------|
| Whisper large v3 german        | 1.54B      | [link](https://huggingface.co/primeline/whisper-large-v3-german) |
| Whisper large v3 turbo german  | 809M       | [link](https://huggingface.co/primeline/whisper-large-v3-turbo-german) |
| Distil-whisper large v3 german | 756M       | [link](https://huggingface.co/primeline/distil-whisper-large-v3-german) |
| tiny whisper                   | 37.8M      | [link](https://huggingface.co/primeline/whisper-tiny-german) |

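
The parameter counts give a rough idea of the memory needed just to hold the weights: about 2 bytes per parameter in float16 and 4 in float32. The back-of-the-envelope sketch below applies that rule to the model-family table; it deliberately ignores activations, decoding caches, and runtime overhead, so real memory use will be higher.

```python
# Parameter counts taken from the model-family table
MODELS = {
    "Whisper large v3 german": 1.54e9,
    "Whisper large v3 turbo german": 809e6,
    "Distil-whisper large v3 german": 756e6,
    "tiny whisper": 37.8e6,
}

# Storage cost of one parameter for each weight dtype
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_size_gb(n_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n in MODELS.items():
    fp16 = weight_size_gb(n, "float16")
    fp32 = weight_size_gb(n, "float32")
    print(f"{name:32} fp16 ~ {fp16:5.2f} GB   fp32 ~ {fp32:5.2f} GB")
```

By this estimate the turbo model's weights take roughly 1.6 GB in float16, about half of the roughly 3.1 GB for the full large-v3 model.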

## Evaluations

Word error rate (WER, %; lower is better). The best result in each row is in bold.

| Dataset                  | openai-whisper-large-v3-turbo | openai-whisper-large-v3 | primeline-whisper-large-v3-german | nyrahealth-CrisperWhisper | primeline-whisper-large-v3-turbo-german |
|--------------------------|-------------------------------|-------------------------|-----------------------------------|---------------------------|-----------------------------------------|
| common_voice_19_0        | 6.31                          | 5.84                    | 4.30                              | **4.14**                  | 4.28                                    |
| Tuda-De                  | 11.45                         | 11.21                   | 9.89                              | 13.88                     | **8.10**                                |
| multilingual librispeech | 18.03                         | 17.69                   | 13.46                             | 10.10                     | **4.71**                                |
| All                      | 14.16                         | 13.79                   | 10.51                             | 8.48                      | **4.75**                                |

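
Word error rate counts the minimum number of word substitutions S, deletions D, and insertions I needed to turn the hypothesis into the reference, divided by the number of reference words N, i.e. WER = (S + D + I) / N. A minimal sketch of that computation via word-level edit distance follows. It assumes plain whitespace tokenization and no text normalization, whereas published benchmarks usually normalize casing and punctuation first, so its numbers are not directly comparable to the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as Levenshtein distance over words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j]: edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


# One deletion ("ein") plus one substitution ("Test" -> "Tests") over 5 words
print(f"{100 * word_error_rate('das ist ein kleiner Test', 'das ist kleiner Tests'):.1f} %")  # prints "40.0 %"
```

Because insertions are counted too, WER is not capped at 100 %: a one-word reference transcribed as three wrong words scores 3.0.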

### Training data

The training data for this model includes a large amount of spoken German from various sources. The data was carefully selected and processed to optimize recognition performance.

### Training process

The model was trained with the following hyperparameters:

- Batch size: 12288
- Epochs: 3
- Learning rate: 1e-6
- Data augmentation: No
- Optimizer: [AdEMAMix](https://arxiv.org/abs/2409.03137)

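
AdEMAMix (arXiv:2409.03137) extends Adam with a second, slowly decaying exponential moving average of the gradients, which is mixed into the update with weight alpha and is not bias-corrected. Below is a minimal pure-Python sketch of one update step on a flat list of parameters, following our reading of the paper. It omits the warm-up schedulers the paper applies to alpha and beta3, and every default except the 1e-6 learning rate listed above is an illustrative assumption, not a setting used to train this model.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class AdEMAMixState:
    """Optimizer state; each list has one entry per parameter."""
    m1: List[float]  # fast EMA of gradients (as in Adam)
    m2: List[float]  # slow EMA of gradients (the AdEMAMix addition)
    nu: List[float]  # EMA of squared gradients
    t: int = 0       # step counter used for bias correction

    @classmethod
    def zeros(cls, n: int) -> "AdEMAMixState":
        return cls([0.0] * n, [0.0] * n, [0.0] * n)


def ademamix_step(params: List[float], grads: List[float], state: AdEMAMixState,
                  lr: float = 1e-6, beta1: float = 0.9, beta2: float = 0.999,
                  beta3: float = 0.9999, alpha: float = 5.0, eps: float = 1e-8,
                  weight_decay: float = 0.0) -> List[float]:
    """One AdEMAMix update without the alpha/beta3 schedulers; returns new params."""
    state.t += 1
    bc1 = 1.0 - beta1 ** state.t  # bias correction for the fast EMA
    bc2 = 1.0 - beta2 ** state.t  # bias correction for the second moment
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state.m1[i] = beta1 * state.m1[i] + (1.0 - beta1) * g
        state.m2[i] = beta3 * state.m2[i] + (1.0 - beta3) * g
        state.nu[i] = beta2 * state.nu[i] + (1.0 - beta2) * g * g
        # Mix the bias-corrected fast EMA with the uncorrected slow EMA
        update = (state.m1[i] / bc1 + alpha * state.m2[i]) / (math.sqrt(state.nu[i] / bc2) + eps)
        # Decoupled (AdamW-style) weight decay
        new_params.append(p - lr * (update + weight_decay * p))
    return new_params
```

On the first step the fast term reduces to the sign of the gradient, as in Adam, while the slow EMA contributes only about alpha * (1 - beta3) of a gradient; its influence grows over many steps, which is why the paper schedules it in gradually.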

### How to use

The snippet below loads the original Transformers checkpoint [primeline/whisper-large-v3-turbo-german](https://huggingface.co/primeline/whisper-large-v3-turbo-german); the GGML file in this repository is meant for whisper.cpp and cannot be loaded this way.

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Use the GPU in half precision if available, otherwise the CPU in float32
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Transformers checkpoint that this GGML file was converted from
model_id = "primeline/whisper-large-v3-turbo-german"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

# Long-form transcription: audio is split into 30 s chunks decoded in batches of 16
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# Demo input: an audio dict with "array" and "sampling_rate" keys.
# Note that this LibriSpeech clip is English; substitute your own German audio.
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
```

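
Because the pipeline above is created with `return_timestamps=True`, its result also carries a `chunks` list of `{"timestamp": (start, end), "text": ...}` entries, which is enough for the automatic-subtitling use case listed under Applications. A minimal sketch that turns such a list into SubRip (SRT) subtitles follows; the helper names `srt_time` and `chunks_to_srt` are hypothetical, and padding a missing end time (the final chunk's end can come back as `None`) by a fixed duration is our own choice.

```python
from typing import Dict, List


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: List[Dict], fallback_duration: float = 2.0) -> str:
    """Build SRT text from chunks shaped like {"timestamp": (start, end), "text": str}."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:  # assumption: pad a missing end time by a fixed duration
            end = start + fallback_duration
        text = chunk["text"].strip()
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    # A blank line separates consecutive subtitle blocks
    return "\n".join(blocks)


# Usage with the pipeline output from the snippet above:
# with open("subtitles.srt", "w", encoding="utf-8") as f:
#     f.write(chunks_to_srt(result["chunks"]))
```

Long chunks may still need splitting into shorter cues for comfortable reading; this sketch keeps one cue per chunk.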

## [About us](https://primeline-ai.com/en/)

[![primeline AI](https://primeline-ai.com/wp-content/uploads/2024/02/pl_ai_bildwortmarke_original.svg)](https://primeline-ai.com/en/)

Your partner for AI infrastructure in Germany <br>
Experience the powerful AI infrastructure that drives your ambitions in Deep Learning, Machine Learning & High-Performance Computing. Optimized for AI training and inference.

Model author: [Florian Zimmermeister](https://huggingface.co/flozi00)