cstr committed on
Commit
d2a7e88
1 Parent(s): 573db59

Create README.md

  1. README.md +125 -0
README.md ADDED
@@ -0,0 +1,125 @@
+ ---
+ license: apache-2.0
+ language:
+ - de
+ library_name: transformers
+ pipeline_tag: automatic-speech-recognition
+ model-index:
+ - name: whisper-large-v3-turbo-german by Florian Zimmermeister @primeLine
+   results:
+   - task:
+       type: automatic-speech-recognition
+       name: Speech Recognition
+     dataset:
+       name: German ASR Data-Mix
+       type: flozi00/asr-german-mixed
+     metrics:
+     - type: wer
+       value: 4.77 %
+       name: Test WER
+ datasets:
+ - flozi00/asr-german-mixed
+ - flozi00/asr-german-mixed-evals
+ base_model:
+ - primeline/whisper-large-v3-german
+ ---
+ ## Quant
+
+ This is only an int8 quantization of primeline/whisper-large-v3-german, produced with the ctranslate2 converter, for use with e.g. ctranslate2 or faster-whisper.
+
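A minimal sketch of how such a conversion and its use might look, assuming the `ctranslate2`, `transformers` and `faster-whisper` packages are installed. The output folder name, the list of copied files and the helper names are illustrative assumptions, not taken from this repository; the heavy libraries are imported lazily so the helpers can be defined without them.

```python
from pathlib import Path

SOURCE_MODEL = "primeline/whisper-large-v3-german"  # base model named above
OUTPUT_DIR = Path("whisper-large-v3-german-ct2-int8")  # hypothetical local folder


def convert_to_int8(source: str = SOURCE_MODEL, output_dir: Path = OUTPUT_DIR) -> Path:
    """Convert a Transformers Whisper checkpoint to CTranslate2 with int8 weights."""
    from ctranslate2.converters import TransformersConverter  # lazy import

    converter = TransformersConverter(
        source,
        # Assumption: the source repo ships these two files.
        copy_files=["tokenizer.json", "preprocessor_config.json"],
    )
    converter.convert(str(output_dir), quantization="int8", force=True)
    return output_dir


def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcribed segment as a single printable line."""
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe(audio_path: str, model_dir: Path = OUTPUT_DIR) -> list:
    """Transcribe German audio with faster-whisper using the int8 model."""
    from faster_whisper import WhisperModel  # lazy import

    model = WhisperModel(str(model_dir), device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, language="de", beam_size=5)
    return [format_segment(s.start, s.end, s.text) for s in segments]
```

Calling `convert_to_int8()` once and then `transcribe("audio.wav")` (a placeholder file name) would download the base model, write the converted files and return one line per segment.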
+ ## Model card from primeline/whisper-large-v3-german
+
+
+ ### Summary
+ This model card describes a model based on Whisper Large v3 that has been fine-tuned for German speech recognition. Whisper is a speech recognition model developed by OpenAI. This fine-tune has been optimized specifically for processing and recognizing German speech.
+
+ ### Applications
+ This model can be used in various application areas, including:
+
+ - Transcription of spoken German
+ - Voice commands and voice control
+ - Automatic subtitling for German videos
+ - Voice-based search queries in German
+ - Dictation functions in word processing programs
+
+ ## Model family
+
+ | Model                            | Parameters | Link                                                         |
+ |----------------------------------|------------|--------------------------------------------------------------|
+ | Whisper large v3 german          | 1.54B      | [link](https://huggingface.co/primeline/whisper-large-v3-german) |
+ | Whisper large v3 turbo german    | 809M       | [link](https://huggingface.co/primeline/whisper-large-v3-turbo-german) |
+ | Distil-whisper large v3 german   | 756M       | [link](https://huggingface.co/primeline/distil-whisper-large-v3-german) |
+ | Whisper tiny german              | 37.8M      | [link](https://huggingface.co/primeline/whisper-tiny-german) |
+
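To give a feel for what int8 quantization means for these parameter counts, here is a back-of-the-envelope sketch. It assumes every weight is stored at the given width and ignores quantization scales, any layers kept at higher precision, and runtime activations, so real file sizes and memory use will differ.

```python
# Parameter counts copied from the model family table.
PARAMS = {
    "whisper-large-v3-german": 1.54e9,
    "whisper-large-v3-turbo-german": 809e6,
    "distil-whisper-large-v3-german": 756e6,
    "whisper-tiny-german": 37.8e6,
}

BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weight_size_gb(num_params: float, dtype: str) -> float:
    """Rough weight storage in GB (10^9 bytes) for a given dtype."""
    return num_params * BYTES_PER_WEIGHT[dtype] / 1e9


if __name__ == "__main__":
    for name, count in PARAMS.items():
        sizes = ", ".join(
            f"{dtype}: {weight_size_gb(count, dtype):.2f} GB" for dtype in BYTES_PER_WEIGHT
        )
        print(f"{name}: {sizes}")
```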
+
+ ## Evaluations
+
+ Values are word error rates (WER) in percent; lower is better. The best result per row is in bold.
+
+ | Dataset                         | openai-whisper-large-v3-turbo | openai-whisper-large-v3 | primeline-whisper-large-v3-german | nyrahealth-CrisperWhisper | primeline-whisper-large-v3-turbo-german |
+ |---------------------------------|-------------------------------|-------------------------|----------------------------------|---------------------------|----------------------------------------|
+ | common_voice_19_0               | 6.31                          | 5.84                    | 4.30                             | **4.14**                  | 4.28                                   |
+ | Tuda-De                         | 11.45                         | 11.21                   | 9.89                             | 13.88                     | **8.10**                               |
+ | multilingual librispeech        | 18.03                         | 17.69                   | 13.46                            | 10.10                     | **4.71**                               |
+ | All                             | 14.16                         | 13.79                   | 10.51                            | 8.48                      | **4.75**                               |
+
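For readers unfamiliar with the metric, the following sketch shows how a word error rate is commonly computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The lowercasing and whitespace tokenization here are simplifying assumptions; the text normalization used for the table above is not stated in the model card.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token lists (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, 1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, 1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent for a single utterance."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)
```

Corpus-level scores are usually obtained by summing the edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance values.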
+
+ ### Training data
+ The training data for this model includes a large amount of spoken German from various sources; the metadata above lists `flozi00/asr-german-mixed` as the dataset. The data was carefully selected and processed to optimize recognition performance.
+
+ ### Training process
+ The model was trained with the following hyperparameters:
+
+ - Batch size: 12288
+ - Epochs: 3
+ - Learning rate: 1e-6
+ - Data augmentation: No
+ - Optimizer: [AdEMAMix](https://arxiv.org/abs/2409.03137)
+
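To illustrate the optimizer named in the list above, here is a scalar sketch of the AdEMAMix update as it is commonly summarized from the linked paper: an Adam-style step whose numerator mixes a fast, bias-corrected gradient average with a slow, uncorrected one. The `beta` and `alpha` values and the learning rate are illustrative assumptions (the model card gives none besides the learning rate of 1e-6), and the paper's warmup schedules for `alpha` and `beta3` are left out.

```python
import math


def ademamix_minimize(grad_fn, x0, steps, lr=1e-2, beta1=0.9, beta2=0.999,
                      beta3=0.9999, alpha=5.0, eps=1e-8, weight_decay=0.0):
    """Minimize a 1-D function with a simplified AdEMAMix-style update."""
    x = x0
    m_fast = 0.0  # fast EMA of gradients (as in Adam)
    m_slow = 0.0  # slow EMA of gradients (the extra AdEMAMix term)
    nu = 0.0      # EMA of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m_fast = beta1 * m_fast + (1 - beta1) * g
        m_slow = beta3 * m_slow + (1 - beta3) * g
        nu = beta2 * nu + (1 - beta2) * g * g
        m_fast_hat = m_fast / (1 - beta1 ** t)  # bias correction
        nu_hat = nu / (1 - beta2 ** t)          # bias correction
        update = (m_fast_hat + alpha * m_slow) / (math.sqrt(nu_hat) + eps)
        x -= lr * (update + weight_decay * x)
    return x


if __name__ == "__main__":
    # Toy problem: minimize f(x) = x^2, whose gradient is 2x.
    print(ademamix_minimize(lambda x: 2 * x, x0=1.0, steps=50))
```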
+
+ ### How to use
+
+ ```python
+ import torch
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+ from datasets import load_dataset
+
+ # Use the GPU with half precision when available, otherwise CPU with float32.
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+
+ # Upstream Transformers checkpoint, as given in the original model card.
+ model_id = "primeline/whisper-large-v3-turbo-german"
+ model = AutoModelForSpeechSeq2Seq.from_pretrained(
+     model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
+ )
+ model.to(device)
+ processor = AutoProcessor.from_pretrained(model_id)
+
+ # Chunked long-form transcription pipeline that also returns timestamps.
+ pipe = pipeline(
+     "automatic-speech-recognition",
+     model=model,
+     tokenizer=processor.tokenizer,
+     feature_extractor=processor.feature_extractor,
+     max_new_tokens=128,
+     chunk_length_s=30,
+     batch_size=16,
+     return_timestamps=True,
+     torch_dtype=torch_dtype,
+     device=device,
+ )
+
+ # Load one long audio sample and transcribe it.
+ dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
+ sample = dataset[0]["audio"]
+ result = pipe(sample)
+ print(result["text"])
+ ```
+
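Because the pipeline above is created with `return_timestamps=True`, its result also carries a `"chunks"` list whose items hold a `"timestamp"` pair and a `"text"` string. As a sketch of the subtitling use case, the helper below turns such a list into SRT text. The function names and the fallback duration for a missing end time are assumptions made for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks: list, fallback_duration: float = 2.0) -> str:
    """Convert pipeline-style chunks into SRT subtitle text."""
    entries = []
    for index, chunk in enumerate(chunks, 1):
        start, end = chunk["timestamp"]
        if end is None:
            # The final chunk can lack an end time; assume a short duration.
            end = start + fallback_duration
        text = chunk["text"].strip()
        entries.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(entries)
```

With the pipeline result from the example, `chunks_to_srt(result["chunks"])` would yield a string that can be written to a `.srt` file.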
+
+ ## [About us](https://primeline-ai.com/en/)
+
+ [![primeline AI](https://primeline-ai.com/wp-content/uploads/2024/02/pl_ai_bildwortmarke_original.svg)](https://primeline-ai.com/en/)
+
+
+ Your partner for AI infrastructure in Germany <br>
+ Experience the powerful AI infrastructure that drives your ambitions in Deep Learning, Machine Learning & High-Performance Computing. Optimized for AI training and inference.
+
+
+ Model author: [Florian Zimmermeister](https://huggingface.co/flozi00)