Update README.md
README.md
---
license: apache-2.0
language:
- de
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# whisper-tiny-german

This model is a German speech recognition model based on the [whisper-tiny](https://huggingface.co/openai/whisper-tiny) model.
The model weights have 37.8M parameters, with a size of about 76 MB in bfloat16 format.

As a follow-up to [Whisper large v3 german](https://huggingface.co/primeline/whisper-large-v3-german), we decided to create a distilled version for faster inference with minimal quality loss.

## Intended uses & limitations

The model is intended to be used for German speech recognition tasks.
It can be used as a local transcription service or as part of a larger speech recognition pipeline.
While it has only a fraction of the parameters of the large model, the quality is still very good and it can be used for most tasks.
The latency is low enough for real-time applications when using optimization toolkits like TensorRT.

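A minimal usage sketch with the Hugging Face Transformers `pipeline` API is shown below. The model ID matches this repository; the device selection and the `chunk_length_s` value are illustrative assumptions, not part of the original card:

```python
# Minimal sketch: German transcription with the Transformers ASR pipeline.
# Device selection and chunk_length_s are illustrative assumptions.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="primeline/whisper-tiny-german",
    torch_dtype=torch.bfloat16,  # matches the published weight format
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

# Transcribe a local audio file (any format ffmpeg can decode).
result = asr("sample_german.wav", chunk_length_s=30)
print(result["text"])
```
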
## Dataset

The training data is a filtered subset of the [Common Voice](https://huggingface.co/datasets/common_voice) dataset, Multilingual LibriSpeech, and some internal data.
The data was filtered and double-checked for quality and correctness.
We applied some normalization to the text data, especially for casing and punctuation.

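The exact normalization rules are not published; the following is a purely hypothetical sketch of what such a casing/punctuation cleanup step could look like:

```python
# Hypothetical normalization sketch; the actual rules used for this model
# are not published. \w matches German umlauts under Python's Unicode regex.
import re

def normalize_text(text: str) -> str:
    text = text.strip()
    text = re.sub(r"\s+", " ", text)          # collapse runs of whitespace
    text = re.sub(r"[^\w\s.,?!-]", "", text)  # drop unusual symbols/quotes
    return text

print(normalize_text("  Hallo,   Welt!! »Test«  "))  # -> Hallo, Welt!! Test
```
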
## Model family

| Model                          | Parameters | Link                                                                     |
|--------------------------------|------------|--------------------------------------------------------------------------|
| Whisper large v3 german        | 1.54B      | [link](https://huggingface.co/primeline/whisper-large-v3-german)         |
| Distil-whisper large v3 german | 756M       | [link](https://huggingface.co/primeline/distil-whisper-large-v3-german)  |
| Whisper tiny german            | 37.8M      | [link](https://huggingface.co/primeline/whisper-tiny-german)             |

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- total_train_batch_size: 512
- num_epochs: 5.0

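For orientation, these values map onto Transformers training arguments roughly as sketched below. This is not the actual training script; in particular, the split of the total batch size of 512 into per-device batch size and gradient accumulation steps is an assumption:

```python
# Sketch only: the listed hyperparameters expressed as Seq2SeqTrainingArguments.
# The 64 x 8 batch split and output_dir are assumptions, not the authors' setup.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-tiny-german",  # hypothetical
    learning_rate=3e-5,
    per_device_train_batch_size=64,    # 64 * 8 accumulation steps = 512 total
    gradient_accumulation_steps=8,
    num_train_epochs=5.0,
    bf16=True,                         # weights are published in bfloat16
)
```
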
### Framework versions

- Transformers 4.39.3
- Pytorch 2.3.0a0+ebedce2
- Datasets 2.18.0
- Tokenizers 0.15.2

## [About us](https://primeline-ai.com/en/)

[![primeline AI](https://primeline-ai.com/wp-content/uploads/2024/02/pl_ai_bildwortmarke_original.svg)](https://primeline-ai.com/en/)

Your partner for AI infrastructure in Germany <br>
Experience the powerful AI infrastructure that drives your ambitions in Deep Learning, Machine Learning & High-Performance Computing. Optimized for AI training and inference.

Model author: [Florian Zimmermeister](https://huggingface.co/flozi00)