
	---
	license: apache-2.0
	language:
	- de
	library_name: transformers
	pipeline_tag: automatic-speech-recognition
	model-index:
	- name: whisper-large-v3-german by Florian Zimmermeister @primeLine
	  results:
	  - task:
	      type: automatic-speech-recognition
	      name: Speech Recognition
	    dataset:
	      name: Common Voice de
	      type: common_voice_15
	      args: de
	    metrics:
	    - type: wer
	      value: 3.002 %
	      name: Test WER
	    - type: cer
	      value: 0.81 %
	      name: Test CER

	---


	### Summary
	This model card describes a fine-tune of Whisper Large v3 for German speech recognition. Whisper is a speech recognition model developed by OpenAI; this variant has been optimized specifically for processing and recognizing German speech.



	### Applications
	This model can be used in a range of applications, including:

	- Transcription of spoken German language
	- Voice commands and voice control
	- Automatic subtitling for German videos
	- Voice-based search queries in German
	- Dictation functions in word processing programs
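
For the subtitling use case, a minimal sketch of turning timestamped transcription chunks into SRT subtitle text is shown here. It assumes the chunk layout that the `transformers` speech recognition pipeline returns under `result["chunks"]` when called with `return_timestamps=True` (a list of dicts with a `timestamp` pair and a `text` string). The helper names `format_timestamp` and `chunks_to_srt` and the sample chunks are hypothetical and not part of this model or the library.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks) -> str:
    """Convert a list of {"timestamp": (start, end), "text": str} dicts to SRT text."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: a missing end time is replaced by the start time.
            end = start
        entries.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(entries)


# Hypothetical chunks, shaped like the pipeline's timestamped output.
example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " Guten Morgen."},
    {"timestamp": (2.5, 5.0), "text": " Wie geht es Ihnen?"},
]
print(chunks_to_srt(example_chunks))
```

In practice, `example_chunks` would be replaced by the `chunks` list of a real transcription result.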


	## Model family

	| Model | Parameters | Link |
	|----------------------------------|------------|--------------------------------------------------------------|
	| Whisper large v3 german | 1.54B | [link](https://huggingface.co/primeline/whisper-large-v3-german) |
	| Distil-whisper large v3 german | 756M | [link](https://huggingface.co/primeline/distil-whisper-large-v3-german) |
	| tiny whisper | 37.8M | [link](https://huggingface.co/primeline/whisper-tiny-german) |


	### Training data
	The training data for this model includes a large amount of spoken German from various sources. The data was carefully selected and processed to optimize recognition performance.


	### Training process
	The model was trained with the following hyperparameters:

	- Batch size: 1024
	- Epochs: 2
	- Learning rate: 1e-5
	- Data augmentation: No
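
The card does not say how the batch size of 1024 was reached. One common way to get a large effective batch on limited hardware is gradient accumulation, where the effective batch size equals the per-device batch size times the number of devices times the number of accumulation steps. The sketch below only illustrates that arithmetic. The function name `accumulation_steps` and the device counts are hypothetical and do not describe the author's actual setup.

```python
def accumulation_steps(effective_batch: int, per_device_batch: int, num_devices: int) -> int:
    """Return the gradient-accumulation steps needed to reach an effective batch size."""
    per_step = per_device_batch * num_devices
    if per_step <= 0:
        raise ValueError("per_device_batch and num_devices must be positive")
    if effective_batch % per_step != 0:
        raise ValueError("effective_batch must be a multiple of per_device_batch * num_devices")
    return effective_batch // per_step


# Hypothetical hardware: 8 devices with 16 samples each per forward pass.
steps = accumulation_steps(1024, per_device_batch=16, num_devices=8)
print(steps)  # 1024 / (16 * 8) = 8
```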


	### How to use

	```python
	import torch
	from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
	from datasets import load_dataset

	# Use the GPU with half precision when available, otherwise fall back to CPU.
	device = "cuda:0" if torch.cuda.is_available() else "cpu"
	torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

	model_id = "primeline/whisper-large-v3-german"
	model = AutoModelForSpeechSeq2Seq.from_pretrained(
	    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
	)
	model.to(device)
	processor = AutoProcessor.from_pretrained(model_id)

	# Long audio is split into 30-second chunks that are transcribed in batches.
	pipe = pipeline(
	    "automatic-speech-recognition",
	    model=model,
	    tokenizer=processor.tokenizer,
	    feature_extractor=processor.feature_extractor,
	    max_new_tokens=128,
	    chunk_length_s=30,
	    batch_size=16,
	    return_timestamps=True,
	    torch_dtype=torch_dtype,
	    device=device,
	)

	# Note: this sample clip is English (LibriSpeech); substitute German audio for meaningful results.
	dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
	sample = dataset[0]["audio"]
	result = pipe(sample)
	print(result["text"])
	```
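
The metadata of this card reports a test WER of 3.002 % and a test CER of 0.81 % on Common Voice de. To show what these metrics measure, here is a minimal pure-Python sketch of word error rate and character error rate based on edit distance. It is not the evaluation script behind the reported numbers. The card does not state which text normalization was applied before scoring, so results from this sketch are not directly comparable. The function names and the example sentences are hypothetical.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong word out of five, one wrong character out of 24.
reference = "das ist ein kleiner test"
hypothesis = "das ist ein kleiner text"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

To score real transcriptions, `hypothesis` would be the `result["text"]` of a pipeline call and `reference` the matching ground-truth transcript, both normalized the same way.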


	## [About us](https://primeline-ai.com/en/)

	[![primeline AI](https://primeline-ai.com/wp-content/uploads/2024/02/pl_ai_bildwortmarke_original.svg)](https://primeline-ai.com/en/)


	Your partner for AI infrastructure in Germany <br>
	Experience the powerful AI infrastructure that drives your ambitions in Deep Learning, Machine Learning & High-Performance Computing. Optimized for AI training and inference.



	Model author: [Florian Zimmermeister](https://huggingface.co/flozi00)