---
base_model: vasista22/whisper-gujarati-small
datasets:
- 1rsh/gujarati-openslr
language:
- gu
license: apache-2.0
metrics:
- wer
- cer
tags:
- hf-asr-leaderboard
- generated_from_trainer
model-index:
- name: Whisper Small Gujarati OpenSLR
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Gujarati OpenSLR
      type: 1rsh/gujarati-openslr
      args: 'split: train'
    metrics:
    - type: wer
      value: 35.325794291868604
      name: WER
    - type: cer
      value: 22.3685
      name: CER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Google FLEURS
      type: google/fleurs
      args: 'config: gu_in; split: test'
    metrics:
    - type: wer
      value: 46.596808306094985
      name: WER
    - type: cer
      value: 22.69041389733006
      name: CER
    - type: nwer
      value: 44.01335002085941
      name: Normalized WER
    - type: ncer
      value: 18.702293460048406
      name: Normalized CER
---

# Whisper Small Gujarati OpenSLR

This model is a fine-tuned version of [vasista22/whisper-gujarati-small](https://huggingface.co/vasista22/whisper-gujarati-small) on the Gujarati OpenSLR dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0472
- WER: 35.3258
- CER: 22.3685

## Model description

A Whisper-small checkpoint for Gujarati automatic speech recognition, obtained by further fine-tuning [vasista22/whisper-gujarati-small](https://huggingface.co/vasista22/whisper-gujarati-small) on the [1rsh/gujarati-openslr](https://huggingface.co/datasets/1rsh/gujarati-openslr) dataset.

## Intended uses & limitations

The model is intended for transcribing Gujarati speech. On the out-of-domain Google FLEURS `gu_in` test split it reaches 46.60 WER (44.01 after text normalization), noticeably higher than on Gujarati OpenSLR, so expect degraded accuracy on audio that differs from the OpenSLR recording conditions.

## Training and evaluation data

The model was fine-tuned on [1rsh/gujarati-openslr](https://huggingface.co/datasets/1rsh/gujarati-openslr) and evaluated both on that dataset and on the `gu_in` test split of [google/fleurs](https://huggingface.co/datasets/google/fleurs); see the metrics in the model index above. A hedged preprocessing sketch appears at the end of this card.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `Seq2SeqTrainingArguments` reconstruction appears at the end of this card):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 5
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch  | Step | Validation Loss | WER     | CER     |
|:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|
| 0.0018        | 4.9505 | 1000 | 0.0472          | 35.3258 | 22.3685 |

### Framework versions

- Transformers 4.41.2
- PyTorch 2.3.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1

## Usage

To transcribe a single audio file with this model, use the following snippet:

```python
>>> import torch
>>> from transformers import pipeline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"

>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"
>>> transcribe = pipeline(task="automatic-speech-recognition", model="1rsh/whisper-small-gu", chunk_length_s=30, device=device)

>>> # force Gujarati transcription (instead of language auto-detection or translation)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="gu", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])
```
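The FLEURS numbers in the model index can be approximated with the `evaluate` library. The sketch below is a minimal reconstruction, not the exact script used for this card: the `transcription` column name comes from the public FLEURS schema, and applying `BasicTextNormalizer` before scoring is an assumption about how the normalized WER/CER were computed.

```python
import torch
import evaluate
from datasets import load_dataset
from transformers import pipeline
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"
asr = pipeline(
    task="automatic-speech-recognition",
    model="1rsh/whisper-small-gu",
    chunk_length_s=30,
    device=device,
)
asr.model.config.forced_decoder_ids = asr.tokenizer.get_decoder_prompt_ids(
    language="gu", task="transcribe"
)

# gu_in test split of Google FLEURS, as in the model index above
fleurs = load_dataset("google/fleurs", "gu_in", split="test")

predictions, references = [], []
for sample in fleurs:
    predictions.append(asr(sample["audio"])["text"])
    references.append(sample["transcription"])

wer, cer = evaluate.load("wer"), evaluate.load("cer")
print("WER:", 100 * wer.compute(predictions=predictions, references=references))
print("CER:", 100 * cer.compute(predictions=predictions, references=references))

# Assumption: normalized metrics apply a basic text normalizer before scoring
normalizer = BasicTextNormalizer()
norm_preds = [normalizer(p) for p in predictions]
norm_refs = [normalizer(r) for r in references]
print("NWER:", 100 * wer.compute(predictions=norm_preds, references=norm_refs))
print("NCER:", 100 * cer.compute(predictions=norm_preds, references=norm_refs))
```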
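For fine-tuning, audio and transcripts must be converted into Whisper's log-Mel input features and token labels. The following is a hedged sketch of standard Whisper preprocessing, assuming the training dataset exposes `audio` and `transcription` columns; the preprocessing actually used for this model is not documented in the card.

```python
from datasets import Audio, load_dataset
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained(
    "vasista22/whisper-gujarati-small", language="gujarati", task="transcribe"
)

# column names are an assumption about the dataset schema
ds = load_dataset("1rsh/gujarati-openslr", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # Whisper expects 16 kHz

def prepare(batch):
    audio = batch["audio"]
    # log-Mel spectrogram features for the encoder
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # token ids of the reference transcript for the decoder
    batch["labels"] = processor.tokenizer(batch["transcription"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names)
```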
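The hyperparameters listed under "Training procedure" map onto `Seq2SeqTrainingArguments` roughly as follows. This is a reconstruction for orientation only, not the authors' training script; `output_dir` and `predict_with_generate` are illustrative assumptions.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-gu",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    # the default AdamW optimizer uses betas=(0.9, 0.999) and epsilon=1e-08,
    # matching the values listed above
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=5,
    fp16=True,  # "Native AMP" mixed precision
    predict_with_generate=True,  # assumption: generate during eval to compute WER/CER
)
```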