tiny_scream_april_beta

This model is a fine-tuned version of openai/whisper-tiny on the NbAiLab/NCC_speech_all_v5 dataset. It uses a beam size of 5.

Model description

This is a BETA version. You need to accept the terms and conditions to use it.

Using the Model

There are several ways to use this model, and we hope people will convert it to other formats. The code below processes long audio files with Transformers:

import torch
import numpy as np
import librosa
from transformers import pipeline

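# Pick a device: "cuda" for an NVIDIA GPU, "mps" for Metal (Mac), otherwise "cpu"
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")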

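pipe = pipeline("automatic-speech-recognition",
      model="NbAiLab/tiny_scream_april_beta",
      chunk_length_s=30,  # long audio is processed in 30-second chunks
      device=device,
      max_new_tokens=128,
      # "language" is left empty here; set it to the language of your audio
      generate_kwargs={"language": "", "task": "transcribe"})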

# Load the WAV file as 16 kHz mono. Change the path to use an MP3 file instead
audio_path = 'myfile.wav'
samples, sample_rate = librosa.load(audio_path, sr=16000, mono=True)

# Run the pipeline
prediction = pipe(samples)["text"]

print(prediction)
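
The chunk_length_s=30 setting above is what lets the pipeline handle long files: the audio is cut into windows that each fit the model's input. The sketch below illustrates the idea with plain NumPy. It is a simplified illustration, not the Transformers implementation; the function name split_into_chunks and the overlap_s parameter are hypothetical.

```python
import numpy as np

def split_into_chunks(samples, sample_rate=16000, chunk_length_s=30.0, overlap_s=5.0):
    """Split a 1-D waveform into overlapping windows (illustrative only)."""
    chunk_len = int(chunk_length_s * sample_rate)
    step = chunk_len - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap_s must be smaller than chunk_length_s")
    chunks = []
    start = 0
    while True:
        # The final window may be shorter than chunk_len
        chunks.append(samples[start:start + chunk_len])
        if start + chunk_len >= len(samples):
            break
        start += step
    return chunks

# 70 seconds of silence at 16 kHz stands in for a real recording
audio = np.zeros(70 * 16000, dtype=np.float32)
chunks = split_into_chunks(audio)
print([len(c) / 16000 for c in chunks])
```

Overlapping the windows gives each chunk some context on either side of a cut, which helps when a word straddles a boundary.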

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-05
lr_scheduler_type: linear
per_device_train_batch_size: 48
total_train_batch_size_per_node: 192
total_train_batch_size: 1536
total_optimization_steps: 50000
starting_optimization_step: None
finishing_optimization_step: 50000
num_train_dataset_workers: 64
total_num_training_examples: 76800000
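
The batch-size figures above are consistent with each other, which a short calculation can confirm. This sketch assumes no gradient accumulation and reads the per-node and total figures as products of the per-device batch size; the derived device and node counts are inferences, not values stated in the card.

```python
# Values copied from the hyperparameter list
per_device_train_batch_size = 48
total_train_batch_size_per_node = 192
total_train_batch_size = 1536
total_optimization_steps = 50000

# Inferred quantities (assumption: no gradient accumulation)
devices_per_node = total_train_batch_size_per_node // per_device_train_batch_size
num_nodes = total_train_batch_size // total_train_batch_size_per_node

# Examples seen over the full run
total_examples = total_train_batch_size * total_optimization_steps

print(devices_per_node, num_nodes, total_examples)
```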

Training results

step	eval_loss	train_loss	eval_wer	eval_cer
0	2.1853	2.6128	225.2741	151.0305
2500	0.8090	0.6776	26.0049	10.4006
5000	0.5674	0.5277	20.7674	8.7327
7500	0.5255	0.4551	19.3971	8.5059
10000	0.5774	0.4327	18.0877	8.0272
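
The eval_wer and eval_cer columns appear to be percentages. A word error rate above 100, as at step 0, is possible because inserted words count as errors while the denominator is only the reference length. The sketch below shows the standard edit-distance definition; the card does not state which evaluation script or text normalization produced the table, so the helper names here are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate in percent (reference must be non-empty)."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate in percent (reference must be non-empty)."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("god dag", "god natt"))  # one substitution out of two words
print(wer("ja", "ja ja ja ja"))    # three insertions against one reference word
```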

Framework versions

Transformers 4.28.0.dev0
Datasets 2.11.0
Tokenizers 0.13.2
