How to get the confidence of the transcribed result?

#2
by matthew36 - opened

How can I get the confidence score of the result?

Hey matthew, you can get the probabilities of each token like this:

Load the audio and the model:

import librosa

import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

y, sr = librosa.load('audio.mp3', sr=16000)

MODEL_NAME = "alvanlii/whisper-small-cantonese"

processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).cuda()

model.config.forced_decoder_ids = None
model.config.suppress_tokens = []
model.config.use_cache = False

Generate the output. Note the output_scores and return_dict_in_generate flags:

processed_in = processor(y, sampling_rate=sr, return_tensors="pt")
gout = model.generate(
    input_features=processed_in.input_features.cuda(), 
    output_scores=True, return_dict_in_generate=True
)

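Compute the softmax of the scores:

# gout.scores has one tensor per decoding step, each of shape (batch_size, vocab_size).
# With greedy decoding, the largest softmax value is the probability of the emitted token.
proba_scores = [torch.nn.functional.softmax(step_scores, dim=-1).max() for step_scores in gout.scores]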
# the ids are now in .sequences
transcription = processor.batch_decode(gout.sequences, skip_special_tokens=True)[0]
print(transcription)
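
If you want a single confidence number for the whole transcription rather than one value per token, one option is to average the token log-probabilities (a geometric mean) and also look at the weakest token. This is only a minimal sketch: `sequence_confidence` is a hypothetical helper name, not part of transformers, and the example probabilities are made up. With the code above you would pass in `[p.item() for p in proba_scores]` instead.

```python
import math


def sequence_confidence(token_probs):
    """Summarize per-token probabilities as (geometric_mean, minimum).

    Hypothetical helper for illustration; not part of transformers.
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    # Clamp tiny values so math.log never receives 0.
    eps = 1e-12
    avg_logprob = sum(math.log(max(p, eps)) for p in token_probs) / len(token_probs)
    # exp of the mean log-probability is the geometric mean of the probabilities.
    return math.exp(avg_logprob), min(token_probs)


# Made-up probabilities; replace with [p.item() for p in proba_scores].
example_probs = [0.9, 0.8, 0.95]
geo_mean, weakest = sequence_confidence(example_probs)
print(f"geometric mean: {geo_mean:.3f}, weakest token: {weakest:.3f}")
```

The geometric mean is less sensitive to transcript length than the plain product of probabilities, and the minimum helps flag a single uncertain token that an average could hide. Whether either value is well calibrated for this model is something you would have to check on your own data.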

Thank you so much alvanlii! I will try it later. Much appreciated!

Could you try converting it to ggml so it can run in whisper.cpp?

You can run convert-h5-to-ggml.py yourself.