How to get loss function value for a given audio and text pair

#97
by souraj - opened

I have an audio clip and a piece of text, and I just want to check whether they correspond to the same data point (that is, whether the text is what is being said in the audio). One way is to transcribe the audio and match the text. But I would like to look at the loss instead, by passing the pair to Whisper's forward function. How can I do that? What should the correct values be for decoder_input_ids and labels? An example would really help.
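A minimal sketch of one way to do this, assuming the Hugging Face transformers implementation (WhisperForConditionalGeneration) and PyTorch. When only labels is passed, the forward function builds decoder_input_ids itself by shifting the labels one position to the right, so decoder_input_ids can usually be left out. The helper name pair_loss and the tiny randomly initialised config are illustrative assumptions so the snippet runs without downloading a checkpoint; the commented lines show how a real checkpoint (openai/whisper-small is just an example choice) might be plugged in.

```python
import torch
from transformers import WhisperConfig, WhisperForConditionalGeneration


def pair_loss(model, input_features, labels):
    """Mean per-token cross-entropy of `labels` given the audio features.

    `pair_loss` is a hypothetical helper name, not part of transformers.
    """
    model.eval()
    with torch.no_grad():
        # Only `labels` is passed: with decoder_input_ids omitted, the model
        # derives them by shifting `labels` one position to the right.
        out = model(input_features=input_features, labels=labels)
    return out.loss.item()


# Tiny random model (assumption: used only so the sketch runs offline).
torch.manual_seed(0)
config = WhisperConfig(
    vocab_size=100, num_mel_bins=80, d_model=32,
    encoder_layers=1, decoder_layers=1,
    encoder_attention_heads=2, decoder_attention_heads=2,
    encoder_ffn_dim=64, decoder_ffn_dim=64,
    max_source_positions=50, max_target_positions=32,
    pad_token_id=0, bos_token_id=1, eos_token_id=2, decoder_start_token_id=3,
)
model = WhisperForConditionalGeneration(config)

# Fake log-mel features: (batch, num_mel_bins, 2 * max_source_positions).
input_features = torch.randn(1, config.num_mel_bins, 2 * config.max_source_positions)
# Fake target token ids ending in the EOS id; padded positions should be -100.
labels = torch.tensor([[5, 6, 7, 2]])

loss = pair_loss(model, input_features, labels)
print(f"loss = {loss:.4f}")

# With a real checkpoint (not run here), the inputs would come from the processor:
#   processor = WhisperProcessor.from_pretrained("openai/whisper-small")
#   model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
#   input_features = processor(audio, sampling_rate=16000,
#                              return_tensors="pt").input_features
#   # Drop the leading start token: the right shift adds it back.
#   labels = processor.tokenizer(text, return_tensors="pt").input_ids[:, 1:]
```

The returned value is averaged over the label tokens, so a lower loss suggests the text is a more likely transcript of the audio. The loss is not calibrated across clips, so a threshold for "match" would probably have to be chosen empirically, for example by comparing against the loss of deliberately mismatched pairs.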
