Fixes evaluation instructions and updates WER scores
Hi, I was trying to evaluate the model on LibriSpeech's "clean" and "other" test data following the code snippet in the model card, but I got a TypeError because the map_to_pred function stores the transcriptions in the batch wrapped in lists instead of as plain strings (e.g. ["transcription example"] instead of "transcription example"):
TypeError: expected string or bytes-like object
After fixing the error I recomputed the WER and updated the scores without approximating them. I think the same should be done for the other wav2vec2-based models (e.g. facebook/wav2vec2-large-960h-lv60).
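For illustration, here is a minimal, self-contained sketch of the bug (the function and field names mirror the map_to_pred pattern from the model card, but the normalisation step is a stand-in, not the actual evaluation code):

```python
import re

def map_to_pred_buggy(batch, transcription):
    # original behaviour: the decoded transcription is wrapped in a list
    batch["transcription"] = [transcription]
    return batch

def map_to_pred_fixed(batch, transcription):
    # fixed behaviour: store the plain string
    batch["transcription"] = transcription
    return batch

def normalise(text):
    # stand-in for a text-normalisation step applied before WER scoring;
    # re.sub only accepts strings, which is what surfaces the TypeError
    return re.sub(r"[^\w ]", "", text).lower()

batch = map_to_pred_fixed({}, "HELLO WORLD!")
print(normalise(batch["transcription"]))  # hello world

try:
    buggy = map_to_pred_buggy({}, "HELLO WORLD!")
    normalise(buggy["transcription"])
except TypeError as e:
    # raises "expected string or bytes-like object", as in the issue
    print(f"TypeError: {e}")
```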
Thanks for the catch @andreagasparini! I'll run the updated script to verify the results. If they match we can merge! I'll also look into updating the other W2V2-based models that share this example script bug.
Thanks for the bug fix - I can verify that the script works and that I get the same results. I would advocate for keeping the change in the evaluation script (fixing the TypeError
in L113) but discarding the ones that update the WER metrics (L27, 41, 116). The reason being that the Wav2Vec2 paper and "official" results are to 1 decimal place (1.9/3.9), and the convention in the speech literature is to quote WER results to 1 decimal place (WERs of 1.9/3.9 vs 1.86/3.88). Note that by quoting to 1 d.p., we leave at most a 0.05% uncertainty in our WER metrics, which is tiny for all intents and purposes!
Hope that makes sense! Let me know if you have any questions!
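As a quick sketch of the convention above (the metric names here are illustrative, not taken from the actual model card): rounding to 1 d.p. changes each quoted WER by at most 0.05, half of the last quoted digit.

```python
def quote_wer(wer: float) -> float:
    """Round a WER value to 1 decimal place for reporting."""
    return round(wer, 1)

exact = {"LibriSpeech test-clean": 1.86, "LibriSpeech test-other": 3.88}
quoted = {name: quote_wer(w) for name, w in exact.items()}

for name, w in quoted.items():
    # rounding error never exceeds half of the last quoted digit (0.05)
    assert abs(w - exact[name]) <= 0.05
    print(f"{name}: {w}")  # 1.9 and 3.9, matching the paper's figures
```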
Hi @sanchit-gandhi, I agree with your reasons for keeping the results to 1 decimal place, but at the same time it seems that on the Speech Bench nearly all the other models do not follow the same convention (they seem to be quoted to 2 d.p.).
Should we change all the others or just this one? That's my dilemma!
I see your point! I'd be in favour of quoting to 1 d.p. on the Speech Bench. We can open this as a discussion!
Speech bench discussion: https://huggingface.co/spaces/huggingface/hf-speech-bench/discussions/1