Sep 26, 2022

edited Sep 26, 2022

No matter which combination of code I paste from the Facebook wav2vec2 blog posts, the model outputs are always gibberish and WER does not improve during training. This is especially strange because earlier last week everything seemed to be working (at least when using the pre-trained model for inference), but now, without my changing anything, all I get is gibberish. I even created a whole new Python install on a different machine and got the same results. Here is an example of the code I pasted and the results I am getting:


import soundfile as sf
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# load pretrained model
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")


librispeech_samples_ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")

# load audio
audio_input, sample_rate = sf.read(librispeech_samples_ds[0]["file"])

# pad input values and return pt tensor
input_values = processor(audio_input, sampling_rate=sample_rate, return_tensors="pt").input_values

# INFERENCE

# retrieve logits & take argmax
logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

# transcribe
transcription = processor.decode(predicted_ids[0])

print(transcription)

====================
Output below

kuk_aktkckwrk[PAD]klk_dktkcwkrk_arthcrkskqkbaktkdcrkbykrthcru_zkzdcrmkdksakakckarks zr'kckrkswcrfkdkskzkrktkbrk'kckdkmkbkucrkhk_akrkfkbakqkckdkrk

Lvegna

Sep 26, 2022

Just for reference, this exact code works perfectly on Google Colab and outputs "A MAN SAID TO THE UNIVERSE SIR I EXIST".
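To put a number on how far off the local output is from the Colab output, here is a minimal sketch of a word error rate check. The `word_error_rate` helper is hypothetical (written for this post, not taken from the blog or from any library), and it is a plain word-level edit distance, which may differ slightly from what a metrics package reports.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref_words)


# Reference string is the Colab output quoted above.
expected = "A MAN SAID TO THE UNIVERSE SIR I EXIST"
print(word_error_rate(expected, expected))
```

Passing the local `transcription` as the second argument gives a quick way to tell whether a given environment is producing the correct output or the corrupted one.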

Lvegna

Sep 26, 2022

I have further determined that running this bit of code in my training step is what causes the output corruption. Everything works until this code is run for the first time; after that, every Python file in the directory where it was run starts outputting gibberish.

trainer = Trainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=atco2["train"],
    eval_dataset=atco2["test"],
    tokenizer=processor.feature_extractor,
)
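One thing that might be worth checking here (this is an assumption on my part, not something confirmed in this thread): `from_pretrained` treats its argument as a local path if a directory with that name exists, and only falls back to the Hub ID otherwise. If the `output_dir` in `training_args` (not shown above) were a relative path that matches the model ID, the Trainer's checkpoints could shadow the Hub model for every script run from that directory. A minimal sketch of that check, where `find_shadowing_path` is a hypothetical helper name:

```python
from pathlib import Path
from typing import Optional


def find_shadowing_path(model_id: str, base_dir: str = ".") -> Optional[Path]:
    """Return the local path that would shadow `model_id`, if one exists.

    Assumption: scripts are launched from `base_dir`, so a relative model ID
    such as "facebook/wav2vec2-base-960h" resolves against it.
    """
    candidate = Path(base_dir) / model_id
    return candidate if candidate.exists() else None


shadow = find_shadowing_path("facebook/wav2vec2-base-960h")
if shadow is not None:
    print(f"Local path {shadow.resolve()} may be loaded instead of the Hub model")
else:
    print("No local path with the same name as the model ID was found")
```

If a path is reported, renaming or moving that directory and re-running the inference script would show whether it was the source of the corrupted outputs.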

Gibberish model outputs on all models and processors
