Gibberish model outputs on all models and processors

#4
by Lvegna - opened

No matter which combination of code I paste from the Facebook/wav2vec2 blog, the model outputs are always gibberish and WER does not improve during training. This is especially strange because earlier last week everything seemed to be working (at least when using the pre-trained model for inference), but now, without my changing anything, all I get is gibberish. I even created a fresh Python install on a different machine and got the same results. Here is an example of the code I pasted and the results I am getting:


import soundfile as sf
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# load pretrained model
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")


librispeech_samples_ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")

# load audio
audio_input, sample_rate = sf.read(librispeech_samples_ds[0]["file"])

# pad input values and return pt tensor
input_values = processor(audio_input, sampling_rate=sample_rate, return_tensors="pt").input_values

# INFERENCE

# retrieve logits & take argmax
logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

# transcribe
transcription = processor.decode(predicted_ids[0])

print(transcription)

====================
Output below
====================

kuk_aktkckwrk[PAD]klk_dktkcwkrk_arthcrkskqkbaktkdcrkbykrthcru_zkzdcrmkdksakakckarks zr'kckrkswcrfkdkskzkrktkbrk'kckdkmkbkucrkhk_akrkfkbakqkckdkrk

Just for reference, this same code works perfectly on Google Colab and outputs "A MAN SAID TO THE UNIVERSE SIR I EXIST".
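
To illustrate what the argmax-and-decode step depends on, here is a minimal pure-Python sketch of greedy CTC decoding. It is not the actual `Wav2Vec2Processor.decode` implementation; the function name `ctc_greedy_decode` and both toy vocabularies are made up for illustration. The point is that the same predicted ids give readable text or gibberish depending on which id-to-character table and blank id are used. The literal "[PAD]" in my output might suggest that a different vocabulary from the checkpoint's own is being used for decoding, but I have not confirmed this.

```python
from itertools import groupby


def ctc_greedy_decode(ids, id_to_char, blank_id=0, word_delimiter="|"):
    """Greedy CTC decode: collapse repeated ids, drop blanks, map ids to characters."""
    collapsed = [i for i, _ in groupby(ids)]          # merge consecutive duplicates
    chars = [id_to_char[i] for i in collapsed if i != blank_id]
    return "".join(chars).replace(word_delimiter, " ").strip()


# Toy argmax output (hypothetical, not real model predictions).
ids = [2, 2, 0, 1, 3, 3, 2, 0, 4, 4]

# Hypothetical vocabulary that matches the ids: id 0 is the CTC blank.
matching_vocab = {0: "<pad>", 1: "|", 2: "A", 3: "M", 4: "N"}

# Hypothetical mismatched vocabulary: the blank now sits at id 5,
# so id 0 is decoded as an ordinary letter instead of being dropped.
mismatched_vocab = {0: "K", 1: "U", 2: "T", 3: "C", 4: "W", 5: "[PAD]"}

print(ctc_greedy_decode(ids, matching_vocab))
print(ctc_greedy_decode(ids, mismatched_vocab, blank_id=5))
```

With the mismatched table the blank frames are no longer dropped, so one letter ends up interleaved between the others, which resembles the repeated "k" in the output above.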

I have further determined that running the following code in my training step is what causes the output corruption. Everything works until this code is run for the first time; after that, every Python file in the directory where it was run starts outputting gibberish.

trainer = Trainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=atco2["train"],
    eval_dataset=atco2["test"],
    tokenizer=processor.feature_extractor,
)
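
Since the problem seems tied to the directory the code is run from, one check that might narrow it down is whether something on disk is shadowing the model ID. As far as I understand, `from_pretrained` treats its argument as a local path when a directory with that name exists, so a folder named like the model ID, relative to the working directory, would be loaded instead of the Hub checkpoint. Below is a minimal sketch of that check. The helper `find_local_shadow` is a hypothetical name of mine, and I have not confirmed that this is the cause.

```python
from pathlib import Path


def find_local_shadow(model_id, start_dir="."):
    """Return the absolute path of a local directory named like `model_id`, or None.

    Hypothetical helper: it only checks whether `start_dir / model_id` is a directory.
    """
    candidate = Path(start_dir) / model_id
    return str(candidate.resolve()) if candidate.is_dir() else None


if __name__ == "__main__":
    shadow = find_local_shadow("facebook/wav2vec2-base-960h")
    if shadow is None:
        print("No local directory shadows the model ID.")
    else:
        # List the files so a stray vocab.json or config can be spotted.
        print("Local directory found:", shadow)
        for path in sorted(Path(shadow).iterdir()):
            print("  ", path.name)
```

If a directory is found, comparing its files with the ones on the Hub would show whether the training step wrote anything there.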
