facebook/wav2vec2-large-960h-lv60-self · The Output was unpredictable

We fine-tuned facebook/wav2vec2-large-960h-lv60-self, but the predicted text contains only spaces and the letter "d". We trained on 809 audio files and have not been able to find the cause. Any suggestions would be appreciated. Thank you.

Evaluation output:

Dataset({
    features: ['pred_str', 'text'],
    num_rows: 809
})
Test WER: 1.000
   pred_str ; text
0  dddd     ; they live by fraud and violence and bequeath t...
1  dd       ; but hang it that's not my fault
2           ; bisque of crawfish
3           ; don't worry it will come out all right
4  ddd      ; the priest says the prayers makes the sign of ...
5  d        ; and i add and father madeleine is buried ah
6           ; jean valjean had placed her near the fire
7           ; no one came forwards to help the mother and th...
8  ddddd    ; after the marriage which was a brilliant and g...
9           ; no unless you can tell me when to expect him home
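
For context on why the score is exactly 1.000: a prediction that is empty, or a single wrong token such as "dd", produces as many word errors as the reference has words. The sketch below is a plain-Python illustration of corpus-level WER (total word errors divided by total reference words), not the metric code used in the original run. The helper names `edit_distance` and `corpus_wer` are hypothetical, and the sample pairs are taken from the untruncated rows above.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total number of reference words."""
    errors = sum(
        edit_distance(ref.split(), hyp.split())
        for ref, hyp in zip(references, hypotheses)
    )
    total_words = sum(len(ref.split()) for ref in references)
    return errors / total_words


# Sample rows from the table above (rows 1, 6 and 3).
references = [
    "but hang it that's not my fault",
    "jean valjean had placed her near the fire",
    "don't worry it will come out all right",
]
hypotheses = ["dd", " ", ""]

print(f"{corpus_wer(references, hypotheses):.3f}")  # 1.000
```

So a WER of 1.000 here is consistent with the model emitting no correct words at all, rather than with a bug in the metric.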

Training arguments:

from transformers import TrainingArguments

# repo_name (the output directory name) is defined elsewhere in the script.
training_args = TrainingArguments(
    output_dir=repo_name,
    overwrite_output_dir=True,
    group_by_length=True,
    per_device_train_batch_size=5,
    gradient_accumulation_steps=2,
    evaluation_strategy="steps",
    num_train_epochs=3,
    fp16=True,
    gradient_checkpointing=True,
    save_steps=100,
    eval_steps=100,
    logging_steps=10,
    learning_rate=1e-4,
    weight_decay=0.005,
    warmup_steps=500,
    save_total_limit=2,
)
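
One thing that can be checked directly from these numbers is how many optimizer steps the run performs compared with `warmup_steps`. The sketch below is a rough estimate only. It assumes all 809 files are in the training split, a single device, and the Trainer's default linear schedule with warmup; the helper `total_optimizer_steps` is hypothetical and only approximates how the Trainer counts steps.

```python
import math


def total_optimizer_steps(num_examples, per_device_batch_size,
                          grad_accum_steps, num_epochs, num_devices=1):
    """Approximate number of optimizer updates over the whole run."""
    batches_per_epoch = math.ceil(
        num_examples / (per_device_batch_size * num_devices)
    )
    # One optimizer update happens every grad_accum_steps batches.
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return math.ceil(num_epochs * updates_per_epoch)


# Values copied from the TrainingArguments above; 809 examples is an assumption.
total_steps = total_optimizer_steps(
    num_examples=809,
    per_device_batch_size=5,
    grad_accum_steps=2,
    num_epochs=3,
)
warmup_steps = 500
learning_rate = 1e-4

# During linear warmup the rate grows as learning_rate * step / warmup_steps.
peak_fraction = min(1.0, total_steps / warmup_steps)

print(total_steps)  # 243
print(f"peak learning rate reached: {learning_rate * peak_fraction:.2e}")
```

Under these assumptions the run ends after about 243 updates, before the 500 warmup steps complete, so the learning rate never reaches the configured 1e-4. Whether this is related to the "d"-only output is not established here; it is only an observation about the configuration.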