Challenges in Distinguishing Similar Phonemes (e.g., 'B' and 'V') in License Plate Speech Recognition Using Whisper-Large-V2
Hello everyone,
I am currently working on a speech recognition project that transcribes spoken license plate numbers using OpenAI's Whisper-Large-V2 model. The model struggles to distinguish phonetically similar letters, particularly 'B' and 'V'. Since either letter can begin a valid plate (e.g., 'BAA-1234' and 'VAA-1234' are both legal formats), there are no contextual cues the model could use to disambiguate. I attempted fine-tuning on domain-specific recordings, but saw only marginal improvements in accuracy. What strategies or techniques have you found effective for differentiating such phonetically similar letters in this kind of low-context, constrained-vocabulary setting?
Any insights or recommendations would be greatly appreciated.
Thank you!