Error loading LM
Hi,
When attempting to recognize audio, I got an error:
ValueError: Found wrong number of files in directory. Expected 3 files, found ['attrs.json', 'lm_wiki.arpa']
It seems the model expects to find 3 files in the LM folder but finds only 2. How can I fix this?
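For reference, a quick way to check which files the repo's language_model folder actually contains (a sketch, assuming the huggingface_hub package is installed):

from huggingface_hub import list_repo_files

files = list_repo_files("Yehor/w2v-xls-r-uk")
print([f for f in files if f.startswith("language_model/")])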
It seems transformers requires a vocab file, so you can generate it from lm_wiki.arpa.
Alternatively, you can find another KenLM model here: https://huggingface.co/Yehor/kenlm-uk
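For example, here is a minimal sketch of pulling the word list out of lm_wiki.arpa (an ARPA file is plain text; each entry in the \1-grams: section is a log10 probability, the word, and an optional backoff weight). The output filename unigrams.txt is an assumption here:

# Extract the unigram words from a KenLM ARPA file and write them one per line.
words = []
in_unigrams = False
with open("lm_wiki.arpa", encoding="utf-8") as arpa:
    for raw in arpa:
        line = raw.strip()
        if line == "\\1-grams:":        # start of the 1-grams section
            in_unigrams = True
            continue
        if in_unigrams:
            if not line or line.startswith("\\"):   # blank line or next section ends it
                break
            words.append(line.split()[1])           # columns: log10 prob, word, [backoff]
with open("unigrams.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(words) + "\n")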
Thanks for the answer, but the error occurs on the very first load of the model from the HF Hub, when the following code runs:
pipe = pipeline("automatic-speech-recognition", model="Yehor/w2v-xls-r-uk")
So it seems the vocab should be added to the repo, not generated manually.
Maybe there is a way to use a locally generated vocab at the model loading stage, but I don't know it; please advise.
Look at the arpa file; it's just a text file. You can extract the list of words from it and use it with the model.
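For instance, a sketch of wiring the extracted word list together with the model by building the decoder by hand (assumes pyctcdecode and kenlm are installed, the word list is saved as unigrams.txt, and lm_wiki.arpa is available locally; this is only one possible route):

from pyctcdecode import build_ctcdecoder
from transformers import (
    AutoFeatureExtractor,
    AutoModelForCTC,
    AutoTokenizer,
    Wav2Vec2ProcessorWithLM,
)

tokenizer = AutoTokenizer.from_pretrained("Yehor/w2v-xls-r-uk")
feature_extractor = AutoFeatureExtractor.from_pretrained("Yehor/w2v-xls-r-uk")
model = AutoModelForCTC.from_pretrained("Yehor/w2v-xls-r-uk")

# CTC labels must be ordered by their ids in the tokenizer vocabulary
vocab = sorted(tokenizer.get_vocab().items(), key=lambda item: item[1])
labels = [token for token, _ in vocab]

with open("unigrams.txt", encoding="utf-8") as f:
    unigrams = [w.strip() for w in f if w.strip()]

decoder = build_ctcdecoder(labels, kenlm_model_path="lm_wiki.arpa", unigrams=unigrams)
processor = Wav2Vec2ProcessorWithLM(
    feature_extractor=feature_extractor, tokenizer=tokenizer, decoder=decoder
)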
It's not a problem for me to extract the list of words, but the first time the pipeline loads it inspects the files in the repository in order to cache them locally, and the failure occurs at exactly this stage.
If there is an alternative way to load the model and the other files from the HF Hub and use them locally, please advise how to do that.
Yes, you can set the local path to this model. Just clone the repository.
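For example (a sketch; snapshot_download from huggingface_hub is just an alternative to a plain git clone, and the local directory name is arbitrary):

from huggingface_hub import snapshot_download

# Downloads the full repository (equivalent in spirit to
# `git lfs install && git clone https://huggingface.co/Yehor/w2v-xls-r-uk`)
local_dir = snapshot_download("Yehor/w2v-xls-r-uk", local_dir="w2v-xls-r-uk")
print(local_dir)  # pass this path to from_pretrained instead of the hub id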
OK, I have cloned the repository and extracted the vocab from the arpa LM (by the way, which filename should it have?), but I still have no local pretrained model to load.
If I run the code:
from transformers import AutoProcessor, AutoModelForCTC
processor = AutoProcessor.from_pretrained("Yehor/w2v-xls-r-uk")
model = AutoModelForCTC.from_pretrained("Yehor/w2v-xls-r-uk")
it again tries to download some files from the HF Hub and fails at the same point.
Could you please share working code, or add the vocabulary to https://huggingface.co/Yehor/w2v-xls-r-uk/tree/main/language_model yourself?
Regarding the filename it should have: unigrams.txt
And in processor = AutoProcessor.from_pretrained("Yehor/w2v-xls-r-uk"), point to the local folder instead of the HF handle of the model.
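For completeness, a sketch of the final setup, assuming the repo was cloned to ./w2v-xls-r-uk and the extracted word list was saved as ./w2v-xls-r-uk/language_model/unigrams.txt:

from transformers import AutoProcessor, AutoModelForCTC, pipeline

local_path = "./w2v-xls-r-uk"  # local clone with language_model/unigrams.txt added
processor = AutoProcessor.from_pretrained(local_path)
model = AutoModelForCTC.from_pretrained(local_path)

# or load everything through the pipeline in one go
pipe = pipeline("automatic-speech-recognition", model=local_path)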
Thank you very much, it works now!