Error loading LM
Hi,
When attempting to recognize audio, I got an error:
ValueError: Found wrong number of files in directory. Expected 3 files, found ['attrs.json', 'lm_wiki.arpa']
It seems the model expects to find 3 files in the LM folder but finds only 2. How can I fix this?
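For reference, a quick way to check which files the repo's language_model folder actually contains (a sketch, assuming the huggingface_hub package is installed):

from huggingface_hub import list_repo_files

files = list_repo_files("Yehor/w2v-xls-r-uk")
print([f for f in files if f.startswith("language_model/")])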
It seems transformers requires a vocab file, so you can generate it from lm_wiki.arpa.
Alternatively, you can find another KenLM model here: https://huggingface.co/Yehor/kenlm-uk
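For example, here is a minimal sketch of pulling the word list out of lm_wiki.arpa (an ARPA file is plain text; each entry in the \1-grams: section is a log10 probability, the word, and an optional backoff weight). The output filename unigrams.txt is an assumption here:

# Extract the unigram words from a KenLM ARPA file and write them one per line.
words = []
in_unigrams = False
with open("lm_wiki.arpa", encoding="utf-8") as arpa:
    for raw in arpa:
        line = raw.strip()
        if line == "\\1-grams:":        # start of the 1-grams section
            in_unigrams = True
            continue
        if in_unigrams:
            if not line or line.startswith("\\"):   # blank line or next section ends it
                break
            words.append(line.split()[1])           # columns: log10 prob, word, [backoff]
with open("unigrams.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(words) + "\n")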
Thanks for the answer, but the error occurs on the very first load of the model from the HF Hub, when the following code runs:
pipe = pipeline("automatic-speech-recognition", model="Yehor/w2v-xls-r-uk")
So it seems the vocab should be added to the repo, not generated manually.
Maybe there is a way to use a locally generated vocab at the model loading stage, but I don't know it; please advise.
Look at the arpa file; it's just a text file. You can extract the list of words from it and use it with the model.
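For instance, a sketch of wiring the extracted word list together with the model by building the decoder by hand (assumes pyctcdecode and kenlm are installed, the word list is saved as unigrams.txt, and lm_wiki.arpa is available locally; this is only one possible route):

from pyctcdecode import build_ctcdecoder
from transformers import (
    AutoFeatureExtractor,
    AutoModelForCTC,
    AutoTokenizer,
    Wav2Vec2ProcessorWithLM,
)

tokenizer = AutoTokenizer.from_pretrained("Yehor/w2v-xls-r-uk")
feature_extractor = AutoFeatureExtractor.from_pretrained("Yehor/w2v-xls-r-uk")
model = AutoModelForCTC.from_pretrained("Yehor/w2v-xls-r-uk")

# CTC labels must be ordered by their ids in the tokenizer vocabulary
vocab = sorted(tokenizer.get_vocab().items(), key=lambda item: item[1])
labels = [token for token, _ in vocab]

with open("unigrams.txt", encoding="utf-8") as f:
    unigrams = [w.strip() for w in f if w.strip()]

decoder = build_ctcdecoder(labels, kenlm_model_path="lm_wiki.arpa", unigrams=unigrams)
processor = Wav2Vec2ProcessorWithLM(
    feature_extractor=feature_extractor, tokenizer=tokenizer, decoder=decoder
)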
It's not a problem for me to extract the list of words, but the first time the pipeline loads it inspects the files in the repository in order to cache them locally, and the failure occurs at exactly this stage.
If there is an alternative way to load the model and the other files from the HF Hub and use them locally, please advise how to do that.
Yes, you can set the local path to this model. Just clone the repository.
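For example (a sketch; snapshot_download from huggingface_hub is just an alternative to a plain git clone, and the local directory name is arbitrary):

from huggingface_hub import snapshot_download

# Downloads the full repository (equivalent in spirit to
# `git lfs install && git clone https://huggingface.co/Yehor/w2v-xls-r-uk`)
local_dir = snapshot_download("Yehor/w2v-xls-r-uk", local_dir="w2v-xls-r-uk")
print(local_dir)  # pass this path to from_pretrained instead of the hub id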
OK, I have cloned the repository and extracted the vocab from the arpa LM (by the way, which filename should it have?), but I still have no local pretrained model to load.
If I run the code:
from transformers import AutoProcessor, AutoModelForCTC
processor = AutoProcessor.from_pretrained("Yehor/w2v-xls-r-uk")
model = AutoModelForCTC.from_pretrained("Yehor/w2v-xls-r-uk")
it again tries to download some files from the HF Hub and fails at the same point.
Could you please share working code, or add the vocabulary to https://huggingface.co/Yehor/w2v-xls-r-uk/tree/main/language_model yourself?
Regarding the filename it should have: unigrams.txt
And in processor = AutoProcessor.from_pretrained("Yehor/w2v-xls-r-uk"), point to the local folder instead of the HF handle of the model.
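For completeness, a sketch of the final setup, assuming the repo was cloned to ./w2v-xls-r-uk and the extracted word list was saved as ./w2v-xls-r-uk/language_model/unigrams.txt:

from transformers import AutoProcessor, AutoModelForCTC, pipeline

local_path = "./w2v-xls-r-uk"  # local clone with language_model/unigrams.txt added
processor = AutoProcessor.from_pretrained(local_path)
model = AutoModelForCTC.from_pretrained(local_path)

# or load everything through the pipeline in one go
pipe = pipeline("automatic-speech-recognition", model=local_path)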
Thank you very much, it works now!