Loading model on Huggingface serverless API

#25
by isaiahVolley - opened

Hi all. I'm really excited to test this model out, but I am having issues using the Huggingface Inference API to access it.

The model doesn't seem to load correctly. If I access the API directly, I get the following error. No matter how long I wait, it's always this loading error.

$ curl https://api-inference.huggingface.co/models/nvidia/canary-1b \
        -X POST \
        --data-binary '@[audiofile].wav' \
        -H "Authorization: Bearer [token]"

{"error":"Model nvidia/canary-1b is currently loading","estimated_time":20.0}

Likewise, if I use the GUI, I can see that the model times out while loading:
[Screenshot 2024-04-16 at 10.43.15 AM.png]

I've poked around with other NVIDIA ASR models, and the same thing happens on those as well. I was not able to reproduce this behavior on models such as openai/whisper-large-v3. Thanks for your help.

bumping this

@martimsilva @isaiahVolley, I don't work at NVIDIA or anything, but I've been around the block. When folks host their models for free, it's free to us but not free to them, so the owner shortens the time before the endpoint goes to sleep. Basically, the first ping you send to the endpoint "wakes up the model", which can take a few minutes. Once the model is loaded, you can try again in the (small) window that it stays up. The best you can do is build a wait/retry loop so the request goes through once the model is ready; see the sketch below. By the way, the model is very light and can run anywhere, so that's another way to fix this problem: just run your own endpoint ;-)
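For what it's worth, here is a minimal sketch of that wait/retry loop in Python, assuming the serverless API keeps returning the {"error": ..., "estimated_time": ...} payload shown above while the model loads (the [token] and [audiofile].wav placeholders are yours to fill in):

```python
import time
import requests

API_URL = "https://api-inference.huggingface.co/models/nvidia/canary-1b"
HEADERS = {"Authorization": "Bearer [token]"}  # substitute your own token

def transcribe_with_retry(audio_path, max_attempts=10):
    """POST the audio file, sleeping and retrying while the model loads."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    for attempt in range(1, max_attempts + 1):
        response = requests.post(API_URL, headers=HEADERS, data=audio)
        if response.ok:
            return response.json()
        body = response.json()
        if "estimated_time" not in body:
            response.raise_for_status()  # a real error, not just loading
        # Still loading: wait roughly as long as the API suggests, then retry.
        wait = float(body["estimated_time"])
        print(f"attempt {attempt}: model loading, retrying in {wait:.0f}s")
        time.sleep(wait)
    raise TimeoutError("model never finished loading")

print(transcribe_with_retry("[audiofile].wav"))
```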

[Screenshot 2024-05-23 130855.png]
[Screenshot 2024-05-23 130840.png]
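And if you do self-host, the model card loads it through NVIDIA NeMo, roughly like the sketch below; treat it as an approximation and check the transcribe() arguments against your NeMo version, since they have changed across releases:

```python
# Requires NVIDIA NeMo: pip install "nemo_toolkit[asr]"
from nemo.collections.asr.models import EncDecMultiTaskModel

# Download nvidia/canary-1b from the Hugging Face Hub and load it
canary_model = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b")

# Transcribe one or more 16 kHz mono .wav files
predicted_text = canary_model.transcribe(["[audiofile].wav"], batch_size=1)
print(predicted_text)
```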

OK, I just tried it. Usually I'm right about the above, but not this time! So I found y'all a different solution: use this Space instead: https://huggingface.co/spaces/nvidia/canary-1b/

Scroll to the bottom and click "use via api"; you can also deploy the code locally. A rough version of the client code is sketched below.
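The "use via api" panel generates a gradio_client snippet that looks roughly like this sketch. The api_name and arguments here are placeholders, so copy the exact signature the Space shows you:

```python
# Requires: pip install gradio_client
from gradio_client import Client, handle_file

# Connect to the public Space that hosts the model
client = Client("nvidia/canary-1b")

# The endpoint name and arguments below are assumptions; older
# gradio_client versions also take a plain file path instead of handle_file.
result = client.predict(
    handle_file("[audiofile].wav"),
    api_name="/transcribe",
)
print(result)
```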

Hope this helps, sorry for the confusion!

Yeah it worked for me, many thanks!

I don't think much of this model. Too much fuss over teensy details while the big picture is lost. I get that it's nice to have punctuation in your result, but what purpose does that serve when the entire result is wrong? All of it! Yes, this is ranting.
I kept repeatedly saying "I don't know what came over me" in several different intonations, and I got a completely useless answer:
"i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can work in a way i can"

Granted, I tried the model on NVIDIA's own page and not on Hugging Face, but how different can it be? Is anyone else getting good results, or am I the only one getting crappy results?

I've tried deploying it as a paid dedicated endpoint, and it doesn't seem to be supported that way:
https://huggingface.co/nvidia/canary-1b/discussions/26
