automatedstockminingorg posted an update 29 days ago
Hi everyone, I have just uploaded my first fine-tuned model, but the serverless inference client isn't available for it. It's built with the Transformers architecture and is just a fine-tuned Llama 8B Instruct. Does anyone know how to make serverless inference available on a model?

The Serverless Inference API was significantly degraded a few months ago, making it almost unusable unless the model is labeled Warm.
The conditions under which a model becomes Warm are unknown, so it is not something you can reliably aim for.
If you create a Space with Gradio, it may work, so you could try that:
https://www.gradio.app/guides/using-hugging-face-integrations
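A minimal sketch of what the Space's `app.py` could look like, assuming the fine-tuned model is public and the Space has GPU hardware attached (an 8B model won't fit on the free CPU tier). The repo id below is a placeholder:

```python
# Minimal Gradio Space sketch: load the fine-tuned model with the standard
# transformers text-generation pipeline and expose it through a simple UI.
import gradio as gr
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="your-username/your-llama-8b-finetune",  # placeholder: replace with your repo id
    torch_dtype="auto",
    device_map="auto",  # requires `accelerate`; places the model on the Space's GPU
)

def respond(prompt: str) -> str:
    # Generate a short completion; tune max_new_tokens for your use case.
    outputs = generator(prompt, max_new_tokens=256, do_sample=True)
    return outputs[0]["generated_text"]

demo = gr.Interface(fn=respond, inputs="text", outputs="text")

if __name__ == "__main__":
    demo.launch()
```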


You can use Modal Labs to run inference. GPUs take ~30 seconds to provision. Here's a quickstart on the matter: https://modal.com/docs/examples/vllm_inference
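The linked quickstart uses vLLM; a rougher, transformers-based sketch of the same idea, assuming Modal's current `App`/`function` API and a placeholder repo id, looks roughly like this:

```python
# Sketch of serverless inference on Modal: a GPU function that loads the
# fine-tuned model on demand and returns a generation.
import modal

app = modal.App("llama-finetune-inference")

image = modal.Image.debian_slim().pip_install("transformers", "torch", "accelerate")

@app.function(gpu="A10G", image=image, timeout=600)
def generate(prompt: str) -> str:
    from transformers import pipeline

    # Loaded inside the function so it runs on the remote GPU container.
    pipe = pipeline(
        "text-generation",
        model="your-username/your-llama-8b-finetune",  # placeholder repo id
        torch_dtype="auto",
        device_map="auto",
    )
    out = pipe(prompt, max_new_tokens=256)
    return out[0]["generated_text"]

@app.local_entrypoint()
def main():
    print(generate.remote("Hello, how are you?"))
```

Running `modal run app.py` should provision the GPU, run the generation, and tear the container back down when it goes idle.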
