Overview
Featherless is a serverless platform designed to make it dramatically easier to experiment with LLMs. This space brings the magic of Featherless as close as possible to the models on Hugging Face.
The Problem with LLM Experimentation
Most models aren't directly usable on huggingface.co. Even "small" models (like Llama 3 8B) require hardware expensive enough that they aren't hosted for free, so experimenting with them means allocating GPUs yourself. HF's Inference API is the simplest alternative, but you'd be hard pressed to experiment with 10 models in a day that way.
Enter Featherless
Our goal is to make all models on Hugging Face available serverlessly and enable a new kind of experimentation. With over 2,200 supported models available today, we're well on our way. Check out featherless.ai to see the full range of supported models.
Why a Hugging Face Space?
This space is intended to bring some of the magic of Featherless as close to the supported model cards as possible. It currently lets you run inference on all of the <=15B models supported on Featherless (subject to a concurrency limit on the API token set in this space's secrets).
You're of course welcome to clone this space, but know that it's stock Gradio with a call to the Featherless API (i.e. /chat/completions) through the openai Python package; like many inference providers, the Featherless API is OpenAI-compatible. You'll need your own Featherless API key for it to work, which you can get at featherless.ai.
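For reference, here's a minimal sketch of that kind of call using the openai package. The base URL and model name below are illustrative assumptions, not guaranteed values; check featherless.ai for the exact endpoint and the list of supported models.

```python
# Minimal sketch of an OpenAI-compatible chat completion against Featherless.
# Base URL and model name are assumptions for illustration only.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",   # assumed OpenAI-compatible endpoint
    api_key=os.environ["FEATHERLESS_API_KEY"],  # e.g. set as a secret in your space
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # any supported <=15B model
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

If you clone this space, you'd typically store the key in the space's secrets and read it from the environment, as above.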
Thanks for stopping by! Feedback is welcome in the community section of the space, by email to hello@featherless.ai, or on our Discord.