[FEEDBACK] Inference Providers

#49
by julien-c - opened
Hugging Face org

Any inference provider you love, and that you'd like to be able to access directly from the Hub?

Hugging Face org
edited Jan 28

Love that I can call DeepSeek R1 directly from the Hub 🔥

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="together",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx"
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    max_tokens=500
)

print(completion.choices[0].message)
```

Is it possible to set a monthly spending budget or rate limits for all the external providers? I don't see such options in the billing tab. If a key or session token is stolen, it could be quite dangerous to my thin wallet :(

Hugging Face org

@benhaotang you already get spending notifications when crossing important thresholds ($10, $100, $1,000) but we'll add spending limits in the future

Thanks for your quick reply, good to know!
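
Until spending limits land, a client-side guard is one way to approximate them in your own code. The sketch below is only an illustration: `BudgetGuard` and `crossed_thresholds` are hypothetical names, the per-request cost is something you would have to estimate yourself, and nothing here reflects how the Hub computes billing. It also does not help if a key is stolen, since the thief would not be running your guard.

```python
# Notification thresholds in USD, as quoted in the reply above.
THRESHOLDS = (10, 100, 1_000)


def crossed_thresholds(previous_total, new_total, thresholds=THRESHOLDS):
    """Return the thresholds passed when spend moves from previous_total to new_total."""
    return [t for t in thresholds if previous_total < t <= new_total]


class BudgetGuard:
    """Hypothetical client-side tracker: refuses requests that would exceed a budget."""

    def __init__(self, monthly_budget_usd):
        self.monthly_budget_usd = monthly_budget_usd
        self.spent_usd = 0.0

    def record(self, cost_usd):
        """Register an estimated request cost; return any thresholds it crossed."""
        new_total = self.spent_usd + cost_usd
        if new_total > self.monthly_budget_usd:
            raise RuntimeError(
                f"Budget exceeded: {new_total:.2f} > {self.monthly_budget_usd:.2f} USD"
            )
        crossed = crossed_thresholds(self.spent_usd, new_total)
        self.spent_usd = new_total
        return crossed


guard = BudgetGuard(monthly_budget_usd=50)
guard.record(8)          # below every threshold
print(guard.record(5))   # total is now 13 USD, so the 10 USD threshold is crossed
```

Calling `guard.record(...)` before each request keeps the check in one place; a request that would push the total past the budget raises instead of being sent.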

Would be great if you could add Nebius AI Studio to the list :) New inference provider on the market, with the absolute cheapest prices and the highest rate limits...

Could be good to add featherless.ai

TitanML !!

Hugging Face org

With a Pro subscription, are there any limits to token usage or queuing constraints when using a custom API key and direct calls? The free tier on Cerebras did have such constraints.

@sh8459131 When using a custom key, requests are forwarded to Cerebras directly so their limits will apply
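
A rough sketch of what that distinction looks like from the client side. It assumes that `huggingface_hub` treats keys starting with `hf_` as Hugging Face tokens (routed through the Hub) and any other key as the provider's own key (sent directly to the provider); `describe_routing` is a hypothetical helper for illustration, not part of the library.

```python
def is_hf_token(api_key: str) -> bool:
    """Assumption: Hugging Face tokens start with 'hf_'; provider keys do not."""
    return api_key.startswith("hf_")


def describe_routing(api_key: str, provider: str) -> str:
    """Describe where a request would go, and therefore whose limits apply."""
    if is_hf_token(api_key):
        return f"routed through Hugging Face to {provider}"
    return f"sent directly to {provider} (the provider's own limits apply)"


def make_client(api_key: str, provider: str = "cerebras"):
    """Build a client; the import is local so the helpers above work without the package."""
    from huggingface_hub import InferenceClient

    return InferenceClient(provider=provider, api_key=api_key)


print(describe_routing("hf_xxxxxxxx", "cerebras"))
print(describe_routing("my-cerebras-key", "cerebras"))
```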

Hugging Face org

@alexman83 can you share some sample code you're using? We might need to update smolagents to expose the new bill_to parameter. cc @albertvillanova for viz

@julien-c of course!
Thanks!

```python
from smolagents import CodeAgent
from extraction_smolagents.custom_tools import CSVRetrieverTool
from smolagents import HfApiModel

from huggingface_hub import login
login()  # authenticate with the Hub before calling the model

prompt_template = """
# Prompt for analyzing a user request and extracting topics
You are an expert analyst of television content. Extract a list of topics from a user request as follows

## Step 1: Analyze the user request
- From the user request, given after the word 'request', work out which topics the user is interested in

## Step 2: Compare the extracted themes with the provided ones
- Read the csv file 'topics_info.csv', which contains the topic name (column name), the representative words (column representation) and the relevant documents (representative_docs)
- Compare the representative words of each topic with the themes extracted in Step 1 and keep only the rows of the topics that match
- Now, for the topics kept in the previous step, analyze the representative documents, check which ones are similar to the themes extracted from the user request in Step 1, and keep them

## Step 3: Generate the output
Generate a json file containing:
- the list of extracted topics, using the value of the name column
- the reason why they were chosen

Organize the json file as in the following example:

json
{
    "topics": [<topic_1>, <topic_2>, <topic_3>],
    "motivation": <motivation>
}
"""

retriever = CSVRetrieverTool()
llm_model = HfApiModel(model_id='Qwen/Qwen2.5-Coder-32B-Instruct')
agent = CodeAgent(
    tools=[retriever],
    model=llm_model,
    verbosity_level=2,
    additional_authorized_imports=['pandas']
)

# The user request is appended to the prompt template
question = prompt_template + '\n' + "I want music-related topics"
answer = agent.run(question)
print(f"Answer: {answer}")
```

This is the custom class for reading CSV:

```python
from smolagents import Tool
import pandas as pd

class CSVRetrieverTool(Tool):

    name = "csv_retriever"
    description = "Uses the provided path to access a csv file using pandas dataframe"
    inputs = {
        "path": {
            "type": "string",
            "description": "The path containing the filename of the csv to read",
        }
    }
    output_type = "string"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def forward(self, path) -> str:
        # Load the CSV and return it as plain text for the agent to read
        df = pd.read_csv(path)
        return df.to_string()
```

Hugging Face org

@alexman83 Merve ( @merve ) opened https://github.com/huggingface/smolagents/pull/1260 which will expose the bill_to param in smolagents' InferenceClient 🔥

Hugging Face org

(you'll need to upgrade your smolagents version)
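
For reference, a minimal sketch of using `bill_to` with `huggingface_hub` directly. It assumes a recent `huggingface_hub` release in which `InferenceClient` accepts a `bill_to` keyword argument; `my-org` is a placeholder organization name, and `build_client_kwargs` is a hypothetical helper used only to keep the example testable without network access.

```python
def build_client_kwargs(provider, bill_to=None):
    """Collect InferenceClient keyword arguments, adding bill_to only when set."""
    kwargs = {"provider": provider}
    if bill_to is not None:
        # Assumption: bill_to names the organization that should be billed.
        kwargs["bill_to"] = bill_to
    return kwargs


def make_billed_client(provider, org):
    """Create a client whose requests are billed to `org` (requires huggingface_hub)."""
    from huggingface_hub import InferenceClient

    return InferenceClient(**build_client_kwargs(provider, bill_to=org))


print(build_client_kwargs("together", bill_to="my-org"))
```

Whether the same argument can be passed through smolagents depends on the version that includes the pull request mentioned above.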

@julien-c We just started integrating WaveSpeedAI (http://wavespeed.ai) as an inference provider. We provide blazing fast image and video generation services. Happy to see we are listed!

@julien-c why is wavespeed on the list? they serve like 2 models. runware is faster and better :)

We have no rate limits at nCompass (https://app.compass.tech) with fast inference. Feel free to give it a spin if rate limits are something that's a bottleneck for you.
