Inference API Parameters

#47
by Shivkumar27 - opened

I want to use the gemma-2b-it model through the Inference API. Which parameters do I need to pass in the body of the request, and what format should the body have to get good results?

Currently I am using the following format for the body with the Inference API, but I am not getting good responses:

{"input":"[INST]{{prompt}}\n\n{{Context}}\n\n{{Question}}\n\n{{assistant}}\n\n[/INST]"}
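One likely reason for the poor responses is the prompt format: `[INST] ... [/INST]` is the Llama 2 / Mistral instruction format, whereas Gemma's instruction-tuned checkpoints use turn markers (`<start_of_turn>user`, `<end_of_turn>`, `<start_of_turn>model`). The `{{assistant}}` placeholder is also unnecessary, because the open model turn at the end is where the answer gets generated. Below is a minimal sketch of building such a prompt, assuming a single user turn. `build_gemma_prompt` is a hypothetical helper, and the leading `<bos>` token is left out on the assumption that the tokenizer adds it.

```python
def build_gemma_prompt(instruction: str, context: str, question: str) -> str:
    """Format a single-turn prompt in Gemma's turn-based chat format.

    Gemma's chat template has no separate system role, so the instruction,
    context and question all go inside one user turn. The prompt ends with
    an open model turn so that the model writes the answer there.
    """
    user_message = f"{instruction}\n\n{context}\n\n{question}"
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


if __name__ == "__main__":
    prompt = build_gemma_prompt(
        "Answer the question using only the context.",
        "The Eiffel Tower is in Paris.",
        "Where is the Eiffel Tower?",
    )
    print(prompt)
```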

Do I need to include parameters such as temperature, top_p, max_new_tokens and repetition_penalty? If so, please show me the body I should pass.
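For the hosted text-generation task, the documented body puts the prompt under `inputs` (plural, not `input`) and the generation settings in a `parameters` object next to it. The sketch below builds such a body using the text-generation parameter names `max_new_tokens`, `do_sample`, `temperature`, `top_p`, `repetition_penalty` and `return_full_text`, plus the `wait_for_model` option. The values are illustrative starting points, not tuned recommendations, and `build_request_body` is a hypothetical helper.

```python
import json


def build_request_body(prompt: str, max_new_tokens: int = 256) -> str:
    """Return a JSON body for a text-generation call to the Inference API."""
    body = {
        # The prompt goes under "inputs" (plural).
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,  # cap on the number of generated tokens
            "do_sample": True,                 # sample, so temperature/top_p take effect
            "temperature": 0.7,                # lower values give more deterministic output
            "top_p": 0.9,                      # nucleus sampling cutoff
            "repetition_penalty": 1.1,         # values above 1.0 discourage repetition
            "return_full_text": False,         # return only the new text, not the prompt
        },
        # Wait while the model loads instead of receiving an error.
        "options": {"wait_for_model": True},
    }
    return json.dumps(body)


if __name__ == "__main__":
    prompt = (
        "<start_of_turn>user\n"
        "Where is the Eiffel Tower?<end_of_turn>\n"
        "<start_of_turn>model\n"
    )
    print(build_request_body(prompt))
```

Only `inputs` is required; every entry in `parameters` is optional, so a reasonable approach is to start with `max_new_tokens` and `temperature` and adjust the rest only if the output repeats itself or wanders.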

Thanks & Regards
Shiv Kumar

Hi @Shivkumar27, are you using the interface on the model card or Hugging Face's Inference API?

Hi @datamancer88, I am using Hugging Face's Inference API.

Google org

You can set default inference parameters for a model's Inference API widget through its model card metadata. Which settings apply depends on the task: for example, aggregation_strategy is used by token classification, while temperature is used by text generation.

For further reference, see ‘How can I control my model’s widget Inference API parameters?’ in the Hugging Face Hub documentation.

For example, you can specify these parameters for text generation:

widget:
  - text: "Write a poem about a lonely robot."
inference:
  parameters:
    temperature: 0.7
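The metadata above sets defaults for the widget on the model page. When you call the API from your own code, the same keys can be sent with each request under `parameters`. Below is a minimal standard-library sketch. It assumes the serverless endpoint format `https://api-inference.huggingface.co/models/<model-id>` (current when this thread was written; it may have moved since), a token stored in an `HF_TOKEN` environment variable, and the usual text-generation response shape, a list of objects with a `generated_text` field. `query` and `extract_text` are hypothetical helpers.

```python
import json
import os
import urllib.request

# Assumption: serverless endpoint URL format at the time of this thread.
API_URL = "https://api-inference.huggingface.co/models/google/gemma-2b-it"


def query(payload: dict, token: str) -> list:
    """POST a JSON payload to the Inference API and return the decoded JSON."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_text(result: list) -> str:
    """Pull the generated text out of a text-generation response."""
    return result[0]["generated_text"]


if __name__ == "__main__":
    token = os.environ.get("HF_TOKEN")
    if token is None:
        print("Set HF_TOKEN to run this example.")
    else:
        payload = {
            "inputs": (
                "<start_of_turn>user\n"
                "Write a poem about a lonely robot.<end_of_turn>\n"
                "<start_of_turn>model\n"
            ),
            # Same temperature as the metadata example, but sent per request.
            "parameters": {
                "do_sample": True,
                "temperature": 0.7,
                "max_new_tokens": 200,
                "return_full_text": False,
            },
        }
        print(extract_text(query(payload, token)))
```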
