Text Generation

Generate text based on a prompt.

If you are interested in a Chat Completion task, which generates a response based on a list of messages, check out the chat-completion task.

For more details about the text-generation task, check out its dedicated page! You will find examples and related materials.

Recommended models

google/gemma-2-2b-it: A text-generation model trained to follow instructions.
Qwen/Qwen3-Coder-480B-A35B-Instruct: Powerful text generation model for coding.
openai/gpt-oss-120b: Great text generation model with top-notch tool calling capabilities.
zai-org/GLM-4.5: Powerful text generation model.
Qwen/Qwen3-4B-Thinking-2507: A powerful small model with reasoning capabilities.
Qwen/Qwen2.5-7B-Instruct-1M: Strong conversational model that supports very long instructions.
Qwen/Qwen2.5-Coder-32B-Instruct: Text generation model used to write code.
deepseek-ai/DeepSeek-R1: Powerful reasoning based open large language model.

Explore all available models and find the one that suits you best here, or from the terminal with the hf CLI:

hf models ls --warm --pipeline-tag text-generation --sort trending_score

Using the API

Language

Client

Provider

Settings

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/deepinfra/v1/openai/completions",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash-0731",
    messages="\"Can you please let us know more details about your \"",
)

print(completion.choices[0].message)

API specification

Request

Headers
authorization	string	Authentication header in the form `'Bearer: hf_**'` when `hf_**` is a personal user access token with “Inference Providers” permission. You can generate one from your settings page.

Payload
inputs*	string
parameters	object
adapter_id	string	Lora adapter id
best_of	integer	Generate best_of sequences and return the one if the highest token logprobs.
decoder_input_details	boolean	Whether to return decoder input token logprobs and ids.
details	boolean	Whether to return generation details.
do_sample	boolean	Activate logits sampling.
frequency_penalty	number	The parameter for frequency penalty. 1.0 means no penalty Penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
grammar	unknown	One of the following:
(#1)	object
type*	enum	Possible values: json.
value*	unknown	A string that represents a JSON Schema. JSON Schema is a declarative language that allows to annotate JSON documents with types and descriptions.
(#2)	object
type*	enum	Possible values: regex.
value*	string
(#3)	object
type*	enum	Possible values: json_schema.
value*	object
name	string	Optional name identifier for the schema
schema*	unknown	The actual JSON schema definition
max_new_tokens	integer	Maximum number of tokens to generate.
repetition_penalty	number	The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.
return_full_text	boolean	Whether to prepend the prompt to the generated text
seed	integer	Random sampling seed.
stop	string[]	Stop generating tokens if a member of `stop` is generated.
temperature	number	The value used to module the logits distribution.
top_k	integer	The number of highest probability vocabulary tokens to keep for top-k-filtering.
top_n_tokens	integer	The number of highest probability vocabulary tokens to keep for top-n-filtering.
top_p	number	Top-p value for nucleus sampling.
truncate	integer	Truncate inputs tokens to the given size.
typical_p	number	Typical Decoding mass See Typical Decoding for Natural Language Generation for more information.
watermark	boolean	Watermarking with A Watermark for Large Language Models.
stream	boolean

Response

Output type depends on the stream input parameter. If stream is false (default), the response will be a JSON object with the following fields:

Body
details	object
best_of_sequences	object[]
finish_reason	enum	Possible values: length, eos_token, stop_sequence.
generated_text	string
generated_tokens	integer
prefill	object[]
id	integer
logprob	number
text	string
seed	integer
tokens	object[]
id	integer
logprob	number
special	boolean
text	string
top_tokens	array[]
id	integer
logprob	number
special	boolean
text	string
finish_reason	enum	Possible values: length, eos_token, stop_sequence.
generated_tokens	integer
prefill	object[]
id	integer
logprob	number
text	string
seed	integer
tokens	object[]
id	integer
logprob	number
special	boolean
text	string
top_tokens	array[]
id	integer
logprob	number
special	boolean
text	string
generated_text	string

If stream is true, generated tokens are returned as a stream, using Server-Sent Events (SSE). For more information about streaming, check out this guide.

Body
details	object
finish_reason	enum	Possible values: length, eos_token, stop_sequence.
generated_tokens	integer
input_length	integer
seed	integer
generated_text	string
index	integer
token	object
id	integer
logprob	number
special	boolean
text	string
top_tokens	object[]
id	integer
logprob	number
special	boolean
text	string

Update on GitHub

Inference Providers

Text Generation

Recommended models

Using the API

API specification

Request

Response