Fork of t5-11b

This is a fork of t5-11b that implements a custom handler.py as an example of how to use t5-11b with inference-endpoints on a single NVIDIA T4.

Model Card for T5 11B - fp16


Use with Inference Endpoints

Hugging Face Inference Endpoints can be used with an HTTP client in any language. We will use Python and the requests library to send our requests (make sure you have it installed: pip install requests).


Send requests with Python

import json
import requests as r

ENDPOINT_URL = ""  # url of your endpoint
HF_TOKEN = ""  # your Hugging Face access token

# payload samples
regular_payload = { "inputs": "translate English to German: The weather is nice today." }
parameter_payload = {
    "inputs": "translate English to German: Hello my name is Philipp and I am a Technical Leader at Hugging Face",
    "parameters": {
        "max_length": 40,
    },
}

# HTTP headers for authorization
headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}

# send request
response = r.post(ENDPOINT_URL, headers=headers, json=parameter_payload)
generated_text = response.json()
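
The same request can be wrapped in a small helper that sets a timeout and surfaces HTTP errors. What follows is a minimal sketch under stated assumptions, not part of this repository's handler: the names `build_request` and `query_endpoint` are hypothetical, and the sketch uses the standard library's `urllib` rather than `requests` so that it runs without extra dependencies. The response schema depends on the custom handler, so the decoded JSON is returned unchanged.

```python
import json
import urllib.request


def build_request(endpoint_url, token, inputs, parameters=None):
    """Build a POST request for an endpoint (hypothetical helper)."""
    payload = {"inputs": inputs}
    if parameters:
        # Optional generation parameters, e.g. {"max_length": 40}
        payload["parameters"] = parameters
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_endpoint(endpoint_url, token, inputs, parameters=None, timeout=60):
    """Send the request and return the decoded JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_request(endpoint_url, token, inputs, parameters)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `query_endpoint(ENDPOINT_URL, HF_TOKEN, "translate English to German: The weather is nice today.", {"max_length": 40})` would send the same payload as `parameter_payload` above, with the input text swapped. The 60-second default timeout is an arbitrary choice for illustration; a large model may need longer for the first request after the endpoint starts.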

