Inference on HF Endpoints API?

#4
by kastan - opened

I have this model running on the Endpoints API, but I can't get it to accept BOTH text and image inputs simultaneously.

What is the required schema?

I also asked here: https://github.com/huggingface/api-inference-community/issues/336

I got close, but it seems it only accepts a single string as input because it's part of the "Text-Generation" family of models.

import base64
import json
import os

import requests

img_url = "https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG"
API_URL = "https://api-inference.huggingface.co/models/HuggingFaceM4/idefics-9b-instruct"
HF_TOKEN = os.environ["HF_TOKEN"]  # assumes the token is exported in the environment
headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json"
}

def query(image_url):
    # Download and base64-encode the image (only needed by the commented-out attempts below).
    response = requests.get(image_url)
    image_bytes = response.content
    encoded_image = base64.b64encode(image_bytes).decode('utf-8')
    data = {
        "inputs": image_url,
        # Other payload shapes attempted:
        # "prompt": "What's in this image?",
        # "prompt": encoded_image,
        # "image": encoded_image,
        # "inputs": "What's in this image?",
    }
    json_data = json.dumps(data)
    print("my request", json_data)
    response = requests.request("POST", API_URL, headers=headers, data=json_data)
    print("Response content:", response.content)
    return json.loads(response.content.decode("utf-8"))

print(query(img_url))

## The results seem nearly there! 
## [{'generated_text': 'https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPGScooby-Doo, Where Are You!'}]
HuggingFaceM4 org
edited Sep 29, 2023

Hi!
For TGI, the IDEFICS Playground code shows the expected input format. It contains this helper:

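from typing import List

# is_image, is_url and gradio_link are helpers defined elsewhere in the Playground code.
def prompt_list_to_tgi_input(prompt_list: List[str]) -> str:
    """
    TGI expects a string that contains both text and images in the image markdown format (i.e. `![]()`).
    The image links are parsed on the TGI side.
    """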
    result_string_input = ""
    for elem in prompt_list:
        if is_image(elem):
            if is_url(elem):
                result_string_input += f"![]({elem})"
            else:
                result_string_input += f"![]({gradio_link(img_path=elem)})"
        else:
            result_string_input += elem
    return result_string_input

As the docstring says, TGI expects a single string with images in markdown format, so if you pass a list of interleaved images and text through this function and send the result as "inputs", it should work.
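For the request in the original question, that could look roughly like the sketch below. It is a minimal illustration, not the Playground implementation: `looks_like_url`, `build_tgi_prompt` and `query` are hypothetical names, every http(s) string in the list is assumed to be an image URL, and the token is assumed to be passed in by the caller.

```python
from urllib.parse import urlparse

API_URL = "https://api-inference.huggingface.co/models/HuggingFaceM4/idefics-9b-instruct"


def looks_like_url(text: str) -> bool:
    """Hypothetical helper: treat any http(s) string as an image URL."""
    parsed = urlparse(text)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def build_tgi_prompt(parts):
    """Join interleaved text and image URLs into one string,
    wrapping each URL in markdown image syntax: ![](url)."""
    return "".join(f"![]({p})" if looks_like_url(p) else p for p in parts)


def query(parts, token):
    """Send the combined prompt as the single "inputs" string."""
    # Imported here so the prompt builder works without requests installed.
    import requests

    headers = {"Authorization": f"Bearer {token}"}
    payload = {"inputs": build_tgi_prompt(parts)}
    response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()


prompt = build_tgi_prompt([
    "https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG",
    "What's in this image?",
])
print(prompt)
```

Calling `query([...], token=HF_TOKEN)` with the same list would then send the image and the question together in one "inputs" string, which keeps the payload compatible with the single-string text-generation schema.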
