Truncated output from API call through langchain

#51
by TMTechnology

Hi all

I am using a hosted Inference Endpoint on HF and calling it through the HuggingFaceEndpoint wrapper provided by LangChain.

When I ask any question, the output seems to be truncated. Any idea why that might be the case?

Following is my code:

from langchain.llms import HuggingFaceEndpoint
from langchain import PromptTemplate, LLMChain

# 'ENDPOINT_URL' and TOKEN are placeholders for my actual endpoint URL
# and HF API token.
endpoint_url = 'ENDPOINT_URL'
hf = HuggingFaceEndpoint(
    endpoint_url=endpoint_url,
    huggingfacehub_api_token=TOKEN,
    task='text-generation',
)

template = """Question: {question}

Answer: """

prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=hf)

question = "When did Germany unite? "

print(llm_chain.run(question))

And the following is my output:

 1990, following the reunification of East

Any help please?

Thanks

HuggingFaceEndpoint truncates the text because it assumes the endpoint returns the prompt together with the generated text. You need to modify the _call method of HuggingFaceEndpoint so that it doesn't slice generated_text and instead returns the whole text.
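To make the mechanics concrete, here is a minimal sketch of what that slicing does when the endpoint returns only the completion (the strings are hypothetical, not real endpoint output):

# Hypothetical prompt/completion pair to illustrate the slicing issue.
prompt = "Question: When did Germany unite? \n\nAnswer: "

# Case the slicing was written for: the endpoint echoes the prompt
# back in front of the completion, so stripping len(prompt) works.
full = prompt + "Germany reunited on 3 October 1990."
print(full[len(prompt):])        # the complete answer

# Case you are hitting: the endpoint returns only the completion,
# so the same slice silently drops the first len(prompt) characters
# of the answer itself.
completion = "Germany officially reunited on 3 October 1990, following the reunification of East and West Germany."
print(completion[len(prompt):])  # only a truncated tail of the answer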

So you mean the following part specifically in the _call method?:

            # Text generation return includes the starter text.
            text = generated_text[0]["generated_text"][len(prompt) :]

So I need to adjust the indexing that currently keeps only the part after the prompt length?

https://github.com/hwchase17/langchain/blob/master/langchain/llms/huggingface_endpoint.py

No, just remove the indexing. The indexing assumes that generated_text includes the prompt (hence it slices generated_text from len(prompt) to the end). Just modify it to be
text = generated_text[0]["generated_text"]

Yup that's what I meant. Thank you.
