how to output an answer without side chatter

#36
by Gerald001 - opened

Hi,

I'm using meta-llama/Meta-Llama-3-8B-Instruct for a summarization task, with a prompt along these lines:

You will be given a tuple which includes data from an XYZ.
Your job is to simply output a one-line sentence based on the data provided in the tuple. ...

###Things to consider when generating the sentence###
- It should feel intuitive. 
- ...

The format of the tuple is as follows:  ...

...

The output structure should be as follows (you must strictly follow this):
<Output sentence from tuple data>

Unfortunately the problem with meta-llama/Meta-Llama-3-8B-Instruct is that it is trained to be more "conversational", and it can be hard to get it to just output an answer without side chatter:

response_text: ['Here is the output sentence based on the provided tuple:\n\nThe Default Prefilter Policy_1 has a description of "Default Prefilter Policy_1 with default action to allow all tunnels".assistant\n\nI\'m ready for the next tuple!assistant\n\nPlease provide the next tuple.assistant\n\nGo ahead and provide the next tuple. I\'ll generate the output sentence based on the data.assistant\n\nI\'m waiting!assistant\n\nPlease go ahead and provide the next tuple.assistant\n\nI\'m ready!assistant\n\nGo ahead and provide the next tuple.assistant\n\nI\'m waiting...assistant\n\nPlease provide the next tuple.assistant\n\nI\'m ready!assistant\n\nGo ahead and provide the next tuple.assistant\n\nI\'m waiting...assistant\n\nPlease provide the next tuple.assistant\n\nGo ahead and provide the next tuple.assistant\n\nI\'m ready!assistant\n\nPlease provide the next tuple.assistant\n\nGo ahead and provide the next tuple.assistant\n\nI\'m waiting...']

What techniques work best to address chattiness for this specific model? There was no chattiness when I used the meta-llama/Llama-2-7b-chat-hf and meta-llama/Llama-2-7b-chat models, as far as I noticed...

Will there be a meta-llama/Meta-Llama-3-8B-chat model too?

Thanks,
Gerald

I usually include something like:

Respond only as shown, with no additional discursive or explanatory text.
And I always provide an example (i.e., one-shot in-context learning), as sketched below.
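Here is a minimal sketch of that kind of prompt, assuming you build it with apply_chat_template; the example tuple and sentence are hypothetical placeholders:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# The real tuple to summarize; this value is just an illustrative placeholder.
content = "('Default Prefilter Policy_1', 'default action to allow all tunnels')"

# The system message pins the output format; one worked example (made-up data)
# shows the model exactly what an answer with no side chatter looks like.
messages = [
    {
        "role": "system",
        "content": (
            "You will be given a tuple which includes data from an XYZ. "
            "Output exactly one sentence based on the data in the tuple. "
            "Respond only as shown, with no additional discursive or explanatory text."
        ),
    },
    {"role": "user", "content": "('Policy_A', 'blocks all inbound traffic')"},
    {"role": "assistant", "content": "Policy_A blocks all inbound traffic."},
    {"role": "user", "content": content},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)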

You need to update your software; the underlying issues have been fixed by now.

@nschle What software? Where do you see that it has been fixed? I'm using:

from sagemaker.huggingface import HuggingFaceModel, HuggingFacePredictor
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# llm is the HuggingFacePredictor for the deployed endpoint; content and
# self.parameters are defined elsewhere in the class.
payload = {
    "inputs": tokenizer.apply_chat_template(
        [
            {
                "role": "user",
                "content": content,
            }
        ],
        tokenize=False,
        add_generation_prompt=True,
    ),
    "parameters": self.parameters,
}
response = llm.predict(payload)
generated_text = response[0]["generated_text"]

There was an issue with the handling of eot_id vs. eos; I assume that's what @nschle was referring to, as opposed to the more general prompt issue I mentioned above.
Grab the updated config.json and generation_config.json, and maybe tokenizer_config.json as well, just to be sure.
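If updating the configs on the endpoint isn't convenient, you can also pass Llama 3's end-of-turn token explicitly as a stop condition. A minimal sketch with the plain transformers pipeline (the prompt and generation parameters are just illustrative):

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
pipe = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Stop on either the regular EOS token or Llama 3's <|eot_id|> end-of-turn
# token, so generation ends after the first assistant turn instead of
# rambling on with extra "assistant" turns.
terminators = [
    pipe.tokenizer.eos_token_id,
    pipe.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

prompt = pipe.tokenizer.apply_chat_template(
    [{"role": "user", "content": "('Policy_A', 'blocks all inbound traffic')"}],
    tokenize=False,
    add_generation_prompt=True,
)
out = pipe(prompt, max_new_tokens=64, eos_token_id=terminators)
print(out[0]["generated_text"][len(prompt):])

On a SageMaker/TGI endpoint you can usually get the same effect by adding something like "stop": ["<|eot_id|>"] to the request parameters dict.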

@bdambrosio will the official repo here be updated too?

I believe it has been. You can check the file dates above; I believe you will see they have later dates (by a few hours or a day or so) than the weights files.

import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

pipeline("hi")

Why does it crash and not give a response?
I'm running it on Colab.
