Poor performance?

#6
by Fionn - opened

Saw some reports that this performs better than Falcon-7B. Was interested to try that out!

Unfortunately, across a handful of tests the performance seems quite poor. For example, the prompt "Label the tweets as either 'positive', 'negative', 'mixed', or 'neutral': Tweet: I can say that there isn't anything I would change. " returns:

"  Tweet: @jessicajayne haha yay for us!  Tweet: @gabrielladixon I have no idea how you do it.  Tweet: @jessicajayne haha yay for us!  Tweet: @gabrielladixon I have no idea how you do it.  Tweet: @gabrielladixon I have no idea how you do it.  Tweet: @gabrielladixon I have no idea how you do it.  Tweet: @gabrielladixon I"

Using the following parameters:

"parameters": {
    "max_new_tokens": 128,
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50
}
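
For reference, I'm sending requests roughly like this (a sketch; the endpoint URL is a placeholder for my text-generation-inference server):

import requests

# Placeholder endpoint for a local text-generation-inference server.
API_URL = "http://localhost:8080/generate"

payload = {
    "inputs": ("Label the tweets as either 'positive', 'negative', 'mixed', or 'neutral': "
               "Tweet: I can say that there isn't anything I would change."),
    "parameters": {
        "max_new_tokens": 128,
        "temperature": 0.7,
        "top_p": 0.7,
        "top_k": 50,
    },
}

response = requests.post(API_URL, json=payload)
print(response.json())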

Is this expected or unexpected?

Together org

@Fionn Thank you for your interest! In general, this is not expected. I can offer some tips to help improve the performance of the model:

  1. Always append "Label:", "Output:", or "Answer:" at the end of the prompt. This helps the model understand that it needs to provide the answer instead of completing the input tweet.
  2. Feel free to use newlines to separate instructions, input, and output for better organization.

Based on these tips, you can format your input as follows:

Label the tweets as either 'positive', 'negative', 'mixed', or 'neutral'.

Tweet: I can say that there isn't anything I would change.
Label:

This formatting will prompt the model to provide a label for the given tweet, which should be positive.

Additionally, including a few examples in the prompt is very helpful for the model to understand what you're looking for. For example:

Label the tweets as either 'positive', 'negative', 'mixed', or 'neutral'.

Tweet: The weather is good.
Label: positive

Tweet: I can say that there isn't anything I would change.
Label:

Thanks for the quick and detailed response, @juewang!

I tested with your feedback, but unfortunately the output is still quite poor. For me, the second example you gave returns:

{'generated_text': "\n    Tweet: @marcus\t I'm glad you're feeling better!\n    Label: positive\n    \n    Tweet: @kylegriffin1  you're not going to be happy until you've turned the whole world into a bunch of democrats.\n    Label: negative\n    \n    Tweet: @TheTweetOfGod i don't have twitter, but i did read the article and it was very good!\n    Label: positive\n    \n    Tweet: @DjThunderLips I think you should try it. I've never been to one, but"}

For reference, I'm running this using the Hugging Face text-generation-inference Docker container.

Together org

@Fionn
The code snippet below works for me:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in fp16 on the GPU.
model = AutoModelForCausalLM.from_pretrained(
    'togethercomputer/RedPajama-INCITE-7B-Instruct',
    torch_dtype=torch.float16,
).to('cuda:0')
tokenizer = AutoTokenizer.from_pretrained('togethercomputer/RedPajama-INCITE-7B-Instruct')

# Note that the prompt ends with "Label:" -- no trailing space or newline.
inputs = tokenizer("""Label the tweets as either 'positive', 'negative', 'mixed', or 'neutral'.

Tweet: The weather is good.
Label: positive

Tweet: I can say that there isn't anything I would change.
Label:""", return_tensors='pt').to(model.device)

# Generate, then slice off the prompt tokens so only the completion is decoded.
output = model.generate(**inputs, max_new_tokens=32)[0, inputs.input_ids.size(1):]
print(tokenizer.decode(output))
# ==>
'''
 positive

Tweet: @jennifer_truax I'm so sorry.  I hope you feel better soon.  I'm glad you
'''

Can you check that there are no extra spaces or "\n" at the end of the prompt? They can be very harmful for BPE tokenizer-based models.
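
For example, you can compare the token ids with and without trailing whitespace (a quick sketch using the model's tokenizer):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('togethercomputer/RedPajama-INCITE-7B-Instruct')

# A trailing space or newline changes the final token(s) the model sees,
# which can push generation off-distribution.
print(tokenizer('Label:').input_ids)
print(tokenizer('Label: ').input_ids)
print(tokenizer('Label:\n').input_ids)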

Thanks @juewang! As I said, I'm using the inference API. I haven't been able to reproduce your results, but I'll keep trying. Thanks for the support!
