Issue/Bug replicating HumanEval result

#4
by emrgnt-cmplxty - opened

Hi all,

I'm looking to replicate the HumanEval result for this model so that I can then go on to testing on interesting orthogonal benchmarks.

Unfortunately, I find that the model goes off the rails frequently, and is likely far from Phind's quoted performance when I attempt to replicate it. Does anyone see an obvious bug here - https://github.com/emrgnt-cmplxty/zero-shot-replication/blob/main/zero_shot_replication/model/hugging_face_model/phind_model.py?

For reference, I am seeing output like the following:


def is_multiply_prime(a):
    """Write a function that returns true if the given number is the multiplication of 3 prime numbers
    and false otherwise.
    Knowing that (a) is less then 100. 
    Example:
    is_multiply_prime(30) == True
    30 = 2 * 3 * 5
    """

    def is_prime(n))):
        if n n n n n n n n n n n n n n n n [... "n" repeated until the generation limit]

This model has a rope_theta of 1000000. Is there any way to account for that in the script?
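For what it's worth, the rope_theta value lives in the model's config.json, and a sufficiently new transformers reads it automatically when loading. A minimal sketch (the config dict here is illustrative, not the model's full config) of how that field is stored and read back:

```python
import json
import os
import tempfile

# Illustrative fragment of a CodeLlama-style config.json; the real file
# contains many more fields, but rope_theta is the one of interest here.
config = {"model_type": "llama", "rope_theta": 1000000.0}

# Write and re-read the config the way a downloaded model snapshot stores it.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "config.json")
    with open(path, "w") as f:
        json.dump(config, f)
    with open(path) as f:
        loaded = json.load(f)

print(loaded["rope_theta"])  # → 1000000.0
```

If your transformers version is too old to know about rope_theta, it simply ignores the field, which would produce exactly this kind of degenerate output.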

Thanks for reporting, we'll investigate.

The eval code in the model card just worked for me. Could you please let me know if that works for you?

I will test explicitly tomorrow. I don't think there are any significant diffs w.r.t. what I am doing, but this can help pinpoint the issue.

Same here: every output ends with the same words. It seems there is no end token (EOS) being emitted.
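As a stopgap until the EOS issue is fixed, you can truncate the decoded text at known stop strings. A minimal sketch (the stop strings here are illustrative placeholders, not the model's actual stop sequences):

```python
# Hypothetical stop strings; adjust to whatever the model actually emits.
STOP_STRINGS = ["</s>", "\n\n\n"]

def truncate_at_stop(text: str, stops=STOP_STRINGS) -> str:
    """Cut the text at the earliest occurrence of any stop string."""
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]

print(truncate_at_stop("def f():\n    return 1\n</s>garbage"))
```

This only masks the symptom, of course; the real fix is getting the model to emit EOS correctly.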

There is some commentary in the reddit thread here -> https://www.reddit.com/r/LocalLLaMA/comments/164754t/wizardcoder_eval_results_vs_chatgpt_and_claude_on/

It does seem that the issue is related to transformers version.

Can confirm, running off a transformers main-branch commit worked.
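If you want to guard against this in a script, you can check the installed transformers version before loading the model. A small sketch, assuming (this cutoff is my assumption, not confirmed in this thread) that rope_theta support landed around the 4.33 release:

```python
def version_tuple(v: str) -> tuple:
    """Parse a version string like '4.33.0.dev0' into a comparable tuple of ints."""
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)

MIN_VERSION = (4, 33)  # assumed first release that reads rope_theta from config

def is_new_enough(installed: str) -> bool:
    return version_tuple(installed) >= MIN_VERSION

print(is_new_enough("4.32.1"))      # older release
print(is_new_enough("4.33.0.dev0")) # main-branch build
```

In practice you would pass `transformers.__version__` to `is_new_enough` and warn or abort if it returns False.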

I tried this code on a single GPU, but I am getting bad results.

    from transformers import AutoTokenizer, LlamaForCausalLM
    import torch

    model_path = "Phind/Phind-CodeLlama-34B-v2"

    # Load the model in 8-bit across the available GPUs (requires bitsandbytes).
    model = LlamaForCausalLM.from_pretrained(model_path, load_in_8bit=True, device_map="auto")

    tokenizer = AutoTokenizer.from_pretrained(model_path, legacy=True)
    tokenizer.pad_token_id = tokenizer.eos_token_id

    text = "Write a code in python for inferencing large language models using the Transformers library. Give a step by step approach."

    inputs = tokenizer(text, return_tensors="pt").to("cuda:0")

    # Note: repetition_penalty=1.5 is an aggressive setting and can itself degrade output.
    out = model.generate(**inputs, max_length=200, temperature=0.9, repetition_penalty=1.5, do_sample=True)
    print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))

This is the output I am getting:

In order to inferencing with transformer model, we need use the Hugging Face's pytorch-transformers Library.
Step 1: Installation of Libraries
You can install this required useful very necessary important big huge immense massive monstrous enormous vast colossal portentious prodigious sizeable sizable mammoth mind mouth multitudinously numberless numb numerous novel nones none non non nonsensical senseless insignificant inconsequentialist unimportant small sm
python
# Importing Necessary nec es ess ent en env e environments  needed environment environments environments
import torch
from transformers import AutoModelForMaskedLM,AutoTokenizerFastBert BertConfigP
class Class Config Model Token BERT For
config = class Auto

Can someone suggest a fix?
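One thing worth trying: repetition_penalty=1.5 is a very strong setting and can itself produce broken text like the above. A toy sketch of how such a penalty reshapes logits for already-generated tokens, following the divide-positive/multiply-negative rule as I understand it (this is an illustration, not transformers' actual code path):

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Suppress tokens that already appeared: positive logits are divided
    by the penalty, negative ones multiplied, making repeats less likely."""
    out = list(logits)
    for t in seen_token_ids:
        if out[t] > 0:
            out[t] /= penalty
        else:
            out[t] *= penalty
    return out

# Tokens 0 and 2 already appeared; with penalty 1.5 both get pushed down hard.
logits = [2.0, 0.5, -1.0]
print(apply_repetition_penalty(logits, {0, 2}, 1.5))
```

At 1.5 this penalizes even legitimate repeats (common in code, e.g. repeated keywords), so dropping it to something near 1.0-1.1, or disabling it, may already help.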
