Issue/Bug replicating HumanEval result

#4
by emrgnt-cmplxty - opened

Hi all,

I'm looking to replicate the HumanEval result for this model so that I can then go on to testing on interesting orthogonal benchmarks.

Unfortunately, I find that the model goes off the rails frequently, and is likely far from Phind's quoted performance when I attempt to replicate it. Does anyone see an obvious bug here - https://github.com/emrgnt-cmplxty/zero-shot-replication/blob/main/zero_shot_replication/model/hugging_face_model/phind_model.py?

For reference, I am seeing output like the following:


def is_multiply_prime(a):
    """Write a function that returns true if the given number is the multiplication of 3 prime numbers
    and false otherwise.
    Knowing that (a) is less then 100. 
    Example:
    is_multiply_prime(30) == True
    30 = 2 * 3 * 5
    """

    def is_prime(n))):
        if n n n n n n n n n n n n n n n n [... "n" repeated until the generation limit]

This model has a rope_theta of 1000000. Is there any way to account for that in the script?
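For what it's worth, the rope_theta value lives in the model's config.json, and a sufficiently new transformers reads it automatically when loading. A minimal sketch (the config dict here is illustrative, not the model's full config) of how that field is stored and read back:

```python
import json
import os
import tempfile

# Illustrative fragment of a CodeLlama-style config.json; the real file
# contains many more fields, but rope_theta is the one of interest here.
config = {"model_type": "llama", "rope_theta": 1000000.0}

# Write and re-read the config the way a downloaded model snapshot stores it.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "config.json")
    with open(path, "w") as f:
        json.dump(config, f)
    with open(path) as f:
        loaded = json.load(f)

print(loaded["rope_theta"])  # → 1000000.0
```

If your transformers version is too old to know about rope_theta, it simply ignores the field, which would produce exactly this kind of degenerate output.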

Thanks for reporting, we'll investigate.

The eval code in the model card just worked for me. Could you please let me know if that works for you?

I will test explicitly tomorrow. I don't think there are any significant diffs w.r.t. what I am doing, but this can help pinpoint the issue.

Same here: every output ends with the same words. It seems there is no end token (EOS) being emitted.
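As a stopgap until the EOS issue is fixed, you can truncate the decoded text at known stop strings. A minimal sketch (the stop strings here are illustrative placeholders, not the model's actual stop sequences):

```python
# Hypothetical stop strings; adjust to whatever the model actually emits.
STOP_STRINGS = ["</s>", "\n\n\n"]

def truncate_at_stop(text: str, stops=STOP_STRINGS) -> str:
    """Cut the text at the earliest occurrence of any stop string."""
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]

print(truncate_at_stop("def f():\n    return 1\n</s>garbage"))
```

This only masks the symptom, of course; the real fix is getting the model to emit EOS correctly.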

There is some commentary in the reddit thread here -> https://www.reddit.com/r/LocalLLaMA/comments/164754t/wizardcoder_eval_results_vs_chatgpt_and_claude_on/

It does seem that the issue is related to transformers version.

Can confirm, running off a transformers main-branch commit worked.
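If you want to guard against this in a script, you can check the installed transformers version before loading the model. A small sketch, assuming (this cutoff is my assumption, not confirmed in this thread) that rope_theta support landed around the 4.33 release:

```python
def version_tuple(v: str) -> tuple:
    """Parse a version string like '4.33.0.dev0' into a comparable tuple of ints."""
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)

MIN_VERSION = (4, 33)  # assumed first release that reads rope_theta from config

def is_new_enough(installed: str) -> bool:
    return version_tuple(installed) >= MIN_VERSION

print(is_new_enough("4.32.1"))      # older release
print(is_new_enough("4.33.0.dev0")) # main-branch build
```

In practice you would pass `transformers.__version__` to `is_new_enough` and warn or abort if it returns False.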

I tried this code on a single GPU, but I am getting bad results.

    from transformers import AutoTokenizer, LlamaForCausalLM
    import torch

    model_path = "Phind/Phind-CodeLlama-34B-v2"

    # Load the model in 8-bit across the available GPUs (requires bitsandbytes).
    model = LlamaForCausalLM.from_pretrained(model_path, load_in_8bit=True, device_map="auto")

    tokenizer = AutoTokenizer.from_pretrained(model_path, legacy=True)
    tokenizer.pad_token_id = tokenizer.eos_token_id

    text = "Write a code in python for inferencing large language models using the Transformers library. Give a step by step approach."

    inputs = tokenizer(text, return_tensors="pt").to("cuda:0")

    # Note: repetition_penalty=1.5 is an aggressive setting and can itself degrade output.
    out = model.generate(**inputs, max_length=200, temperature=0.9, repetition_penalty=1.5, do_sample=True)
    print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))

This is the output I am getting:

In order to inferencing with transformer model, we need use the Hugging Face's pytorch-transformers Library.
Step 1: Installation of Libraries
You can install this required useful very necessary important big huge immense massive monstrous enormous vast colossal portentious prodigious sizeable sizable mammoth mind mouth multitudinously numberless numb numerous novel nones none non non nonsensical senseless insignificant inconsequentialist unimportant small sm
python
# Importing Necessary nec es ess ent en env e environments  needed environment environments environments
import torch
from transformers import AutoModelForMaskedLM,AutoTokenizerFastBert BertConfigP
class Class Config Model Token BERT For
config = class Auto

Can someone suggest a fix?
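One thing worth trying: repetition_penalty=1.5 is a very strong setting and can itself produce broken text like the above. A toy sketch of how such a penalty reshapes logits for already-generated tokens, following the divide-positive/multiply-negative rule as I understand it (this is an illustration, not transformers' actual code path):

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Suppress tokens that already appeared: positive logits are divided
    by the penalty, negative ones multiplied, making repeats less likely."""
    out = list(logits)
    for t in seen_token_ids:
        if out[t] > 0:
            out[t] /= penalty
        else:
            out[t] *= penalty
    return out

# Tokens 0 and 2 already appeared; with penalty 1.5 both get pushed down hard.
logits = [2.0, 0.5, -1.0]
print(apply_repetition_penalty(logits, {0, 2}, 1.5))
```

At 1.5 this penalizes even legitimate repeats (common in code, e.g. repeated keywords), so dropping it to something near 1.0-1.1, or disabling it, may already help.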
