True HumanEval score or NewHope repeat?

#1
by rombodawg - opened

How can we be sure that this model is actually beating GPT-4 because it was trained well, and not because HumanEval data leaked into its training data? Did you make sure to remove any HumanEval problems from the dataset before training the model?

Yes, we checked for contamination and found none. Our training set is quite different from HumanEval and mostly serves to align the model, which shows that CodeLlama-34B is already quite strong.

michaelroyzen changed discussion status to closed

We use the same decontamination process as OpenAI: https://www.phind.com/blog/code-llama-beats-gpt4.
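For readers unfamiliar with it, the substring-match check described in OpenAI's GPT-4 technical report works roughly like this: strip whitespace and symbols from both evaluation and training text, sample three 50-character substrings from each evaluation example (or use the whole example if it is shorter), and flag a match if any sample appears in a processed training example. Below is a minimal sketch under that description. The function names are hypothetical, and whether Phind's pipeline matches this exactly is an assumption.

```python
import random


def normalize(text: str) -> str:
    # Keep only letters and digits; drop whitespace, punctuation and symbols.
    return "".join(ch for ch in text if ch.isalnum())


def sample_probes(text: str, n: int = 3, length: int = 50, rng=None) -> list:
    # Hypothetical helper: pick n random substrings of `length` characters,
    # or use the whole text if it is not longer than `length`.
    rng = rng if rng is not None else random.Random(0)
    if not text:
        return []  # an empty probe would match every training example
    if len(text) <= length:
        return [text]
    starts = [rng.randrange(len(text) - length + 1) for _ in range(n)]
    return [text[s:s + length] for s in starts]


def is_contaminated(eval_example: str, train_examples, rng=None) -> bool:
    # Flag the eval example if any sampled probe is a substring of any
    # normalized training example.
    probes = sample_probes(normalize(eval_example), rng=rng)
    for train in train_examples:
        processed = normalize(train)
        if any(p in processed for p in probes):
            return True
    return False
```

Because normalization removes whitespace and punctuation, a training example that copies a HumanEval prompt with different indentation or a prepended comment is still flagged. A lightly paraphrased copy, however, can slip through, which is a known limitation of substring matching.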

@michaelroyzen One more question: do you plan on releasing the dataset, or will it remain closed source?

rombodawg changed discussion status to open

I am also wondering whether the dataset will be released.

Not at this time. It's part of our secret sauce. But we plan to continue releasing models -- stay tuned for v2 in a few days.

michaelroyzen changed discussion status to closed
