[FLAG] deepnight-research/llama-2-70B-inst: Identical models on leaderboard

by jaspercatapang - opened

Hi. The two Llama 2 70B models named upstage/Llama-2-70b-instruct-v2 and deepnight-research/llama-2-70B-inst have identical evaluation results (leaderboard UI) but they were fine-tuned on different datasets.

Upstage's model used the following datasets, according to their model card:

  • Orca-style dataset
  • Alpaca-style dataset

Deepnight Research's model used the following datasets, according to their model card:

  • EleutherAI/pile (30%)
  • TogetherComputer/Long-Data-Collections

I found no difference between the two models' result summaries (here and here). What do you think happened?

PS: I am not accusing anyone of anything, just curious. Thank you.

Open LLM Leaderboard org

I've never seen results identical to so many decimal places for two different models.
A good way to check this would be to load both models from the hub and compare the SHA hashes of their files.
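
A minimal sketch of that check, assuming both repositories have already been downloaded to local directories. The function names and the `*.safetensors` / `*.bin` patterns are illustrative choices, not part of any leaderboard tooling:

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight shards never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(root, patterns=("*.safetensors", "*.bin")):
    """Map each weight file's relative path to its SHA-256 digest."""
    root = Path(root)
    hashes = {}
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            hashes[path.relative_to(root).as_posix()] = sha256_of_file(path)
    return hashes


def compare_models(dir_a, dir_b):
    """Return (identical, different, only_in_one) lists of relative file names."""
    a, b = hash_directory(dir_a), hash_directory(dir_b)
    shared = sorted(set(a) & set(b))
    identical = [name for name in shared if a[name] == b[name]]
    different = [name for name in shared if a[name] != b[name]]
    only_in_one = sorted(set(a) ^ set(b))
    return identical, different, only_in_one
```

If every shared weight file lands in `identical`, the two uploads are byte-for-byte the same. Differing hashes do not by themselves prove the models are different, since re-saving or re-sharding the same weights changes the bytes.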

clefourrier changed discussion title from Identical models on leaderboard to [FLAG] deepnight-research/llama-2-70B-inst: Identical models on leaderboard
Open LLM Leaderboard org
edited Aug 22, 2023

FLAG: Given the edits to the README file (which went from being identical to the Upstage model's README to a new one) and the identical results (down to the logprobs hashes), it seems extremely likely that the deepnight model is a copy of the Upstage model.

clefourrier changed discussion status to closed

The deepnight model has been deleted!

I have a feeling we will be getting a lot more of these soon. It's very possible that some unscrupulous companies could simply duplicate other high-performance models and post them to drive traffic to their websites.

Maybe put some simple hashing in place to add an auto-warning to model cards: "this model appears to be a direct copy of..."
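
One way such a warning could work, as a rough sketch only: keep a registry keyed by a fingerprint of each submission's weight-file hashes and flag any later submission whose fingerprint is already known. `DuplicateRegistry`, `model_fingerprint`, and the model names in the usage lines are hypothetical, and the per-file hashes are assumed to be computed elsewhere:

```python
import hashlib


def model_fingerprint(file_hashes):
    """Combine per-file hex digests into one order-independent fingerprint."""
    digest = hashlib.sha256()
    for file_hash in sorted(file_hashes):
        digest.update(file_hash.encode("ascii"))
    return digest.hexdigest()


class DuplicateRegistry:
    """Remember the first model seen for each fingerprint (hypothetical helper)."""

    def __init__(self):
        self._first_seen = {}

    def register(self, model_id, file_hashes):
        """Return a warning string if an identical model was registered earlier."""
        fingerprint = model_fingerprint(file_hashes)
        original = self._first_seen.setdefault(fingerprint, model_id)
        if original != model_id:
            return f"This model appears to be a direct copy of {original}"
        return None


registry = DuplicateRegistry()
registry.register("org-a/original", ["aa11", "bb22"])   # first upload, no warning
warning = registry.register("org-b/copy", ["bb22", "aa11"])
```

This only catches exact byte-level copies. A copy that was re-saved, re-sharded, or slightly perturbed would get a different fingerprint and slip through.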

Is it the same as upstage?

They should just be removed from the leaderboard... if they deleted their model after being caught with their pants down, they should be treated accordingly.

"Celebrating another remarkable achievement! 🚀🌟"

Sounds like the founder of DeepNight is very proud on LinkedIn... The Rocket emoji is always a warning sign. 😊

Can we take this off the leaderboard?

@hunkim Hello!
One user might have copied Upstage's latest model. You might ask the HuggingFace admins to remove (or not) the fake model from the leaderboard. : )

Perhaps an honest mistake: they probably asked ChatGPT to write a Python script to create a top-ranked LLM, and it complied by creating a script to clone a top-ranked LLM. Innovation! 😉

Hello community,

I would like to come clean about what I did. I was the one who released the copy of the Upstage model under the deepnight-research organisation.
The model was submitted to the leaderboard by someone else, but they didn't know I had copied it.

I was an intern at DeepNight and a newbie in AI. I just wanted to impress everyone. I would like to apologise to @hunkim and the entire Upstage team for this. I apologise to Kshitij Tyagi and the entire DeepNight family as well.
Neither DeepNight nor anyone else apart from me should be held responsible for what I did.

I hope I can be forgiven for what I did.

Thank you

Oh boy... Well, at least you're owning your mistake.

We appreciate your honesty. It's not easy to admit a mistake, especially in a public forum. We, the Upstage team, accept your apology.

Thank you for accepting my apology.
