Can paying for the compute to get a model posted be opened up to more parties than Meta, please?

#163
by spaceman7777 - opened

The only limiter for getting a model onto the leaderboard is the model making its way through the queue, except in cases where companies like Meta have paid to have their scores computed first.

Can other people pay for the GPU time to run the eval test suite?

As many have noted, there are a number of important foundational models, and models with particularly unique lineages, that have never been able to make it through the queue and get posted.

Some of the more important, and relevant, high performance models that are excluded from the leaderboard include:

  • Any Google T5 models
  • Any Google UL2 models
  • BERT and related
  • the newly pretrained-from-scratch mpt-8k-7b model series
  • rwkv models
  • others (there are likely other base models that perform on par with llama-1-7b, which is the bar I'd set for "particularly relevant" at the current time)

I, for one, would definitely consider just paying for the compute to get a model evaluated, as certain large companies have done. The server rental cost for running an eval is quite low, afaik.

Whatever the case, as long as these large exclusions persist and the leaderboard remains dominated by Llama fine-tunes and a handful of models from 2022 or earlier, it doesn't seem like it should be posted and advertised as an accurate evaluation of the open LLM landscape.

The server rental fees aren't that bad, and a large portion of this project's Community forum threads consist of queries about the exclusion of important novel models, so I'd like to know whether there is anything the community can do to contribute to this project and make it an accurate listing of the capabilities of novel open large language models (and more llamas too, if people have another one they want to get posted).

Impatient+sad :'( from waiting so many weeks/months for important models to get evaluated. I'd love a self-serve page where I could select a model and pay to kick off an eval job.

I am not an HF employee, so I can't say I know anything official, but I would not assume HF is getting paid to place models on the Leaderboard. It is more likely that their unused cluster capacity is used to evaluate things as a community service, but I can't tell you for sure. If HF did get official grant money to work on everything you mentioned, things could get done faster. If anything, Meta should make a donation to HF to get more resources put on this sort of project; Meta is getting free PR here, if anything.

Open LLM Leaderboard org

@felixz What you said is very close to what happens internally, good insight 🤗
We use the unused cluster capacity to run evaluations for the community for free (on A100 GPUs), because we believe that being able to compare models is good for open source in general. (This also means that the leaderboard sometimes slows down a bit, because bigger experiments take up the majority of the cluster.)
We absolutely did not get paid for the evaluation of the Meta models, and I have no idea where this (completely wrong) assumption came from ^^''

@spaceman7777 For the T5/BERT/UL2 models, we updated our backend several times and removed models of type AutoModelForSeq2SeqLM (none of which had been submitted to the leaderboard at the time anyway) to avoid carrying outdated legacy code. Adding these models back is quite high on our todo list for the summer, since some people have now been asking for them.
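For readers following along, here is a minimal sketch (not the leaderboard's actual backend code; the repo IDs are only examples) of why these model families go through a different code path: encoder-decoder models like T5/UL2 load via AutoModelForSeq2SeqLM, while decoder-only models load via AutoModelForCausalLM.

```python
# Illustrative only, assuming the standard transformers Auto classes.
from transformers import AutoConfig, AutoModelForCausalLM, AutoModelForSeq2SeqLM

def load_for_eval(repo_id: str):
    """Pick the Auto class based on the model's config (sketch, not the real backend)."""
    config = AutoConfig.from_pretrained(repo_id)
    if config.is_encoder_decoder:  # True for T5/UL2-style encoder-decoder models
        return AutoModelForSeq2SeqLM.from_pretrained(repo_id)
    return AutoModelForCausalLM.from_pretrained(repo_id)

# Example usage (repo IDs are just examples):
# model = load_for_eval("google/flan-t5-base")   # seq2seq path
# model = load_for_eval("huggyllama/llama-7b")   # causal path
```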

For the other models, if they are not natively supported by a stable transformers release, we won't support them. Please take into account that behind maintaining and upgrading this leaderboard (as well as answering user queries) is a team of only 2 people (and I'm not 100% on this project). As much as we would like to go faster, there's only so much we can humanly do.
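As a rough, unofficial way to check whether your installed transformers release supports a model natively (the function name and repo ID below are hypothetical, for illustration only): models whose config requires trust_remote_code=True rely on custom code outside transformers and would not qualify.

```python
# Sketch of a "native support" check, assuming only public transformers APIs.
from transformers import AutoConfig

def is_natively_supported(repo_id: str) -> bool:
    """Return True if the installed transformers release can load this config
    without executing custom remote code (illustrative, not an official check)."""
    try:
        AutoConfig.from_pretrained(repo_id, trust_remote_code=False)
        return True
    except (ValueError, KeyError, OSError):
        return False

# Example (hypothetical): on a release that predates MPT support, this returns False.
# print(is_natively_supported("mosaicml/mpt-7b"))
```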

clefourrier changed discussion status to closed
