Make arena great again

#4
by recoilme - opened

I really appreciate you spending time on this service, thank you!
But it's quite hard to enjoy comparing models in the arena right now, because it gets boring. There are lots of very bad, very old models, and new models don't get enough samples. Sampling looks random? Most of the prompts are monotonous too.

Some suggestions:

  1. Use any well-known algorithm for the exploration-exploitation dilemma, for example Thompson sampling (https://en.wikipedia.org/wiki/Thompson_sampling) or UCB1
  2. Use/add less boring prompts (there are tons of datasets on HF)
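
To make suggestion 1 a bit more concrete, here is a rough sketch of how pair selection could work with Thompson sampling or UCB1. I don't know how the arena actually stores votes, so `ArenaSampler`, `record`, and the win/loss counters are made-up names for illustration, and treating each model as an independent bandit arm is a simplifying assumption:

```python
import math
import random


class ArenaSampler:
    """Hypothetical helper: picks which two models to show next."""

    def __init__(self, model_names, seed=None):
        # Per-model win/loss counts; every model starts with no votes.
        self.stats = {name: {"wins": 0, "losses": 0} for name in model_names}
        self.rng = random.Random(seed)

    def pick_pair_thompson(self):
        # Draw one sample per model from Beta(wins + 1, losses + 1)
        # and show the two models with the highest draws.
        draws = {
            name: self.rng.betavariate(s["wins"] + 1, s["losses"] + 1)
            for name, s in self.stats.items()
        }
        ranked = sorted(draws, key=draws.get, reverse=True)
        return ranked[0], ranked[1]

    def pick_pair_ucb1(self):
        # UCB1 score: empirical win rate plus an exploration bonus
        # that shrinks as a model collects more votes.
        total = sum(s["wins"] + s["losses"] for s in self.stats.values())

        def score(name):
            s = self.stats[name]
            n = s["wins"] + s["losses"]
            if n == 0:
                return math.inf  # untried models are always shown first
            return s["wins"] / n + math.sqrt(2 * math.log(total) / n)

        ranked = sorted(self.stats, key=score, reverse=True)
        return ranked[0], ranked[1]

    def record(self, winner, loser):
        # Update counts after a user votes on a pair.
        self.stats[winner]["wins"] += 1
        self.stats[loser]["losses"] += 1


if __name__ == "__main__":
    sampler = ArenaSampler(["old_model", "mid_model", "new_model"], seed=42)
    left, right = sampler.pick_pair_thompson()
    sampler.record(winner=left, loser=right)
    print(sampler.pick_pair_ucb1())
```

Either way, a freshly added model has a wide posterior (or an infinite UCB1 score), so it gets shown often until it has enough votes, while models that keep losing get shown less and less.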

Feel free to ask if you need details on how to implement A/B tests.

Also, please clear the colorfulxl cache: I updated the model on HF at the same place, sorry about that.
https://huggingface.co/recoilme/colorfulxl
