Where is the benchmarking dataset coming from?
#3, opened by zhiminy
There is no benchmarking dataset in use.
It's an Elo arena.
People send a prompt to two models and pick the better response (a blind test).
Then we use the Elo algorithm to update each model's Elo rating.
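For anyone reading later, the rating update described above can be sketched roughly as follows. This is a minimal sketch of the standard Elo formula, not necessarily the arena's actual code: the function names, the K-factor of 32, and the 400-point scale are all assumptions chosen for illustration.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that model A beats model B under the standard Elo model.

    The 400-point scale is the conventional default (an assumption here).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_elo(rating_a, rating_b, a_wins, k=32.0):
    """Return the new (rating_a, rating_b) after one blind comparison.

    a_wins: True if the voter preferred model A, False if model B.
    k: K-factor controlling step size (32 is an assumed, common choice).
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_wins else 0.0
    # Each model moves by K times (actual result - expected result).
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b
```

With two models that both start at 1000, `update_elo(1000, 1000, a_wins=True)` moves the winner to 1016 and the loser to 984. An upset win over a higher-rated model moves the ratings by more than an expected win does.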
zhiminy changed discussion status to closed