Where is the benchmarking dataset coming from?

#3
by zhiminy - opened
This comment has been hidden
인스트럭트.한국 org

There’s no such dataset being used.

It’s an Elo arena.

This comment has been hidden
인스트럭트.한국 org

Users prompt two models and choose the better one (a blind test).
We then use the Elo algorithm to update each model's Elo rating.
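
As a rough sketch of what one such update looks like (this is the textbook Elo formula, not necessarily the arena's actual code; the function names, the K-factor of 32, and the 400-point scale are assumptions for illustration):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one blind comparison.

    k is the K-factor (assumed value; it controls how far one vote moves a rating).
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - exp_a)
    # Whatever the winner gains, the loser gives up, so the total stays constant.
    return rating_a + delta, rating_b - delta


# Two models start level; a user prefers model A in a blind test.
new_a, new_b = update_elo(1000.0, 1000.0, a_wins=True)
print(new_a, new_b)
```

A win over an equally rated model moves each rating by half of K; beating a much higher-rated model moves the ratings more, and beating a much lower-rated one moves them less.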

My mistake. I see now that the users themselves submit the prompts directly...

zhiminy changed discussion status to closed
