Consider including OpenChat 3 models for human evaluation
#2 opened by imone
OpenChat 3, which is based on Llama-2, is the best 13B model on the AlpacaEval GPT-4 instruction-following evaluation and greatly outperforms existing open-source dialogue models. Would you consider including it in the human evaluation?