human_eval_llm_leaderboard

Consider including OpenChat 3 models for human evaluation

by imone - opened Aug 6, 2023

OpenChat 3 is based on Llama-2. It is the best 13B model on the AlpacaEval GPT-4 instruction evaluation and greatly outperforms existing open-source dialogue models. Would you consider including it in the human evaluation?
