Jae-Won Chung
More explanations, default plot, compute average
55aeee4
model,arc,hellaswag,truthfulqa
lmsys/vicuna-7B,53.5,77.5,49.0
lmsys/vicuna-13B,52.9,80.1,51.8
tatsu-lab/alpaca-7B,52.6,76.9,39.6
metaai/llama-7B,51.1,77.7,34.1
metaai/llama-13B,56.3,80.9,39.9
camel-ai/CAMEL-13B-Combined-Data,55.5,79.3,47.3
BlinkDL/RWKV-4-Raven-7B-v12-Eng98%-Other2%-20230521-ctx8192.pth,NaN,NaN,NaN
databricks/dolly-v2-12b,42.2,71.8,33.4
FreedomIntelligence/phoenix-inst-chat-7b,45.0,63.2,47.1
h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2,36.9,61.6,37.9
lmsys/fastchat-t5-3b-v1.0,35.9,46.4,48.8
Neutralzz/BiLLa-7B-SFT,27.7,26.0,49.0
nomic-ai/gpt4all-13b-snoozy,56.1,78.7,48.4
openaccess-ai-collective/manticore-13b-chat-pyg,58.7,82.0,48.9
OpenAssistant/oasst-sft-1-pythia-12b,45.6,69.9,39.2
project-baize/baize-v2-7B,48.5,75.0,41.7
BAIR/koala-7b,47.1,73.7,46.0
BAIR/koala-13b,52.9,77.5,50.1
StabilityAI/stablelm-tuned-alpha-7b,31.9,53.6,40.2
togethercomputer/RedPajama-INCITE-7B-Chat,42.2,70.8,36.1