Benchmark says 1.4 is better than 2.0?

#1 by aetherwu

jondurbin/airoboros-33b-gpt4-1.4 64.89
jondurbin/airoboros-33b-gpt4-m2.0 63.42
jondurbin/airoboros-33b-gpt4-2.0 63.27

Indeed. I made no claim that 2.0 or m2.0 would be better than 1.4; from the model card:
"The 2.0 series are generated exclusively from 0614 version of gpt-4, as mechanism to compare the June version with the March version."

The 2.0 dataset was made mostly to test whether gpt-4 had degraded in performance, and it seems it perhaps did. I posted some analysis of the output here:
https://www.reddit.com/r/LocalLLaMA/comments/15i53h3/airoboros_20m20_releaseanalysis/

Both are now recommended by Another LLM Roleplay Rankings (https://rentry.co/ALLMRR):

airoboros-l2-13b-gpt4-2.0-GPTQ <- for chatting.
airoboros-l2-13b-gpt4-m2.0-GPTQ <- for RP (roleplay).
