Benchmark says 1.4 is better than 2.0?

#1 by aetherwu

jondurbin/airoboros-33b-gpt4-1.4 64.89
jondurbin/airoboros-33b-gpt4-m2.0 63.42
jondurbin/airoboros-33b-gpt4-2.0 63.27

Indeed. I made no claim that 2.0 or m2.0 would be better than 1.4; from the model card:
"The 2.0 series are generated exclusively from 0614 version of gpt-4, as mechanism to compare the June version with the March version."

The 2.0 dataset was made mostly to test whether gpt-4 had degraded in performance, and it seems it perhaps did. I posted some analysis of the output here:
https://www.reddit.com/r/LocalLLaMA/comments/15i53h3/airoboros_20m20_releaseanalysis/

Both are now recommended by Another LLM Roleplay Rankings (https://rentry.co/ALLMRR):

airoboros-l2-13b-gpt4-2.0-GPTQ <- for chatting.
airoboros-l2-13b-gpt4-m2.0-GPTQ <- for RP (roleplay).
