[Theory] Why Venus failed and Goliath is good

#14
by ChuckMcSneed - opened

After seeing public benchmarks fail, I decided to make my own proprietary tests to quantify how well models follow commands and handle creative writing. The results are quite interesting: the parents of this model had high "creativity" scores, while the parents of Venus had quite low scores, which dragged that model down. What do you think?

Maybe you could test some lower param models? I wonder how it would stack up.

Though the tests look to be for higher-param models, so I'm not sure how it will fare.

If Venus doesn't have layers from models made for creative writing and Goliath does, it makes sense that Goliath scores better on benchmarks than Venus.
