This model is surprisingly good.

#1
by Nitral-AI - opened

@Lewdiculous Decided to try it out after I saw it take off on Chaiverse; I was originally ignoring it since it was merge fodder. However, it performs very well in my own testing.
[screenshot]

Wow, nice, congrats on reaching that far up! I'm gonna give it a shot

@ABX-AI I put it up last night, funnily enough, and I have another one I slapped up this morning that specs out a little better. Best of luck my dude!

@Nitral-AI In my mind the Chaiverse leaderboard is a bigger meme than our usual memeboard, but reaching the top spots is pretty good and gets eyes on your work.
πŸ‘

have another one this morning that specs out a little better

Do you now? πŸ‘€

According to their evaluations anyways.
[screenshot]

Safety score πŸ‘€

A 54–55% WR is pretty good for KukulStanta.

How's KukulStanta-7B expected to behave context-size-wise? Probably badly, since it was just the pre-merge fodder for v0.420-32k, but it looks like that is working.


I can say the Elo ratings are weird.
One of my models' Elo ratings fluctuates about 10 points in a day, roughly 1021–1031.
But the user-preference and stay-in-character ratings seem more stable? Using your instruct preset on the leaderboard made the "entertaining" rating increase a bit and boosted user preferences too, so those numbers may be slightly more reliable, as using your instruct preset changed the ratings in a similar way to how I would rate a model using default Alpaca vs. your preset.
Just a thought: if it's trustworthy enough, you may be able to get free user feedback on your context presets by publishing one model with multiple presets and seeing how they do.
tl;dr: Elo bad, user preferences maybe?
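The day-to-day drift is consistent with how Elo works: each head-to-head matchup nudges a rating by at most the K-factor, so a handful of upsets in a day adds up fast. A minimal sketch of the standard formula (the K=4 value is an assumption, not Chaiverse's actual setting):

```python
# Minimal Elo sketch: every matchup moves a rating by at most K points, so a
# model playing a few dozen matches a day can easily drift ~10 points.
# K=4 is an assumed value, not Chaiverse's actual configuration.
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 4.0) -> float:
    """New rating for A after one matchup (score_a: 1.0 = win, 0.0 = loss)."""
    return r_a + k * (score_a - expected_score(r_a, r_b))
```

A short streak of wins or losses against similarly rated models is enough to account for a 1021–1031 swing.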

How's KukulStanta-7B expected to behave context size wise? Probably bad since it was just the pre-merge fodder for v0.420-32k, but looks like that is working.

8k unless we have some kind of outlier here.


Will keep that in mind for the future!

@Nitral-AI

8k unless we have some kind of outlier here.

I'll stick to v0.420-32k for now, then.

I'm not even sure how to post a model there, but I'm legit quite surprised with this one: https://huggingface.co/ABX-AI/Cosmic-Citrus-9B

In a certain... scenario, the character decided to introduce a point system to reward the {{user}}'s submissiveness. I've never seen this in any of my other tests :D :D I'm kinda loving it so far, but I'm not really sure how to configure it for the PR LB; I've never posted to any LB before.
What should be selected as the "reward model repository"?
And do you change the formatting in the advanced tab, or leave it as it is?

It's pretty easy to submit a model; all the defaults usually work well. But the most important thing is to set "best of" to the highest value. Try 16; it may not let you because your model is larger than 7B, and in that case use 8.
If you use anything less than the max, your model will rank really low, because every other model is already using the highest.
And stick with the default reward model.
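For context, "best of" here means best-of-N sampling: the serving stack generates N candidate replies and the reward model keeps the top-scoring one. A toy sketch of that selection step (`generate` and `reward` are hypothetical stand-ins, not the Chaiverse API):

```python
# Toy best-of-N sampling: draw N candidate replies and keep the one the
# reward model scores highest. `generate` and `reward` are stand-in
# callables, not any real leaderboard API.
from typing import Callable

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              reward: Callable[[str], float],
              n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)
```

With more candidates to choose from, the expected reward of the kept reply only goes up, which is why a lower N puts a model at a direct disadvantage.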


Thanks! Is adding an access token a normal thing for these LBs? I'm guessing it's fine considering how many submissions there are. And the LB is pretty active at the moment as well (I've basically been picking what models to use for merging from it, and it's been a success, more or less, haha).

You'll only need to give a token to access a private repo, so you won't need to give one since your model repo is public.
It's normal to give a read-only token to allow an external source to access your private repos; otherwise it's not necessary.
Still, be cautious about what you give your tokens access to.


Thanks a lot! That's exactly why I thought it was weird (the repo is public); this makes sense now! BTW, I was thinking a GGUF would drop for this one here, but I might just make one for myself to try it out later today.

Since KukulStanta-7B seemed to score better I went with that one xD

That one is also excellent

Owner • edited Apr 3

Since KukulStanta-7B seemed to score better I went with that one xD
@Lewdiculous

[screenshot]
Indeed, I had a feeling it was going to do better.

Wow, nice! I'm kind of surprised that one of mine is nearing the top 10 of the list, but it's ABX-AI/Infinite-Laymons-7B, and I expected some of the 9Bs to score better. It's probable that the 8-shot limit is hurting them, as they are definitely not twice as good as a 7B, haha. So it looks like 7Bs are heavily favored on this LB due to the doubled number of shots, and the next best thing would be 30B+ models. Although, I would actually prefer many of the 7Bs there over a Mixtral Noromaid in practical usage.
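The shot disparity alone can explain a ranking gap: if two models produced replies of identical quality, the side allowed more samples would still win the head-to-head more often. A quick simulation under that equal-quality assumption (reply quality modeled as a uniform random draw, which is an illustration, not how the reward model actually scores):

```python
import random

# If reply quality is an i.i.d. draw and each side keeps its best sample,
# the best-of-16 side beats the best-of-8 side with probability
# 16 / (16 + 8) = 2/3, even though the underlying models are identical.
def best_of_win_rate(n_big: int = 16, n_small: int = 8,
                     trials: int = 20_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    wins = sum(
        max(rng.random() for _ in range(n_big))
        > max(rng.random() for _ in range(n_small))
        for _ in range(trials)
    )
    return wins / trials
```

So a model capped at 8 shots starts roughly a 67/33 matchup behind an equally good 7B running 16, before any real quality difference is measured.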

The models on the LB were just recently wiped and auto-reuploaded. They now all have the same shots.
[screenshots from 2024-04-03]
It seems the defaults have been changed entirely too.
Edit: A large number of the models seem to be inactive?
Edit 2: The inactive models are all stuck around 5000-ish wins while the deployed models have over 9000. They must have been inactive for a while now.

New methods!
[screenshot from 2024-04-04]
