This model is surprisingly good.
@Lewdiculous
Decided to try it out after I saw it take off on Chaiverse; I was originally ignoring it since it was merge fodder. However, it performs very well in my own testing.
Wow, nice, congrats on reaching that far up! I'm gonna give it a shot.
@Nitral-AI
In my mind the Chaiverse leaderboard is a bigger meme than our usual memeboard, but reaching the top spots is pretty good and gets eyes on your work.
Have another one this morning that specs out a little better.
Do you now?
Safety score
54-55% WR, pretty good for KukulStanta.
How's KukulStanta-7B expected to behave context-size-wise? Probably badly, since it was just the pre-merge fodder for v0.420-32k, but it looks like that is working.
@Nitral-AI In my mind the Chaiverse leaderboard is a bigger meme than our usual memeboard, but reaching the top spots is pretty good and gets eyes on your work.
I can say the Elo ratings are weird.
One of my models' Elo ratings fluctuates by 10 points in a day, roughly 30%, between 1021 and 1031.
But the user-preference and stay-in-character ratings seem more stable? Using your instruct preset on the leaderboard made the "entertaining" rating increase a bit and boosted user preferences too, so those numbers may be slightly more reliable, as using your instruct preset changed the ratings in a similar way to how I would rate a model using default Alpaca versus your preset.
Just a thought: if it's trustworthy enough, you may be able to get free user feedback on your context presets by publishing one model with multiple presets and seeing how they do.
tl;dr: Elo bad, user preferences maybe?
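As a rough illustration of the kind of fluctuation described above (the daily ratings here are made up for the sketch, not real leaderboard data):

```python
# Made-up daily Elo snapshots for a single model, illustrating the
# ~10-point day-to-day swing described above (1021-1031).
ratings = [1021, 1028, 1024, 1031, 1022, 1029, 1025]

spread = max(ratings) - min(ratings)  # size of the daily swing
mean = sum(ratings) / len(ratings)    # central tendency is far more stable

print(spread)       # 10
print(round(mean))  # 1026
```

The point of the sketch: a single day's Elo snapshot is noisy, so comparing models by the mean over several days (or by the more stable user-preference numbers) is likely more meaningful.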
How's KukulStanta-7B expected to behave context size wise? Probably bad since it was just the pre-merge fodder for v0.420-32k, but looks like that is working.
8k unless we have some kind of outlier here.
Will keep that in mind for the future!
I'm not even sure how to post a model there, but I'm legit quite surprised with this one: https://huggingface.co/ABX-AI/Cosmic-Citrus-9B
In a certain... scenario, the character decided to introduce a point system to reward {{user}}'s submissiveness. I've never seen this in any of my other tests :D :D I'm kinda loving it so far, but I'm not really sure how to configure it for the PR LB; I've never posted to any LB before.
What should be selected as the "reward model repository"?
And do you change the formatting in the advanced tab or leave it as it is?
It's pretty easy to submit a model; the defaults usually work well. The most important thing is to set "best of" as high as possible: try 16. It may not let you because the model is larger than 7B; in that case use 8.
If you use anything less than the max, your model will rank really low, because every other model is already using the highest.
And stick with the default reward model.
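The advice above can be summarized as a small config sketch. This is purely illustrative: the field names (`model_repo`, `reward_repo`, `best_of`) and the 7B cutoff are assumptions taken from the discussion, not the submission form's actual API.

```python
# Illustrative only: field names and the 7B cutoff for best_of=16 are
# assumptions from the discussion above, not the leaderboard's real API.

def pick_best_of(model_size_b: float) -> int:
    """Use the highest 'best of' the form allows: try 16 first;
    models over 7B may be capped at 8."""
    return 16 if model_size_b <= 7 else 8

submission = {
    "model_repo": "ABX-AI/Cosmic-Citrus-9B",  # public repo, so no token needed
    "reward_repo": None,                      # None = keep the default reward model
    "best_of": pick_best_of(9),               # 9B model, so capped at 8
}

print(submission["best_of"])  # 8
```

The design point being made in the chat: since every other model already runs at the maximum "best of", anything lower puts you at a built-in disadvantage regardless of model quality.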
Thanks! Is adding an access token a normal thing for these LBs? I'm guessing it's fine considering how many submissions there are. The LB is pretty active at the moment as well (I've basically been picking which models to use for merging from it, and it's been a success, more or less, haha).
You'll only need to give a token to access a private repo, so you won't need one since your model repo is public.
It's normal to give a read-only token to allow an external service to access your private repos; otherwise it's not necessary.
Still, be cautious about what you give your tokens access to.
Thanks a lot! That's exactly why I thought it was weird (the repo is public); this makes sense now. BTW, I was thinking a GGUF would drop for this one, but I might just make one for myself to try it out later today.
Since KukulStanta-7B seemed to score better, I went with that one xD
That one is also excellent
Wow, nice! I'm kind of surprised that one of mine is nearing the top 10 of the list, but it's ABX-AI/Infinite-Laymons-7B, and I expected some of the 9Bs to score better. It's probable that the 8-shot limit is hurting them, as they are definitely not twice as good as a 7B, haha. So it looks like 7Bs are heavily favored on this LB due to the doubled shot count, and the next best thing would be 30B+ models. Although, I would actually prefer many of the 7Bs there over a Mixtral Noromaid in practical usage.
The models on the LB were just recently wiped and automatically re-uploaded. They now all have the same shots.
It seems the defaults have been changed entirely too.
Edit: a large number of the models seem to be inactive?
Edit 2: the inactive models are all stuck around 5000-ish wins, while the deployed models have over 9000. They must have been inactive for a while now.