Feedback

#2
by AdamDel - opened

Hi Katy! Thank you for your models. Over the past few weeks, I’ve been playing with most of your models and now have some feedback for ya. I’ve been comparing them using 3+ different characters (an energized/untamed character, a submissive character, a naive/compassionate character, and some random shit). Here is my tier list:

13b Models:
S-rank: EstopianMaidV1, EstopianMaidV2
A-rank: EstopianOrcaCrystalV1
B-rank: EveningStarV3
C-rank: EstopianCrystal

(Other models were deleted because of crucial flaws, low IQ, or just boring output.)

EstopianMaidV1 Q6-Q8:

  • Good output: descriptive, smart, and it understands the nuances of even weird, complex cards almost perfectly.
  • More pliable, submissive, and ERP-oriented compared to V2.
  • I know, maybe it wasn't very fair to compare V1 Q6 and V2 Q4_K_S.

EstopianMaidV2 Q4_K_S:

  • Good output: descriptive and smart, especially for Q4_K_S. The S-rank was given as an advance.
  • Balanced well, smooth transition between SFW and NSFW output.
  • Seems to be more toxic and less dependent on the user's dislikes/reactions compared to your other 13b models, which is captivating.
  • Occasionally surprises with nuances. More quants might sway me toward V2 over V1.

EstopianOrcaCrystalV1:

  • Better context understanding than EstopianCrystal, EveningStarV3, and sometimes even EstopianMaidV2. Not that it generates better output, but it responds more comprehensively to the context itself.
  • Slightly less verbose than EveningStarV3 but maintains a balanced reply length just as EstopianMaidV2. Personally, I would prefer this model over EveningStarV3.
  • Compared to other 13b models, its responses may feel more grounded... realistic? Or it's just lower emotional flexibility, if I may say so (in terms of direct speech).

EveningStarV3:

  • Verbose, colorful.
  • Struggles with adaptivity and context understanding; it feels like it has a blurred understanding of the input.
  • Output can be somewhat dull, lacking nuances. Unlike the Maids, additional regens may be needed for better replies.

EstopianCrystal:

  • Generates shorter replies than expected. Sometimes struggles with context understanding, requiring more regens. Nothing too special.

I believe that EstopianMaidV2 has potential (engaging, unpredictable, descriptive, good understanding of context nuances). You could add a pinch of OrcaCrystal's context understanding and pray that it doesn't kill the base model's behavior. Or just make a polished version of V2, which already feels different compared to V1, dunno. The situation seems similar to SultrySiliconV1 and SultrySiliconV2, where the second was more sultry, but here V2 has bolder behavior compared to V1.


7b Models:
S-rank: SultrySiliconV2
A-rank: LemonadeRP 0.1, LemonadeRP 0.7, LostCreativity
B-rank: LemonadeRP 1.4, SiliconHatred
C-rank: SmartSunset, BasicBirch

(Other models were deleted because of crucial flaws, low IQ, or just boring output.)

SultrySiliconV2:

  • The output feels the best of all the 7b models.
  • O̶n̶l̶y̶ o̶n̶e̶ i̶r̶r̶i̶t̶a̶t̶i̶n̶g̶ c̶o̶n̶ w̶h̶i̶c̶h̶ o̶t̶h̶e̶r̶ m̶o̶d̶e̶l̶s̶ d̶o̶n̶'t̶ h̶a̶v̶e̶:̶ Q̶u̶i̶t̶e̶ o̶f̶t̶e̶n̶ w̶i̶t̶h̶ a̶ n̶e̶w̶ p̶a̶r̶a̶g̶r̶a̶p̶h̶ i̶t̶ m̶a̶y̶ s̶u̶m̶m̶a̶r̶i̶z̶e̶ o̶r̶ n̶a̶r̶r̶a̶t̶e̶ t̶h̶e̶ s̶c̶e̶n̶e̶ a̶t̶ t̶h̶e̶ e̶n̶d̶ o̶f̶ t̶h̶e̶ r̶e̶p̶l̶y̶. S̶u̶l̶t̶r̶y̶S̶i̶l̶i̶c̶o̶n̶V̶3̶ d̶o̶e̶s̶n̶’t̶ h̶a̶s t̶h̶i̶s̶ i̶s̶s̶u̶e̶, b̶u̶t̶ S̶u̶l̶t̶r̶y̶S̶i̶l̶i̶c̶o̶n̶V̶2̶ h̶a̶s̶ b̶e̶t̶t̶e̶r̶ o̶u̶t̶p̶u̶t̶. T̶h̶a̶t̶'s̶ a̶ p̶i̶t̶y̶!̶ (Possibly a false alarm and it wasn't the model's issue. If so, then it is the best 7b without any ‘but’.)

LemonadeRP 0.1 / LostCreativity:

  • Like twins, in a good way. Their output feels similar to each other.
  • Both aren't smarter than SultrySiliconV2 but are verbose and engaging.
  • LostCreativity feels like it depends more on my reactions and instructions than LemonadeRP 0.1 does. Both give good-to-mid replies with a balanced length.
  • LostCreativity sometimes struggles with understanding context.

LemonadeRP 0.7:

  • Feels a bit smarter (better at understanding context) than 0.1.
  • More descriptive but may become boring sometimes. Still feels very similar.

SiliconHatred:

  • It has one good side that caught my attention: greater toxicity than other 7b models. I personally enjoy seeing a model that is capable of this. It relies less on my reactions, unlike other 7b models.
  • Has some issues with structure and slightly uncontrolled output length. May start with short messages and finish with uncontrollably long ones.
  • Once or twice during a chat, it may start using quotes instead of asterisks.

LemonadeRP 1.4:

  • Unnecessarily stretches the output length.
  • May occasionally reply in the third person instead of the first, like all Lemonade versions after 1.0.
  • May seem overly aroused, just like all Lemonade versions after 1.0.

SmartSunset / BasicBirch:

  • Not bad, but not that good either; both have issues. SmartSunset may appeal to someone who prefers short replies. BasicBirch may generate overly long replies.

Just wanted to share my 'very important opinion'. Keep going, you really have some good models there. 👍 Have a great day :)

AdamDel changed discussion status to closed
