Best so far :3

#2 · opened by saishf

I've tried so many Llama-3 models but this is by far my favourite. I'm happy to use it in place of a Solar model and definitely over a Mistral model. Whatever sorcery you did, it worked!
Way better than any of the Llama-3 models at the top of the Chaiverse leaderboard.
Namely:
Llama-3-LewdPlay-8B-evo
Roleplay-Llama-3-8B
They struggle with anything fantasy?
Then I tried
Llama-3SOME-8B-v1-BETA
And ended up banishing it because it can't even understand genders and assumes every encounter is only M/F
I guess overall thank you for the model! 😸
Edit - Fix readability

@jeiku I'm redoing these Imatrix quants with the new https://github.com/ggerganov/llama.cpp/pull/6920 merged

@saishf I'll upload it later this afternoon (uploading...)

> @jeiku I'm redoing these Imatrix quants with the new https://github.com/ggerganov/llama.cpp/pull/6920 merged
>
> @saishf I'll upload it later this afternoon (uploading...)

Thank you! I'll see if I can figure out how to run a perplexity test; I'd like to see the differences.
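For anyone else curious, perplexity is just the exponential of the average negative log-likelihood the model assigns to each token of a test file (llama.cpp ships a `perplexity` tool that computes this over a text file). A minimal sketch of the math, assuming you already have per-token log-probabilities from somewhere:

```python
import math

# Perplexity = exp(mean negative log-likelihood per token).
# Lower means the model is less "surprised" by the test text.
def perplexity(token_logprobs):
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Toy check: a model giving every token probability 0.5 -> PPL ~2.0
print(perplexity([math.log(0.5)] * 10))
```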

Since I prefer using KoboldCpp I'll wait and see if I notice any improvements, but from people's PPL tests, if I'm not mistaken, perplexity dropped by about half in some cases.

> Since I prefer using KoboldCpp I'll wait and see if I notice any improvements, but from people's PPL tests, if I'm not mistaken, perplexity dropped by about half in some cases.

I wanna do a perplexity test with something RP-formatted, as I think that will be more representative of the use case. I do hope it fixes the formatting issues in responses though 😺

That's a hopeful dream, haha.

Would be awesome.

For me the model is super chatty even though I told it to write short replies, it just keeps on going... does anybody have a fix for that, please? Otherwise it seems pretty decent! But I haven't been able to test much in my usual formats because of the talkativeness 😅 (I'm using the regular Llama 3 Instruct preset, which usually works on other models)

Using the Llama 3 presets is supposed to make generation stop at the right time. How many tokens per response are you getting? What limit did you set?

I usually use a 512 token limit, but other models stop correctly way before it; I don't usually change it... this model seems to like to keep going right up to that limit lol.
He goes like: small action "Small text." small action "Small text." small action "Small text." over and over...

Alright, which preset did you use so far? I'd recommend giving Virt's a try.

Linked in my quant page:
https://huggingface.co/Lewdiculous/SOVL_Llama3_8B-GGUF-IQ-Imatrix

A custom-made one, but it's basically default Llama 3 with my custom system prompt. I usually prefer quick-paced roleplay rather than storytelling.

Are these your own quants or mine?

Yours, q8.

A head-scratcher, um, I did use the new BPE tokenizers as expected. KoboldCpp version is 1.64?

Yep :( Seems to repeat itself after a while even haha. Behold, wall of text:

image.png
(Character card says he's icy-behaviored and not very talkative, talks in a rushed manner.)

I'm currently busy and can't personally test, but I'd ask that you try both linked presets first, as these are my controls.

What you show looks like an issue with prompt formatting, despite what we'd expect - or hopefully just samplers.

Try the linked presets for a control test.

@saishf - Was there degradation on your end?

I personally use a token limit and sentence trimming with this model because yeah, it does like to spew a lot of words.
It acts almost like it's not injecting end tokens for me too; that's why I keep the token limit so short.
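(If you drive KoboldCpp through its API instead of the UI, the same idea looks roughly like this - a sketch, with the field names "max_length" and "stop_sequence" written from memory, so verify them against your local instance's API docs:)

```python
import requests

# Sketch: cap reply length and add stop strings via KoboldCpp's
# KoboldAI-compatible endpoint. Field names are from memory - check
# your local API docs before relying on them.
payload = {
    "prompt": "You are {{char}}. Reply in one short paragraph.\n",
    "max_length": 160,                           # hard token cap per reply
    "stop_sequence": ["<|eot_id|>", "\nUser:"],  # trim runaway replies
    "temperature": 1.0,
}
r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])
```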

Same issue with the custom RP preset sadly yeah - even on a new chat :(

I have hopes that cgato's new Spice models in the works will perform this well but fix all these little Llama things we're learning about. Spicy llama will be fun.

Is there a reason you didn't make Imatrix GGUF for TheSpice 0.8.3 btw @Lewdiculous :o? I see you updated the 0.1.3 one.

Current Llama limitations then...?

Well, so I was pinged about https://huggingface.co/Undi95/Llama-3-LewdPlay-8B-evo and supposedly it handles this a lot better, when it comes to sticking to concise responses in a roleplay chatting experience. So I guess that's worth trying.

> Current Llama limitations then...?
>
> Well, so I was pinged about https://huggingface.co/Undi95/Llama-3-LewdPlay-8B-evo and supposedly it handles this a lot better, when it comes to sticking to concise responses in a roleplay chatting experience. So I guess that's worth trying.

I've tried evo; I gave up on it within three responses. I'm really picky though, so it's still worth trying for others.

0.1.3 was initially a Model-Request, and I saw it scoring so high on the Chaiverse leaderboard that I didn't look at the other one. But I'll get to it later.

@saishf Is there any specific deal breaker about evo?

> 0.1.3 was initially a Model-Request, and I saw it scoring so high on the Chaiverse leaderboard that I didn't look at the other one. But I'll get to it later.

The latest epochs are scoring insanely high 😭
Screenshot_20240502-144444.png

They lead to a 404 page :c Idk how they get those haha.

> @saishf Is there any specific deal breaker about evo?

Not that I found, I just disliked the way it wrote. And it struggled a little with my less common characters.

Forget The Spice, now I need The Sauce!

> They lead to a 404 page :c Idk how they get those haha.

Seems to be testing before release; I guess they just want to make sure the next release is the best possible.

(I have the same block-of-text issue with LewdPlay evo btw, maybe a L3 issue? TheSpice works fine without any instruct template even)

Llama-3 formatting never really jived with me because it had some weird formatting too often, but the repetition wasn't something I experienced; I need to test now with the new tokenizers. Meh, busy for a while til I can really sit with that.

TheSpice at least uses ChatML or simple Alpaca - or, hell, it just has to be flexible then.

Oh, does it? On the screenshot it does but on their description page it says it uses basic {{user}} and {{char}} instruct :o (and it works perfect like that too for some reason without even a stop sequence)

1000036215_x16.png
This- this makes me want to strangle whoever decided a new template was a good idea
Nothing will fix it 😿

> Oh, does it? On the screenshot it does but on their description page it says it uses basic {{user}} and {{char}} instruct :o (and it works perfect like that too for some reason without even a stop sequence)

A good model will accept either ChatML or Alpaca, as seen with Mistral and Solar. It's just a matter of figuring out how to train that into Llama 3.
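For anyone following along, the two layouts being contrasted look roughly like this (from memory, so double-check against each model's card):

ChatML:

```
<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

Alpaca:

```
### Instruction:
{prompt}

### Response:
```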

> 1000036215_x16.png
> This- this makes me want to strangle whoever decided a new template was a good idea
> Nothing will fix it 😿

100% my feelings. This is the bane of my existence.

@saishf Okay hear me out! Maybe if we use HTML tags to format instead? Um? Um? It surely will not make these mistakes with HTML tags.

> @saishf Okay hear me out! Maybe if we use HTML tags to format instead? Um? Um? It surely will not make these mistakes with HTML tags.

Python could work; it scores really high in Python benchmarks.


I know I'm late but couldn't you find some way to use regex to fix examples like that in SillyTavern? Don't ask me how, I have to research regex rules every time I use it for anything...
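Something in this spirit is what I'm picturing - a rough, untested sketch of one such cleanup rule (plain Python here rather than an actual ST regex script, and the function name is made up):

```python
# Untested idea: if a line has an odd number of asterisks, one emphasis
# marker was left unpaired, so drop the last one rather than guessing
# where the author meant to close it.
def fix_unbalanced_asterisks(line: str) -> str:
    if line.count("*") % 2 == 1:
        idx = line.rfind("*")
        line = line[:idx] + line[idx + 1:]
    return line

print(fix_unbalanced_asterisks('*He nods. "Fine."'))  # -> He nods. "Fine."
```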

@jeiku - I'm too stoopid.

@saishf I said HTML bcs ST supports HTML formatting nicely already just like Markdown so it should be plug and play and seems harder to misplace tags than simple asterisks.

> I know I'm late but couldn't you find some way to use regex to fix examples like that in SillyTavern? Don't ask me how, I have to research regex rules every time I use it for anything...

Just ask llama 😼 if it knows what it is 🐥

> @jeiku - I'm too stoopid.
>
> @saishf I said HTML bcs ST supports HTML formatting nicely already just like Markdown so it should be plug and play and seems harder to misplace tags than simple asterisks.

Does it support leany text like wonky?

It supports `<font color=purple>"text-here"</font>` so I'm assuming it has to support `<i>italics</i>`...
I chatted with a character that used this format so it's possible.

> It supports `<font color=purple>"text-here"</font>` so I'm assuming it has to support `<i>italics</i>`...
> I chatted with a character that used this format so it's possible.

I don't wanna learn another language 😭
I'll leave it to smart people 😶‍🌫️
I just know llamas spit at people and that's what llama3 is doing 🥲

I'm having token issues with KoboldCpp 1.64 too, so maybe it's partly the app's fault :o

Oh wow, I didn't realize this thread was active with this issue. It feels similar to my issue, if not the same. I used to have runaway prompts, but after a bit I got it to not exceed 500 tokens too much, with or without num_predict or token_max. Usually, though, it would repeat the same or a similar ending phrase for about 200 tokens give or take, and if not that, it would have very stiff vocabulary. I posted a separate thread for this, but I'll post it here too since, to my surprise, this seems to be the same issue. I resolved it using this template:

```
{{ if .System }}<|start_header_id|>system
{{ .System }}<|end_header_id|>
{{ end }}{{ if .Prompt }}<|start_header_id|>user
{{ .Prompt }}<|end_header_id|>
{{ end }}<|start_header_id|>assistant
{{ .Response }}<|end_header_id|>
```

It's a combination of Llama 3 and Mistral: Mistral's layout, but Llama's headers instead of im_start/im_end.
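For comparison, my recollection of the stock Llama 3 Instruct layout (this is roughly what Ollama's own llama3 template looks like - double-check it before relying on it) keeps the role inside the header pair and closes each turn with <|eot_id|>:

```
{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}
```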

Also, my settings in Ollama:

```json
{"mirostat":2,"mirostat_eta":0.25,"mirostat_tau":4.5,"num_ctx":8192,"num_predict":288,"repeat_penalty":1.35,"temperature":1,"top_k":100,"top_p":0.8}
```

I haven't fine-tuned these settings post-template-fix yet, as they were mostly gradually adjusted while I was still using the Llama template. But the initial results make me not feel the need, and I wouldn't even know what to tune, as it's been spot on so far. Maybe a tad more temperature/tau/top_p is all I can think of.
Do note the lack of "stop": I've given up on it, as it doesn't "stop", so... not sure how related that decision is lol.

Here is a before and after, with a mostly default Llama 3 template to start (I think I removed the response and tail EOT, and I think this was causing the runaway prompts), and my new template after.
2.png

1.png
