Best so far :3

#2
by saishf - opened

I've tried so many Llama-3 models but this is by far my favourite. I'm happy to use it in place of a Solar model, and definitely over a Mistral model. Whatever sorcery you did, it worked!
Way better than any of the Llama-3 models at the top of the Chaiverse leaderboard, namely:
- Llama-3-LewdPlay-8B-evo
- Roleplay-Llama-3-8B
They both struggle with anything fantasy.
Then I tried Llama-3SOME-8B-v1-BETA and ended up banishing it because it can't even understand genders and assumes every encounter is only M/F.
Overall, thank you for the model! 😸
Edit - Fix readability

@jeiku I'm redoing these Imatrix quants now that https://github.com/ggerganov/llama.cpp/pull/6920 has been merged
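
(As context for the requant: a hedged sketch of the usual imatrix workflow with llama.cpp's tools. The file names are placeholders, and the exact binary names and flags can differ between llama.cpp versions:)

```python
# Generate an importance matrix from calibration text, then quantize with it.
# Paths and the calibration file are placeholders, not the uploader's actual setup.
import subprocess

model_f16 = "SOVL_Llama3_8B-F16.gguf"  # placeholder full-precision GGUF
calib_text = "calibration.txt"          # placeholder calibration data

# Step 1: compute the importance matrix over the calibration text.
subprocess.run(
    ["./imatrix", "-m", model_f16, "-f", calib_text, "-o", "imatrix.dat"],
    check=True,
)

# Step 2: quantize using that matrix (IQ4_XS chosen just as an example type).
subprocess.run(
    ["./quantize", "--imatrix", "imatrix.dat", model_f16,
     "SOVL_Llama3_8B-IQ4_XS.gguf", "IQ4_XS"],
    check=True,
)
```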

@saishf I'll upload it later this afternoon (uploading...)

Thank you! I'll see if I can figure out how to run a perplexity test; I'd like to see the differences.
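
(If it helps anyone, here's a minimal sketch of driving llama.cpp's perplexity tool from Python. The binary name, model filename, and test file are assumptions; builds from around this time ship it as ./perplexity, newer ones as llama-perplexity:)

```python
# Hedged sketch: run llama.cpp's perplexity tool and grab the final PPL line.
import subprocess

def measure_ppl(model_path: str, text_file: str, binary: str = "./perplexity") -> str:
    result = subprocess.run(
        [binary, "-m", model_path, "-f", text_file],
        capture_output=True,
        text=True,
    )
    # The tool logs progress as it runs; the final estimate appears near the
    # end of the output (which stream it lands on can vary by build).
    combined = (result.stdout + result.stderr).splitlines()
    return next((ln for ln in reversed(combined) if "PPL" in ln), "no PPL line found")

print(measure_ppl("SOVL_Llama3_8B-Q8_0.gguf", "wiki.test.raw"))  # placeholder files
```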

Since I prefer using KoboldCpp, I'll wait and see if I notice any improvements, but from the PPL tests, if I'm not mistaken, perplexity dropped by about half in some cases.

I wanna do a perplexity test with something RP-formatted, as I think that will be more representative of the use case. I do hope it fixes the formatting issues in responses though 😺

That's a hopeful dream, haha.

Would be awesome.

For me the model is super chatty even though I told it to write short replies; it just keeps on going... Does anybody have a fix for that, please? Otherwise it seems pretty decent! But I haven't been able to test much in my usual formats because of the talkativeness 😅 (I'm using the regular Llama 3 Instruct preset, which usually works on other models.)

Using the Llama 3 presets is supposed to make it stop generating at the right time. How many tokens per response are you getting? What limit did you set?

I usually use a 512-token limit, but other models stop correctly way before that; I don't usually change it... This model seems to like to keep going right up to the limit lol.
He goes like: small action "Small text." small action "Small text." small action "Small text." over and over...

Alright, which preset have you used so far? I'd recommend giving Virt's a try.

Linked in my quant page:
https://huggingface.co/Lewdiculous/SOVL_Llama3_8B-GGUF-IQ-Imatrix

A custom-made one, but it's basically the default Llama 3 with my custom system prompt. I usually prefer quick-paced roleplay rather than storytelling.

Are these your own quants or mine?

Yours, q8.

A head-scratcher, um, I did use the new BPE tokenizers as expected. KoboldCpp version is 1.64?

Yep :( It even seems to repeat itself after a while, haha. Behold, wall of text:

image.png
(The character card says he's icy and not very talkative, and that he talks in a rushed manner.)

I'm currently busy and can't personally test, but I'd ask that you try both linked presets first, as those are my controls.

What you show looks like an issue with prompt formatting, despite what we'd expect - or hopefully it's just samplers.

Try the linked presets for a control test.

@saishf - Was there degradation on your end?

I personally use a token limit and sentence trimming with this model because, yeah, it does like to spew a lot of words.
It acts almost like it's not injecting end tokens for me too; that's why I keep the token limit so short.
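
(A minimal sketch of that workaround with llama-cpp-python - the library choice and file path are assumptions, since the thread itself uses KoboldCpp; the idea is just to cap max_tokens and pass Llama 3's end-of-turn token as an explicit stop string:)

```python
from llama_cpp import Llama

llm = Llama(model_path="SOVL_Llama3_8B-Q8_0.gguf", n_ctx=8192)  # hypothetical local path

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Write short replies."},
        {"role": "user", "content": "Hey, how's it going?"},
    ],
    max_tokens=160,       # hard cap, like the short token limit described above
    stop=["<|eot_id|>"],  # Llama 3's end-of-turn token as an explicit stop
)
print(out["choices"][0]["message"]["content"])
```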

Same issue with the custom RP preset sadly, yeah - even in a new chat :(

I have hopes that cgato's new Spice models in the works will perform this well but fix all these little Llama things we're learning about. Spicy Llama will be fun.

Is there a reason you didn't make Imatrix GGUF quants for TheSpice 0.8.3, btw, @Lewdiculous :o? I see you updated the 0.1.3 one.

Current Llama limitations then...?

Well, I was pinged about https://huggingface.co/Undi95/Llama-3-LewdPlay-8B-evo and supposedly it handles this a lot better when it comes to sticking to concise responses in a roleplay chat experience. So I guess that's worth trying.

I've tried evo; I gave up on it within three responses. I'm really picky though, so it's still worth trying for others.

0.1.3 was initially a Model-Request, and I saw it scoring so high on the Chaiverse leaderboard that I didn't look at the other one. But I'll get to it later.

@saishf Is there any specific deal breaker about evo?

> 0.1.3 was initially a Model-Request, and I saw it scoring so high on the Chaiverse leaderboard that I didn't look at the other one. But I'll get to it later.

The latest epochs are scoring insanely high 😭
Screenshot_20240502-144444.png

They lead to a 404 page :c Idk how they get those haha.

> @saishf Is there any specific deal breaker about evo?

Not that I found; I just disliked the way it wrote. And it struggled a little with my less common characters.

Forget The Spice, now I need The Sauce!

> They lead to a 404 page :c Idk how they get those haha.

They seem to be testing before release; I guess they just want to make sure the next release is the best possible.

(I have the same block-of-text issue with LewdPlay evo btw, so maybe it's an L3 issue? TheSpice works fine even without any instruct template.)

Llama-3 formatting never really jived with me because it produced weird formatting too often, but the repetition wasn't something I experienced; I need to test now with the new tokenizers. Meh, busy for a while till I can really sit down with that.

TheSpice at least uses ChatML, or simple Alpaca - or hell, it just has to be flexible then.

Oh, does it? In the screenshot it does, but on its description page it says it uses a basic {{user}} and {{char}} instruct format :o (and it works perfectly like that too for some reason, without even a stop sequence)

1000036215_x16.png
This- this makes me want to strangle whoever decided a new template was a good idea
Nothing will fix it 😿

> Oh, does it? In the screenshot it does, but on its description page it says it uses a basic {{user}} and {{char}} instruct format :o (and it works perfectly like that too for some reason, without even a stop sequence)

A good model will accept either ChatML or Alpaca, as seen with Mistral and Solar. It's just a matter of figuring out how to train that into Llama 3.
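
(For reference, a quick sketch of the three prompt formats under discussion; the special tokens are the published ones for each format, but exactly how a given frontend injects the system prompt varies:)

```python
# Build the same turn in ChatML, Alpaca, and Llama 3 Instruct formats.

def chatml(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

def alpaca(system: str, user: str) -> str:
    return f"{system}\n\n### Instruction:\n{user}\n\n### Response:\n"

def llama3(system: str, user: str) -> str:
    return (
        f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```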

> This- this makes me want to strangle whoever decided a new template was a good idea. Nothing will fix it 😿

100% my feelings. This is the bane of my existence.

@saishf Okay, hear me out! Maybe if we use HTML tags to format instead? Um? Um? Surely it won't make these mistakes with HTML tags.

Python could work; it scores really high in Python benchmarks.

I know I'm late, but couldn't you find some way to use regex to fix examples like that in SillyTavern? Don't ask me how; I have to research regex rules every time I use it for anything...

@jeiku - I'm too stoopid.

@saishf I said HTML because ST already supports HTML formatting nicely, just like Markdown, so it should be plug and play, and it seems harder to misplace tags than simple asterisks.

> I know I'm late, but couldn't you find some way to use regex to fix examples like that in SillyTavern?

Just ask llama 😼 if it knows what it is 🐥
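
(For the curious, a hedged sketch of what that regex cleanup could look like; the patterns and sample text are made up for illustration, and SillyTavern's regex scripts would need equivalent find/replace rules:)

```python
import re

def tidy_asterisks(text: str) -> str:
    """Normalize the misplaced roleplay asterisks discussed above."""
    text = re.sub(r"\*{2,}", "*", text)               # collapse ** / *** into *
    text = re.sub(r'\*\s*(".*?")\s*\*', r"\1", text)  # drop italics wrapped around quoted speech
    return text

print(tidy_asterisks('*He shrugs* *"Fine."* **Whatever.**'))
# -> *He shrugs* "Fine." *Whatever.*
```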

> @saishf I said HTML because ST already supports HTML formatting nicely, just like Markdown, so it should be plug and play, and it seems harder to misplace tags than simple asterisks.

Does it support leany text, like wonky italics?

It supports <font color=purple>"text-here"</font>, so I'm assuming it has to support <i>italics</i>...
I chatted with a character that used this format, so it's possible.

I don't wanna learn another language 😭
I'll leave it to the smart people 😶‍🌫️
I just know llamas spit at people, and that's what Llama 3 is doing 🥲

I'm having token issues with Kcpp 1.64 as well, so maybe it's partly the app's fault :o
