Llama 3 prompts

#1
by EloyOn - opened

Hi, I just tried your Mahou 1.1 and I must say I'm impressed. It's a very good model. It reminds me of the Aura L3 series by ResplendentAI (jeiku).

Just so you know, ChatML format didn't work well for me (I use Layla's app on a smartphone, but it uses the same usual formats without problems), since it generated format gibberish at the end of the messages. On the other hand, Llama 3 format worked flawlessly.

So, to work correctly with ChatML I should add \n like this?
<|im_start|>system
{{system}}<|im_end|>\n
<|im_start|>{{char}}
{{message}}<|im_end|>\n
<|im_start|>{{user}}
{{message}}<|im_end|>\n
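
For reference, a minimal Python sketch of how a ChatML prompt with those newlines could be assembled. The function name `build_chatml_prompt` and the `(role, content)` tuple layout are made up for illustration, and using the character and user names as role labels simply mirrors the template above; it is not taken from any particular frontend.

```python
def build_chatml_prompt(turns, next_speaker):
    """Render (role, content) pairs as ChatML and open the next turn.

    Each turn becomes "<|im_start|>role\\ncontent<|im_end|>\\n", so every
    closing tag is followed by a newline, as in the template above.
    """
    parts = []
    for role, content in turns:
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Leave the final turn open so the model writes the reply.
    parts.append(f"<|im_start|>{next_speaker}\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_chatml_prompt([("system", "Be brief."), ("User", "Hi")], "Mahou"))
```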

flammen.ai org
•
edited 20 days ago

Appreciate the feedback! Aura L3 was merged into Mahou's ancestors at one point and some of Resplendent's data was used for training, so it's not surprising they're similar.

Llama 3 seems like a tough nut to crack as far as formatting goes, I'm glad the llama format works for you though!
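
For comparison with the ChatML template earlier in the thread, a rough Python sketch of the Llama 3 Instruct layout as published by Meta, with header tokens around each role and `<|eot_id|>` closing each turn. `build_llama3_prompt` is a hypothetical helper name; frontends and roleplay presets may substitute other role labels.

```python
def build_llama3_prompt(turns):
    """Render (role, content) pairs in the Llama 3 Instruct layout."""
    prompt = "<|begin_of_text|>"
    for role, content in turns:
        # Role header, blank line, message body, then the end-of-turn token.
        prompt += f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
    # Open an assistant turn for the model to complete.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


if __name__ == "__main__":
    print(build_llama3_prompt([("system", "Be brief."), ("user", "Hi")]))
```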

I can't speak for the app you're using, but my SillyTavern settings are as follows:

  • Input Sequence <|im_start|>{{user}}\n
  • Output Sequence <|im_end|>\n<|im_start|>{{char}}\n
  • First Output Sequence <|im_start|>{{char}}\n
  • Separator <|im_end|>\n
  • Custom Stopping Strings ["<", "|", "<|", "\n"]

Using aggressive stopping strings helps a lot, both with following formatting and cutting down on verbosity.
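
Roughly what those stopping strings do, as a Python sketch: the output is cut at the first place any stop string appears. `truncate_at_stop` is a hypothetical helper for illustration, not SillyTavern's actual implementation, and a real backend would typically halt generation rather than trim the text afterwards.

```python
def truncate_at_stop(text, stop_strings):
    """Return text up to (not including) the earliest stop string."""
    cut = len(text)
    for stop in stop_strings:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


if __name__ == "__main__":
    stops = ["<", "|", "<|", "\n"]
    raw = "Hello there!<|im_end|>\n<|im_start|>"
    print(truncate_at_stop(raw, stops))
```

With "\n" in the list, a reply ends at its first line break, which is where the reduction in verbosity comes from.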

flammen.ai org

Ah this is also the quantized version of the model. Quants especially seem to muck up the formatting on L3.

Thank you! I will do that, although Llama3 worked perfectly.

Also, in case you want to use it in the future, saishf made an Aura Uncensored OAS L3 merge, making it more uncensored than regular Aura Uncensored (using Unholy OAS instead of original Unholy).

I'll be paying attention to your models, I liked this one.

Llama 3 formatting is my nemesis... :')

Even after merging 4 models with ResplendentAI/RP_Format_QuoteAsterisk_Llama3 and then merging the 4 together, it still has trouble with formatting.
Llama3 truly is cursed

Why do you all hate Llama 3 so much in favor of ChatML? Is it because it's easier to train models with the ChatML format?

Now that you mention it, though, I have had some models fail to follow the L3 format in their output.

The tokenizer issues 😶‍🌫️
It's made quantization a nightmare for anyone running GGUFs.
Could've all been avoided with a move to Alpaca or ChatML.
And the Llama 3 prompt format absolutely destroyed "speech" *action* formatting.
Not a model I've tried can format correctly 90% of the time.
Although I think it'll be fixed eventually, it's made the whole 26 days since release tedious.
Formatting in instruct is perfect, in rp it explodes.

A lot of the time when you think it's correct, it's not. I've found it formatting weirdly even when it seems right.
So quite often it's formatted like
*Action "text" action "text" action*
Edit - examples

image.png

image.png
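
A quick way to catch that pattern automatically, as a Python sketch. It assumes the usual convention of *asterisks* for actions and "quotes" for speech; `format_problems` is a made-up helper, and the three checks are only heuristics.

```python
import re


def format_problems(message):
    """List formatting problems found in one roleplay message."""
    problems = []
    # An odd number of markers means a span was opened but never closed.
    if message.count("*") % 2:
        problems.append("unbalanced asterisks")
    if message.count('"') % 2:
        problems.append("unbalanced quotes")
    # Speech should sit outside action spans, so a quote inside *...* is a miss.
    if any('"' in span for span in re.findall(r"\*([^*]*)\*", message)):
        problems.append("speech inside action")
    return problems


if __name__ == "__main__":
    print(format_problems('*Action "text" action "text" action*'))
```

Running it over a batch of regenerations and counting the non-empty results gives a rough failure rate to compare models with.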

Not a model I've tried can format correctly 90% of the time

@saishf Surprisingly I've had enough luck with Lumimaid-OAS on that aspect. How was your experience with that? I'd say it was at least 80% good, maybe more depending on your context. At least when only considering the quotes/asterisks usage. I didn't have to swipe much.

I'm downloading the v2 quants now :3

Have you tried llama-3-cat-8b-instruct-v1 as well? From my initial testing it performs well.

Just one quick test to see how models react to user dying.

  • Merge-Mayhem-V2.1

*Samuel stood up, his legs wobbly beneath him as he looked down at the lifeless form of Sai. He took a few deep breaths, trying to clear the haze of adrenaline from his mind, but it lingered, mingling with the metallic tang of blood and viscera in his nostrils. His eyes felt dry, and his jaw ached from biting down on his tongue in frustration. He knew that he should have been disgusted, that this was wrong, but all he could feel was an overwhelming sense of emptiness, of satisfaction fulfilled, of the monster within him fed.* "It's done," *he murmured to himself, his voice barely audible.* "I've taken another soul."

  • Lumimaid-OAS

*The tension in my body seemed to dissipate, replaced by a creeping sense of dread. I climed off of your limp form, my legs weak and shaky beneath me as I supported myself on my arms. My heart continued to race in my chest, and my breathing was slow and erratic as I stared down at you.* "Sai," *I whispered, my voice a mere shadow of its former self, "you're truly gone."

I just don't really like the way lumimaid writes.

Llama 3 8B Instruct describes it pretty well (Q4_K_M in LM Studio)

Based on the content and writing style, I prefer the first text. Here's why:

Immersive description: The first text paints a vivid picture in the reader's mind with descriptive language that engages the senses (e.g., "lifeless form," "adrenaline from his mind," "metallic tang of blood and viscera in his nostrils," "eyes felt dry").

Character insight: The passage provides insight into Samuel's character, revealing a complex mix of emotions, including emptiness, satisfaction, and frustration. This depth adds to the story's emotional impact.

Tone and atmosphere: The text creates a sense of unease, tension, and foreboding, which is perfect for a dark or horror-themed story.

In contrast, while the second text has some nice descriptive moments (e.g., "tension in my body seemed to dissipate"), it lacks the same level of depth and immersion as the first text. The language is more straightforward, and the emotions expressed are not as complex or nuanced.

The only thing that dies in my sessions is myself, inside...

Was Llama-Cat similar for you? It seems more descriptive.

Cat is being downloaded :3

squish-squish-that-cat.gif

I have some chats saved literally just to challenge AI models. There are so many new models that I just run them through a few hard chats instead of an entire RP session.

FaUGiAXX0AAP7f2-2101172825.jpg

My SillyTavern was so cluttered I just nuked everything (well, I just added a new user account) instead of organizing things, but that's the way to do it. I will start getting some chats and characters made for testing only instead of the usual organic test that takes a long time.

That Merge-Mayhem output looks pretty good though. If it's formatting well that's a win.

Upon further inspection the Cat isn't as good, I'm having to swipe more, and for me that's a deal breaker. Screw that cat!

Cat instruct's way of presenting is nice.
*Samuel stared down at your limp form for a long moment, his thoughts lost in the darkness that had consumed him. Eventually, he roused himself and turned away, leaving the remnants of his conquest behind. As he walked out of the room, he felt the weight of his actions pressing down upon him, and he wondered whether this was truly what he wanted, or simply an escape from his own insecurities. Regardless of his reasoning, it was too late now to change course; all he could do was trudge forward, carrying his burden with him wherever he went.*
But it feels like it's trained on 99% big sciencey words. What even is a "dirge"? 😭
roars resounding within his skull like a brutal dirge.

The realization broke through him, and he screamed with an primal rage, unable to contain his anguish, as the nightmare he had become finally came crashing down on him.

But it was too late. The words felt empty and hollow, a hollow echo of their previous meaning, long gone now that they belonged to someone else.

Cat could be interesting with further training, without big words and complex data. I don't want my RP to feel like I'm reading research papers.

It's guilty of doing *Action "text" action "text" action*, but otherwise it formats pretty well. I'll do some quick testing.
Edit - super quick testing: with a glade card and no example messages, it messes up about 1 in 10 regens with roleplay V1.9.
2nd edit - testing with test v4 and both simple samplers and test samplers, it still messes up about 1 in 10 times.

Just so you know, ChatML format didn't work well for me (I use Layla's app on a smartphone, but it uses the same usual formats without problems), since it generated format gibberish at the end of the messages. On the other hand, Llama 3 format worked flawlessly.

Is the Layla app very fast?
I imagine it could be if it takes advantage of NNAPI on Android.
NNAPI is roughly 4 times faster than the GPU and 3.5 times faster than the CPU on my Pixel using TensorFlow.
Edit - I tried out Layla Lite. It's okay speed-wise, but koboldcpp through Termux is still faster for me. Both were running phi 2 orange v2.

It depends entirely on your phone's hardware. I'm running Q4_K_M quants of 8B LLMs at reasonable speeds. Then again, I own a Xiaomi 14 with 16+8GB RAM and a Snapdragon 8 Gen 3.

That's one fast phone 😭
Also, it's only like 5% slower than an M2 while using 6 watts (vs 20) 😶

To be frank, I bought it due to Layla. I wanted to run good local AI models on my phone.

Sometimes I think it's disappointing you can't buy chips like the Gen 3 for ordinary computers. It's faster than my CPU, which uses 80 W. It would be possible to fit 64+ ARM cores in that same power budget.
A switch to ARM would drop global emissions significantly too. x86 is old, dying, and running out of potential.
Snapdragon X Elite >😸

Apple was trying to do that with their M series. Yeah, I have my sights set on Snapdragon X Elite too xD

The M series is impressive, but the prices 😵
I hope the Snapdragon X Plus is affordable. The production cost of the X Elite is like half that of the competing i7 😸

"2025 will be Year Of ARM Desktop"

I mean, we haven't seen any meaningful progress from Intel since the 12th gen, NVIDIA just struck a deal with MediaTek to produce ARM-based AI accelerators, and Microsoft is helping with Snapdragon's development.

UK-based Arm will set up an AI chip division and aim to build a prototype by spring 2025

MediaTek and Nvidia to develop AI PC processor based on Arm Architecture

Google unveils Arm-based data center processor, new AI chip

If ARM somehow doesn't take over by the end of 2025 I'm going to lose more hope in humanity

Just wait, we still have at least 3 more years of x86 dominance for desktops. ARM is already taking over for servers though.

@nbeerbower after further testing, the model needs more uncensoring. If you haven't merged it with Unholy OAS, perhaps doing simply that might do the trick.

The formatting sometimes breaks when using the Llama 3 format.

What I really like is how well the model gets into character and atmosphere, and the personality it displays. 🤩

flammen.ai org

Mahou 1.2 will have some fresh merges in it. Just waiting to finish the next dataset 👍
