QuantFactory/Llama-3-8B-Stroganoff-GGUF

This is a quantized version of HiroseKoichi/Llama-3-8B-Stroganoff, created using llama.cpp.
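Any llama.cpp-based frontend can run these quants. As a minimal sketch, here is how one of them could be loaded with the llama-cpp-python bindings; the file name, context size, and sampling settings below are assumptions for illustration, not part of this repo's documentation:

from llama_cpp import Llama

# Hypothetical file name; substitute whichever quant you downloaded from this repo.
llm = Llama(
    model_path="Llama-3-8B-Stroganoff.Q4_K_M.gguf",
    n_ctx=8192,       # Llama-3 context window
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a roleplay partner."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])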

Original Model Card

Llama-3-8B-Stroganoff

This is a merge of several Llama-3-8B roleplay models. Everyone has a different preference for roleplay, so it's worth clarifying my intentions so that people know what to expect from this model. First and foremost, I value coherency above all else. If the model makes blatantly illogical or contradictory statements, or doesn't understand what's going on, then I have to write part of its response for it or tweak my own, and even then it may still not follow. That kills the mood, because I end up prompt engineering instead of enjoying myself.

The other thing I value is appropriate response length. Many models give excessively long responses that don't add any value. The most common form of this is what I call over-responding, where the model essentially packs 2-3 responses into the same message, often very similar to one another. My ideal model generally keeps responses short (10 or fewer sentences) but provides a lengthy response when appropriate, provided it uses that extra length effectively.

One thing I'm very proud of about this model is that I have yet to encounter any repetition traps. I haven't done extensive long-chat testing, but I did test a few chats that frequently give me trouble, and it didn't get stuck in a loop. Another thing I really like is that characters actively do things: plenty of times they have suggested things on their own or taken the initiative. The model hasn't spoken for me during normal usage; it can still happen, but only when I give a very open-ended one-sentence response. Make sure your character card, example messages, and first message don't have the model speak for you, and you shouldn't encounter it either.

As for formatting: like every other roleplay model, the *action* + "quote" format is hit or miss. I never really liked it, and fighting the model to enforce a format it clearly won't stick to naturally is a waste of time. I exclusively use the plaintext + "quote" format; that way, I can actually use italics the way they were meant to be used. If you just want different-colored text, SillyTavern has an option for that. All of my testing was done with the SillyTavern preset Llama-3-Instruct-Names, which I made, but with the default system prompt removed; a rough sketch of the underlying prompt layout follows.
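For reference, the Llama-3 Instruct template that preset builds on looks roughly like the sketch below. Exactly how the preset injects character names is an assumption on my part (format_llama3 is a hypothetical helper, not part of SillyTavern); this version simply prepends the speaker's name to each message body:

def format_llama3(messages):
    """messages: list of (role, name, text) tuples; role is 'system'/'user'/'assistant'."""
    prompt = "<|begin_of_text|>"
    for role, name, text in messages:
        body = f"{name}: {text}" if name else text
        prompt += f"<|start_header_id|>{role}<|end_header_id|>\n\n{body}<|eot_id|>"
    # Leave the assistant header open so the model writes the next reply.
    return prompt + "<|start_header_id|>assistant<|end_header_id|>\n\n"

print(format_llama3([
    ("user", "Alice", 'She waves. "Hi there!"'),
]))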

I'm pretty happy with the results of this model, and it sets a good foundation for me to work with. Future versions will try to increase creativity and reduce clichés, but as I said, coherency comes first and foremost for me.

Quantization Formats

GGUF (provided at 2-, 3-, 4-, 5-, 6-, and 8-bit quantization levels)

Details

Models Used

- Sao10K/L3-8B-Stheno-v3.2
- Nitral-AI/Hathor_Tahsin-L3-8B-v0.85
- HiroseKoichi/L3-8B-Lunar-Stheno
- maldv/badger-writer-llama-3-8b

Merge Config

models:
    - model: Sao10K/L3-8B-Stheno-v3.2
    - model: Nitral-AI/Hathor_Tahsin-L3-8B-v0.85
    - model: HiroseKoichi/L3-8B-Lunar-Stheno
    - model: maldv/badger-writer-llama-3-8b
merge_method: model_stock  # combines the fine-tunes, using the base model as an anchor
base_model: NousResearch/Meta-Llama-3-8B-Instruct
dtype: bfloat16  # precision of the merged weights
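This is a standard mergekit configuration. Model Stock takes several fine-tunes of the same base model and combines them, using the base model as an anchor to average out the noise each fine-tune introduces. Assuming a stock mergekit install, saving the block above to a file and passing it to the mergekit-yaml command-line tool should reproduce the merge.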