The model is very nice

#2
by ABX-AI - opened

However, I keep encountering an issue, and I've basically seen it with all models: {{char}} gets into a repetitive style of wording, "welcoming" the user to their (lair/domain/world/whatever) across multiple responses and continuously asking the user "are you ready to...?"

Is this something that 7Bs simply suffer from (or maybe a Mistral thing), or a problem with the character cards? I've been trying to get rid of it with the merges I'm doing, but it's not working out as well as I hoped q_q

I think it's mostly a dataset thing, and a product of possible model inbreeding with same/similar data.

Yeah, I think so. I get far less of that with the Llama ones I'm using now. I think Mistrals all kind of have this problem, maybe due to sharing the same dataset or maybe due to the architecture, not sure. I'm getting this even on Mixtral.

Yeah, seems more like a training-data issue overall. Whether it's induced at pre-training or during fine-tuning, most Mistral/Llama 2/Mixtral/Qwen models feel the same to me at this point (relatively speaking).

I've been having a different experience with the Solar-based 11Bs lately; they seem to just go for stuff a lot more, whereas with the 7Bs I literally get five responses in a row each ending with "Welcome to my...", "Are you ready to...?" 😞

I'm moving into merging 11B Solars now xd

Personally, I've had a worse experience with Solar models myself (I've tried plenty of them, trust me). And the issue directly above seems more like a char/card issue or sampler settings. But to be honest with you, I'm bored of the space in general. Someone wake me up when SD3, Llama 3, and BitNet fully drop.
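On the sampler-settings point: most frontends expose a repetition penalty, which directly targets the "same phrase every response" pattern. A minimal sketch of the classic CTRL-style penalty (the function name and values here are illustrative, not any specific frontend's API):

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Scale down the logits of tokens that already appear in the
    generated sequence, making exact repeats less likely.

    CTRL-style rule: divide positive logits by the penalty,
    multiply negative logits by it (both push probability down).
    """
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

# Token 0 was already generated, so its logit is pushed down;
# token 1 is untouched.
print(apply_repetition_penalty([2.0, 1.0], [0], penalty=2.0))
```

Values around 1.1 to 1.3 are a common starting range; pushing the penalty too high tends to degrade coherence rather than fix the repetition.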

I've been using the exact same cards, which is why I noted that. Maybe my merge is the best possible Solar imaginable :D :D jk

I hear you. When IS Llama 3 dropping, anyhow? I've been hyped for it for too long. I'd like the 1.58-bit era + Llama 3 to drop, please (SD3 too, ofc).
