https://huggingface.co/v000000/DupletHelozardLM-7B-t0.0001

#141
by v000000 - opened
This comment has been hidden
v000000 changed discussion status to closed

Now that model had an intriguing name...

Yeah, sorry to bother you with the notification and deletion!
I tried recreating something like Alchemonaut's NearSwap merge algorithm. I did WizardLM2+Holodeck+Erebus as a test, and it seemed to be working really well, so I asked for GGUFs. But when I tested it more generally, I noticed I had messed up the algorithm somewhere, I think, and it completely overfitted on Holodeck. So I'm re-running the merge properly later, hopefully.

No problem from my side - I was quite excited to see the name, and very excited to see somebody experimenting with that algorithm (being a major fan of QuartetAnemoi). I always had the impression it was probably just a fluke, but would love to be wrong. So thanks for experimenting with it, I'd be more than eager to try out any results :)

Yep, most likely a fluke! But I really wanted something more manageable that was similar. I fixed the algorithm to work like the original now, so it's less of a mess. Well, I don't know if they are any good tbh, but they work better at least. Here they are if you wanna try them (Sequential WizardLM2+Frostwind1.2+Erebus-Holodeck): https://huggingface.co/v000000/TripletBoreas-7B-t0.0001 and (WizardLM2+Erebus-Holodeck): https://huggingface.co/v000000/DupletBoreas-7B-t0.0001

Sure I want to try them. Let's quant them asap :)

Soo....

  1. Frostwind is gone, wtf? Why would... (I wish somebody had made a copy of stellarbright and a few others)
  2. I just realised that I have no useful experience with 7b models, so I can't assess the quality of these models even subjectively :(

(currently trying out triplet, because triplet sounds more than duplet, so it must be better).

So, knowing that I can't assess a 7b reasonably, I must say, I have some end token issues, but I get the same vibes as from QuartetAnemoi, i.e. TripletBoreas seems very good at instruction following, much better than my recent experiences with llama3 70b models. And no obvious failures other than some repetition issues.

I would hope somebody with more experience with 7b's (and the models they are based on) would give these a try - they are definitely not failures.

This is great news, indeed.

Haha thank you, that's very high praise! And yeaaa it's sad, I saw that Sao10K removed all his models so I re-uploaded Frostwind, I'm guessing he is just moving them to another account or organizing a bit and they will return? I hope at least. I also don't have much experience with Mistral models outside of Solar ones, so I don't know if they are good either.

But I've been using your i1-Q4_K_S on my phone for creative writing prompts, and I'd say Triplet is better. It has moments where I'm really impressed, but also moments where it lacks engagement and doesn't stick to my request; limitations of 7B, I suppose. I also ran into stop token issues (probably because of Holodeck). I've been comparing it to Midnight-Miqu, Nimbus-Miqu and QuartetAnemoi (at a brutal quant), and they are definitely superior, as is to be expected. But I agree it has a similar vibe to Nimbus and Quartet! So that's impressive.

I also did two bigger 15B Llama3 NearSwap merges. I haven't tested them much yet, but they seem improved over the two base models at least, although L3 models use the same ingrained cliches and tropes. Anyway, NearSwap is a great algorithm for softly infusing models into a base model, somewhat like a better Task Arithmetic. I think it should be added to mergekit.
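
For anyone who hasn't seen the idea, here is a rough sketch of what a NearSwap-style merge does, as far as I understand it. It is an illustration only, not the script used for the models in this thread. It assumes each weight gets its own interpolation factor `min(1, t / |base - other|)`, so nearly identical weights are swapped to the other model's value while strongly differing weights barely move. The names `nearswap` and `nearswap_sequential` are hypothetical, plain Python lists stand in for real tensors, and the folding order is only my reading of "sequential".

```python
def nearswap(base, other, t):
    """Blend `other` into `base` element-wise (illustrative sketch only).

    Assumed rule: alpha = min(1, t / |base - other|) per weight, then
    merged = base + alpha * (other - base).
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    merged = []
    for b, o in zip(base, other):
        diff = abs(b - o)
        # Weights closer than t are fully swapped; this also avoids
        # dividing by zero when the two weights are identical.
        alpha = 1.0 if diff <= t else t / diff
        merged.append(b + alpha * (o - b))
    return merged


def nearswap_sequential(base, others, t):
    """Fold several donor models into the base one after another.

    Assumption: "sequential" means the result of one merge becomes
    the base of the next.
    """
    result = list(base)
    for other in others:
        result = nearswap(result, other, t)
    return result


if __name__ == "__main__":
    base = [0.5, 0.5, 0.5]
    donor = [0.50005, 0.6, -0.5]
    # With t = 0.0001 the results are approximately 0.50005, 0.5001, 0.4999.
    print(nearswap(base, donor, 0.0001))
```

Under this rule no weight moves by more than `t`, which would explain the "soft infusion" feel with a value as small as the 0.0001 in the model names.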

My impression also was that Triplet is better. And, wow, I didn't realise sao10k deleted so many models - but not all of them. That doesn't feel like just moving them somewhere else. Well, let's hope.

As for nearswap, it would be absolutely marvellous if nearswap were not a fluke, even if it wouldn't work with all models. You should definitely push it to mergekit :)
