I have no idea what you did, but this model's great.

#1
by CamiloMM - opened

Been using the Q4_K_M of this (by @LoneStriker ) and it's pretty awesome. And runs fast enough on a 24GB card + RAM, unlike say a 70B L2 model. And the context length is just sweet.

MinP with DynaTemp, ChatML, running on Kobold via ST. I get around 10 T/s.
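For anyone replicating the setup: ChatML is just a plain-text turn format, so under the hood the prompt the backend sees looks roughly like the sketch below (the system/user text is purely illustrative, and ST's ChatML instruct preset builds this wrapping for you):

```python
# Minimal sketch of the ChatML turn format (illustrative text, not from this thread).
def chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(chatml_prompt("You are a helpful assistant.", "Hi there!"))
```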

I'm constantly blown away by the ever-increasing progress of locally-runnable AI, and it's thanks to people like you who keep pushing the envelope with these finetunes. Can you imagine if we had Stable Diffusion, but no finetunes or LoRAs? LLaMA2-Chat, all by itself? The AI landscape would look bleak and gray.

Glad you enjoyed it!

@CamiloMM, could you please give us more information about this model's ability to keep up with longer inputs? Did you notice any obvious difference between this model and other models in handling long inputs?

I only test models up to 8K tokens of input plus smart context in Kobold (I think this means it incurs the cost of 16K tokens internally, but I'm not sure I got that right), so I can't be sure about that. It does hold up at that range, though (Llama2 with rope scaling usually started producing nonsense at some point). It should be capable of 32K tokens; have you had issues?
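If you want to sanity-check where an input lands relative to those limits, one rough way is to count tokens before sending the prompt. The sketch below assumes the transformers library and that the finetune shares the base Mixtral tokenizer:

```python
# Rough token-count check so you know whether a prompt fits the 8K/16K/32K budget.
# Assumption: the finetune uses the base Mixtral tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
prompt = "..."  # your full prompt, including the instruct formatting
print(len(tokenizer.encode(prompt)), "tokens")
```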

I have not had the opportunity to try this model yet, but your comment has piqued my interest. My aim is to use language models to revise extensive content, specifically articles spanning around 2,500 words. Based on your experience, do you believe this model can effectively process and edit such lengthy inputs (approximately 8K tokens) while preserving crucial information? Although I have experimented with many models, none of them could help me. I am curious to see if this model can accomplish this task. Essentially, I would like to provide the model with a long input and request that it expand and enhance the content.

@HR1777, based on your requirements, I'd actually recommend Nous-Hermes-2-Mixtral-8x7B-DPO, which you can try out for free on HuggingChat to see if it's smart at the stuff you wanna ask it.

The biggest feature of Crunchy-Onion for me was that it's absolutely uncensored and fun. It'll take a character card in SillyTavern and roll with it, no questions asked. But for more serious tasks, your best bet is probably something like Nous Hermes. Do also check out Dolphin and Bagel.

These will all have 32K tokens of context since they are 8x7B, and they will be pretty smart and mostly uncensored, but without any bias toward trying to be fun (Crunchy Onion is more RP-oriented).

8x7B is very fast by itself, but I recommend also trying exl2 versions of these if you can. Those will be crazy fast. On a 24GB card, you can run 3bpw no problem, maybe 3.5bpw.
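For a rough sense of why those numbers work: an 8x7B Mixtral-style model has about 46.7B total parameters, so a back-of-the-envelope estimate of the weight size alone (not counting the KV cache or other overhead) looks like this:

```python
# Back-of-the-envelope VRAM estimate for exl2 quants of an 8x7B (~46.7B param) model.
# Weights only; the KV cache for long contexts adds more on top.
params = 46.7e9
for bpw in (3.0, 3.5, 4.0):
    gib = params * bpw / 8 / 1024**3
    print(f"{bpw} bpw -> ~{gib:.1f} GiB of weights")
```

That puts 3bpw around 16 GiB and 3.5bpw around 19 GiB of weights, which is roughly why 3bpw is comfortable on a 24GB card and 3.5bpw is closer to the ceiling once the cache is loaded.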

@CamiloMM , Thank you so much for your time and great suggestions. I am quite confident that the models you introduced are better suited for my requirements than the previously tested ones, as I have come across positive reviews about them on other websites as well. By the way, if you are searching for a completely uncensored model, you may want to try this one: https://huggingface.co/Undi95/Unholy-v2-13B.

Glad to help! In the same vein/size as Unholy 13B, I'd recommend PsyMedRP 20B (also available in 13B), also made by Undi95. It has MedLLaMA_13B in the mix, which helps it understand some themes better (anatomy, psychology, etc.). That said, if you can run bigger models, it's even better!
