Holy stuff this is huge!!!! Can't wait for 70B GGML model!!!!

#1
by rombodawg - opened

I can't believe Meta just released these models! I am super excited for the 70B param quantization. And eventually (hopefully) we'll have a Guanaco V2 😁😁

Yeah me too! FYI there won't be any Llama 2 70B GGMLs for a little bit, as it uses new modelling code which will need to be specially added to llama.cpp. I don't know the ETA, but it'll probably be done quite quickly.

My biggest dream is that my 12GB GPU would handle 70B models. Sadge times indeed :(
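For a rough sense of why 12GB falls short, here is a back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight) and ignores the KV cache, activations, and per-block quantization overhead, so real files and real memory use will be somewhat larger. The function name and the bit widths chosen are just illustrative assumptions, not anything from llama.cpp itself.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes.

    Assumption: every weight is stored at exactly `bits_per_weight` bits,
    with no extra metadata such as quantization scales.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Illustrative bit widths: fp16, 8-bit, and an idealized 4-bit quantization.
for bits in (16, 8, 4):
    print(f"70B at {bits}-bit: ~{weight_size_gb(70e9, bits):.0f} GB")
```

Even at an idealized 4 bits per weight, a 70B model comes to about 35 GB of weights, so a 12GB card could only ever hold part of it.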

I pulled this down and started testing immediately, and I'm blown away. It's comparing favorably with Airoboros-65B and ChatGPT-3.5 in my story writing tasks, and on my M2 Studio (60 GPU cores) it's running inference at 25.89 tokens per second. The 4k context works fantastically as well.

Honestly I'm more excited to see the release of the model somewhere between 30B and 40B. For v1, even though 65B is better, the sweet spot seemed to be around the 30B/33B models, and that is also where you can find a lot of cool fine-tunes.

I'm running Airoboros 65B on llama.cpp, CPU only, on a maxed-out PowerEdge R620 with a custom BabyAGI + tool LangChain implementation, and I prefer it to using ChatGPT. No need for a GPU. I feel Llama-2-13B is good and I would love to try a 70B GGML, but for now I'll stick with Airoboros 65B.

I tried the llama.cpp PR last night and unfortunately couldn't get it working. It is being worked on though, so hopefully soon.

@mambiux Guanaco 65B is a better model than Airoboros, by a lot. Use that one.

@rombodawg I have tried Guanaco. My setup uses agents and tools, Python scripts and some Clojure witchcraft. I have tried all the models, and these have worked the best: Airoboros-65B, WizardLM-30B-Uncensored, 13b-chimera, superplatty-30b. I'm currently experimenting with all the flavors of Llama 2 I can get my hands on, but so far the results haven't been great. I feel it has heavy restrictions and a really big bias toward "As a responsible AI" replies; it insists too much on a "safe and respectful interaction".

Oh yeah, I use it in oobabooga, where you can force it to be uncensored with the "start reply with" feature.

I see. Let's just say I want to stay away from the GUI as much as possible; my whole setup is CLI. Thanks, I'll check out the feature.

Oh, I gotcha. I personally love oobabooga; all the features and loaders are awesome. Being able to make every single model uncensored, no matter what, is especially great.
