Holy stuff this is huge!!!! Can't wait for the 70B GGML model!!!!

by rombodawg - opened Jul 18, 2023

Jul 18, 2023

I can't believe Meta just released these models! I am super excited for the 70B-parameter quantization. And eventually (hopefully) we'll have a Guanaco V2 😁😁

TheBloke

Owner Jul 18, 2023

Yeah me too! FYI there won't be any Llama 2 70B GGMLs for a little bit, as it uses new modelling code which will need to be specially added to llama.cpp. I don't know ETA, but probably it'll be done quite quickly

Wubbbi

Jul 18, 2023

> Yeah me too! FYI there won't be any Llama 2 70B GGMLs for a little bit, as it uses new modelling code which will need to be specially added to llama.cpp. I don't know ETA, but probably it'll be done quite quickly

My biggest dream is that my 12GB GPU would handle 70B models. Sadge times indeed :(
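For a rough sense of why a 12GB card falls short, here is a back-of-the-envelope sketch of the weight footprint at a few precisions. It counts the weights only; the helper name `weight_memory_gb` and the flat bits-per-weight figures are assumptions for illustration, and real quantized files also carry per-block scales, plus the KV cache and activations at runtime, so actual usage is somewhat higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes.

    Hypothetical helper: ignores quantization scales, KV cache, and activations.
    """
    return n_params * bits_per_weight / 8 / 1e9


GPU_MEMORY_GB = 12  # the card mentioned above

for bits in (16, 8, 4):
    size = weight_memory_gb(70e9, bits)
    verdict = "fits" if size <= GPU_MEMORY_GB else "does not fit"
    print(f"70B at {bits}-bit: ~{size:.0f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

Even at a flat 4 bits per weight, 70B parameters come to about 35 GB of weights, so a single 12GB GPU could only hold part of the model.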

ericskiff

Jul 18, 2023

edited Jul 18, 2023

I pulled this down and started testing immediately, and I'm blown away. It's comparing favorably with Airoboros-65B and ChatGPT-3.5 in my story-writing tasks, and on my M2 Studio (60 GPU cores) it's running inference at 25.89 tokens per second. The 4k context works fantastically as well.

7erminalVelociraptor

Jul 19, 2023

Honestly, I'm more excited to see the release of the model somewhere between 30B and 40B. For v1, even though 65B is better, the sweet spot seemed to be around the 30B/33B models, and that is also where you can find a lot of cool fine-tunes.

mambiux

Jul 21, 2023

edited Jul 21, 2023

I'm running Airoboros 65B with llama.cpp, CPU only, on a maxed-out PowerEdge R620 with a custom BabyAGI + tools LangChain implementation, and I prefer it to using ChatGPT. No need for a GPU. I feel Llama-2-13B is good and I would love to try a 70B GGML, but for now I'll stick with Airoboros 65B.

TheBloke

Owner Jul 22, 2023

I tried the llama.cpp PR last night and unfortunately couldn't get it working. It is being worked on though, so hopefully soon.

rombodawg

Jul 22, 2023

@mambiux Guanaco 65B is a better model than Airoboros, by a lot. Use that one.

mambiux

Jul 24, 2023

@rombodawg I have tried Guanaco. My setup uses agents and tools, Python scripts, and some Clojure witchcraft. I have tried all the models, and these have worked the best: Airoboros-65B, WizardLM-30B-Uncensored, 13b-chimera, superplatty-30b. I'm currently experimenting with all the flavors of Llama 2 I can get my hands on, but so far the results haven't been great. I feel it has heavy restrictions and a really big bias toward "As a responsible AI" responses; it insists too much on a "safe and respectful interaction".

rombodawg

Jul 24, 2023

Oh yeah, I use it in oobabooga, where you can force it to be uncensored with the "Start reply with" feature.

mambiux

Jul 24, 2023

I see. Let's just say I want to stay away from the GUI as much as possible; my whole setup is CLI. Thanks, I'll check out the feature.

rombodawg

Jul 24, 2023

Oh, I gotcha. I personally love oobabooga; all the features and loaders are awesome. Being able to make every single model uncensored, no matter what, is especially great.
