GGML Quantizations for CPU Inference

ajibawa-2023 changed pull request status to merged

Thanks @Feanix for the GGML quants. Appreciate your work!

The file names MIGHT need to be changed, but I needed GGML quants for myself so I might as well share. I renamed the files on my computer, but re-uploading 100+ GB isn't really in the cards for me right now, if that's okay? Oobabooga wasn't detecting that they were Llama-2 models because of how I had named them. I could alternatively upload a yaml (I think?), but that might just be confusing for the people who would benefit most from having access. Anywho, I'm on to testing, and I'm really optimistic given my quick experience so far. Despite my goals and focus, I could do the other two Carl models and/or Scarlett if there is demand, however little. It's just a script pointed at a folder, so nbd.
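For anyone curious what "a script pointed at a folder" might look like, here is a rough sketch of a batch quantization loop. It is only an assumption about the workflow: it presumes you already ran llama.cpp's `convert.py` to get an FP16 GGML file, and it calls llama.cpp's `quantize` binary; the paths, file names, and quant types below are placeholders, not the exact ones used for these uploads.

```python
import subprocess
from pathlib import Path

# Assumed paths -- adjust to your own llama.cpp checkout and model folder.
QUANTIZE_BIN = Path("./llama.cpp/quantize")       # llama.cpp quantize tool
MODEL_DIR = Path("./models/carl-llama-2-13b")     # folder holding the FP16 GGML file
FP16_FILE = MODEL_DIR / "carl-llama-2-13b.ggmlv3.f16.bin"

# Common GGML quant types; pick whichever sizes you actually want to publish.
QUANT_TYPES = ["q4_0", "q4_K_M", "q5_K_M", "q8_0"]

for qtype in QUANT_TYPES:
    out_file = MODEL_DIR / f"carl-llama-2-13b.ggmlv3.{qtype}.bin"
    if out_file.exists():
        continue  # skip quants that were already produced on a previous run
    subprocess.run([str(QUANTIZE_BIN), str(FP16_FILE), str(out_file), qtype], check=True)
    print(f"wrote {out_file}")
```

Pointing the same loop at each model folder in turn is all it takes, which is why doing the other Carl or Scarlett variants is not much extra effort.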

Ok, I will release a few more Scarlett models either today or tomorrow, and you can do the GGML quants. I highly appreciate your efforts. Thank you!
