Using transformers library to download the files

by nicoleds

Can the quantized models be downloaded using the transformers library? I tried the following code and it returned OSError: TheBloke/vicuna-13b-1.1-GGML does not appear to have a file named config.json.

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("TheBloke/vicuna-13b-1.1-GGML")
model = AutoModelForCausalLM.from_pretrained("TheBloke/vicuna-13b-1.1-GGML")

How do we specify which quantized model (e.g. the 4-bit one) to use?

Thanks!

No, transformers can't handle GGML files in any way.

But ctransformers can, including downloading and loading individual GGML files: https://github.com/marella/ctransformers
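
For example, here's a minimal sketch of downloading and loading a specific 4-bit file from that repo with ctransformers. The model_file name is an assumption; check the repo's file list for the exact quantization you want (q4_0, q5_1, etc.):

from ctransformers import AutoModelForCausalLM

# Downloads only the named GGML file from the Hugging Face repo.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/vicuna-13b-1.1-GGML",
    model_file="vicuna-13b-1.1.ggmlv3.q4_0.bin",  # assumed filename; pick the 4-bit file you want
    model_type="llama",
)

print(llm("AI is going to"))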

That's very useful information! @TheBloke, may I ask if you happen to know whether ctransformers supports chat completion? When I use it, it just autocompletes sentences.

It can provide an OpenAI-compatible API, so yeah, it should have a chat API mode. I haven't tried it myself.
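
For what it's worth, ctransformers itself only does plain completion, so chat behaviour comes from wrapping your message in the model's prompt template. A rough sketch for Vicuna v1.1's USER/ASSISTANT format (the template wording and filename are assumptions; check the model card):

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/vicuna-13b-1.1-GGML",
    model_file="vicuna-13b-1.1.ggmlv3.q4_0.bin",  # assumed filename, as above
    model_type="llama",
)

# Vicuna was fine-tuned on this prompt format, so completing it
# behaves like a chat reply; stop generating at the next "USER:" turn.
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: What is a GGML file? ASSISTANT:"
)
print(llm(prompt, stop=["USER:"]))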

I have downloaded llama-2-7b-chat.ggmlv3.q4_0.bin, and when I try to load the model using ctransformers I get a GLIBC_2.29 compatibility error; I am using RHEL, which only supports GLIBC_2.17. Can I use any other model instead of GGML?

PS: I don't have much GPU to work with.
