How to use the gguf files?

#7
by ameasure - opened

Transformers does not appear to support gguf for the T5 architecture, and llama.cpp is claiming unknown model architecture ''. How does one use these? Or is it only supported with candle?

The problem seems to be that llama.cpp gives that error for T5 models in GGUF V2 format. I'm not sure it's even worth opening a bug report, since every one about MADLAD-400 gets closed for "lack of interest" :/ . I'm just glad it's possible at all. So you'll need to either convert and/or quantize it yourself (https://huggingface.co/spaces/ggml-org/gguf-my-repo), or find a newer existing conversion in the V3 format, like this one: https://huggingface.co/notjjustnumbers/madlad400-3b-mt-Q4_K_M-GGUF
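If you go the self-conversion route, the workflow with llama.cpp's bundled tools looks roughly like this. This is a sketch, not a guarantee it works for this model: the paths and output filenames are illustrative, and the tool names (convert_hf_to_gguf.py, llama-quantize) are from recent llama.cpp builds; older builds used convert.py and quantize instead.

```shell
# Re-convert the original HF model to a current-format GGUF (f16),
# then quantize it. Run from a checkout of llama.cpp.
python convert_hf_to_gguf.py /path/to/madlad400-3b-mt \
    --outfile madlad400-3b-mt-f16.gguf

# Quantize the f16 GGUF down to Q4_K_M.
./llama-quantize madlad400-3b-mt-f16.gguf madlad400-3b-mt-Q4_K_M.gguf Q4_K_M
```

Converting from the original HF weights (rather than trying to upgrade the old V2 GGUF) sidesteps the version problem entirely, since the converter writes the current GGUF format.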

This may or may not be the same issue with transformers; I don't know, I haven't tried. I also don't know of any way to tell which GGUF version a model uses other than looking at the date it was created, downloading it, crossing my fingers, and trying it, which is unfortunate.
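For an already-downloaded file, there is at least a local way to check: the GGUF container starts with the 4-byte magic "GGUF" followed by a little-endian uint32 format version. A minimal sketch (the function name is mine, not from any library):

```python
import struct

def gguf_version(path):
    """Return the GGUF format version (1, 2, or 3) of a local file.

    A GGUF file begins with the 4-byte ASCII magic b"GGUF",
    immediately followed by a little-endian uint32 version field.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
    return version
```

So a V2 file would return 2 and a current conversion 3, without having to load the model at all.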

Thank you!

download "LM Studio"

Old gguf files will need to be opened by an older version of llama.cpp, or the new version of llama.cpp can convert them to the new format, as they have a command for this...
So all the old GPT4All GGML files can be converted to their newer version (I still have the earlier versions of GPT4All, so they can still run here).

Well, older versions of llama.cpp don't support T5, so that doesn't work, and I was unable to get the latest llama.cpp to convert them since the converter gave the same error as when trying to run them. Is there some secret?
