
Will there be quantized versions? (GGUF)

#4
by alexcardo - opened

Can this model be quantized and converted to the GGUF format to use it with llama.cpp?

Did you check Visual Studio for an AI extension to convert it?

Allen Institute for AI org

@TheBlocki plz thx
We'll work on more code integrations; let us know if anything specific is wrong.

I have been trying to hack it to work this morning. I added a new arch, "OlmoModelForCausalLM", but I'm not sure whether there is an existing compatible one like MODEL_ARCH.LLAMA.
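For anyone trying the same thing, here is a minimal sketch of the general shape this takes in llama.cpp's gguf-py package: an arch enum entry, a header name, and a per-arch table of allowed GGUF tensor names. The OLMO entry and the tensor names below are illustrative assumptions, not the actual llama.cpp source:

```python
# Illustrative sketch only -- not the actual llama.cpp source.
# gguf-py keeps a model-arch enum plus per-arch tensor-name tables;
# a new arch needs entries in both before the converter can map tensors.
from enum import IntEnum, auto

class MODEL_ARCH(IntEnum):
    LLAMA = auto()
    OLMO = auto()   # hypothetical new entry

# Human-readable arch name written into the GGUF header.
MODEL_ARCH_NAMES = {
    MODEL_ARCH.LLAMA: "llama",
    MODEL_ARCH.OLMO: "olmo",
}

# Per-arch list of GGUF tensor names the converter is allowed to emit
# (a guess at the subset OLMo would need, in llama.cpp naming style).
MODEL_TENSORS = {
    MODEL_ARCH.OLMO: [
        "token_embd",
        "output",
        "attn_q",
        "attn_k",
        "attn_v",
        "ffn_up",
        "ffn_down",
    ],
}

print(MODEL_ARCH_NAMES[MODEL_ARCH.OLMO])  # -> olmo
```

The point is just that adding the arch string alone isn't enough; the converter also needs to know which tensor names that arch is allowed to contain.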

As a result, I am running into deeper model issues with llama.cpp. For example:

Loading model: OLMo-7B
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
The repository for /backup_disks/OLMo-7B contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//backup_disks/OLMo-7B.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
gguf: Adding 50009 merge(s).
gguf: Setting special token type eos to 50279
gguf: Setting special token type pad to 1
Exporting model to 'olmo.gguf'
gguf: loading model part 'pytorch_model.bin'
Can not map tensor 'model.transformer.wte.weight'

Transformers needs to be updated to the latest version from GitHub, but ai2-olmo seems to require a version of torch that is hard to resolve. I will give it one last attempt with torch-2.3.0a0+git52b679d. But I fear a proper arch needs to be added to llama.cpp, and all my attempts have been to no avail. In that regard, I think I am simply trying to use llama.cpp's convert_hf_to_gguf.py too early at this point.
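The "Can not map tensor" failure above is the converter hitting a checkpoint key it has no GGUF name for. A hedged sketch of the kind of remapping involved; only the `model.transformer.wte.weight` key comes from the log above, while the other source keys and all the target GGUF names are assumptions in llama.cpp's naming style:

```python
import re

# Hypothetical mapping from OLMo HF checkpoint keys to GGUF tensor names.
# Only the embedding key is taken from the error in the log above; the
# rest are illustrative guesses, not verified OLMo checkpoint keys.
TENSOR_MAP = {
    r"model\.transformer\.wte\.weight": "token_embd.weight",
    r"model\.transformer\.ff_out\.weight": "output.weight",
    r"model\.transformer\.blocks\.(\d+)\.att_proj\.weight": r"blk.\1.attn_qkv.weight",
}

def map_tensor_name(hf_name: str) -> str:
    """Translate one checkpoint key to its GGUF name, or fail loudly."""
    for pattern, gguf_name in TENSOR_MAP.items():
        new_name, n = re.subn(rf"^{pattern}$", gguf_name, hf_name)
        if n:
            return new_name
    # This is the situation the convert script reports as "Can not map tensor".
    raise ValueError(f"Can not map tensor {hf_name!r}")

print(map_tensor_name("model.transformer.wte.weight"))  # -> token_embd.weight
```

Until a table like this exists for the OLMo arch, the converter has no choice but to bail out on the first unrecognized key.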

Interested to see if anyone can make GGUF work for Olmo arch.

Any news about the GGUF versions? Could someone finally make them?

I tried, but I couldn't add this architecture to llama.cpp and make the required changes.

I hope they add this feature to llama.cpp soon.

Awesome! Thanks @eleius. Shall we do it, or has it been done already?

It seems @nopperl just made the GGUFs; I'll have to try them.
