Landmark QLoRA Compatibility

#10
by theman23290

I am using Oobabooga to load this model along with the Landmark QLoRA, and I am running into a "'LlamaCppModel' object has no attribute 'dtype'" error. Is anyone else crazy enough to pair a GGML model with a QLoRA like this and make it work? My guess is that it wants the float16 version, since the QLoRA was trained on float16, or maybe it is simply impossible. Has anyone gotten this to work, or am I glossing over something? I have trust_remote_code enabled, context and truncation length set to 8192, and add_bos_token disabled. If I force generation anyway, I run into context-overflow errors and it crashes.

System Configuration:
OS: Debian 11
CPU: Xeon E5-2670v3 (25 cores, virtualized)
RAM: 80GB
GPU: None

I'm afraid you're glossing over a very big something. You cannot apply Landmark Attention to a GGML model. For GGML to support landmark attention, it would have to be implemented directly in llama.cpp / llama-cpp-python, in C++.

The reason we're able to apply Landmark Attention to GPTQ and fp16 models is that those repos provide custom Python code, which the transformers library executes when you pass the trust_remote_code=True argument. That custom code is what implements landmark attention.
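
To make that concrete, here is a minimal sketch of the fp16/GPTQ path described above. The repo and adapter names are placeholders, not real model IDs; any landmark-attention repo that ships its own modeling code would follow the same pattern.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder names for illustration only.
BASE = "some-org/llama-7b-landmark"
ADAPTER = "some-org/landmark-qlora"

# trust_remote_code=True allows transformers to import and run the Python
# modeling file bundled with the repo. That file is where landmark
# attention is actually implemented.
model = AutoModelForCausalLM.from_pretrained(
    BASE,
    trust_remote_code=True,
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)

# The landmark QLoRA adapter is then layered on top with PEFT.
model = PeftModel.from_pretrained(model, ADAPTER)
```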

None of this is possible with GGML. There is no mechanism for executing custom code, and no landmark attention code has been written for GGML anyway.
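
For contrast, a GGML model is loaded through llama-cpp-python's thin binding over the compiled C++ library; there is no hook where repo-supplied Python could alter the attention mechanism. A minimal sketch, with a placeholder model path:

```python
from llama_cpp import Llama

# The entire forward pass runs inside the compiled llama.cpp library.
# Nothing here executes Python code shipped with the model repo, so an
# adapter that patches attention in Python has nowhere to plug in.
llm = Llama(model_path="./wizardlm-30b.ggmlv3.q4_0.bin")  # placeholder path
print(llm("Hello,", max_tokens=16)["choices"][0]["text"])
```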

There is work being done on extended context in llama.cpp, though not Landmark Attention; rather, they are looking into RoPE scaling. If and when they implement that in llama.cpp, you'd be able to load a model with RoPE-scaled, increased context and it would work.
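
If that lands, loading might look roughly like the sketch below. This is an assumption about the eventual interface: the rope_freq_scale parameter shown here is exposed by later llama-cpp-python releases, and the model path is a placeholder.

```python
from llama_cpp import Llama

# Sketch only: this RoPE scaling knob exists in later llama-cpp-python
# releases; it was not available when this thread was written.
llm = Llama(
    model_path="./model.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=8192,            # extended context window
    rope_freq_scale=0.25,  # linear scaling: 2048 / 0.25 = 8192 positions
)
```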

So keep an eye on llama.cpp for developments in this area. But in the meantime, you won't be able to apply these extended-context methods yourself.

Rip, I was hopeful. CPU-only deployment to a rack has its strengths and weaknesses. I am hopeful that Oobabooga will add support for MPT-30B GGML. I would just like to break free from 2048-token jail lol.
