Is my understanding correct that the monkey patch only needs to be added for inference?

by flashvenom

i.e. when I convert this model to GGML/GPTQ, I will need to make sure the inference engine is using this patch logic, right?

Correct
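
For reference, here is a minimal sketch of the style of patch in question, assuming the linear position-interpolation variant of scaled RoPE and the older transformers `LlamaRotaryEmbedding` interface (`forward(x, seq_len)` returning cached cos/sin). The class name and scale factor below are illustrative, not the repo's actual code; use the patch shipped with the model for real inference.

```python
import torch
import transformers.models.llama.modeling_llama as llama


class ScaledRotaryEmbedding(torch.nn.Module):
    """Rotary embedding with linear position interpolation (illustrative)."""

    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        super().__init__()
        self.scale = 0.25  # e.g. 2048 trained positions stretched over 8192 tokens
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float().to(device) / dim))
        self.register_buffer("inv_freq", inv_freq)

        # Pre-compute cos/sin for the extended context, with positions
        # compressed by `scale` so they stay inside the trained range.
        max_pos = int(max_position_embeddings / self.scale)
        t = torch.arange(max_pos, device=device, dtype=self.inv_freq.dtype) * self.scale
        freqs = torch.einsum("i,j->ij", t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos()[None, None, :, :])
        self.register_buffer("sin_cached", emb.sin()[None, None, :, :])

    def forward(self, x, seq_len=None):
        return (
            self.cos_cached[:, :, :seq_len, ...].to(x.dtype),
            self.sin_cached[:, :, :seq_len, ...].to(x.dtype),
        )


# Swap the stock rotary embedding before the model is instantiated,
# so every attention layer picks up the scaled version.
llama.LlamaRotaryEmbedding = ScaledRotaryEmbedding
```

Note this Python patch only affects Transformers; llama.cpp or GPTQ inference engines would need the equivalent change in their own RoPE code.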

Have you looked into applying this as a config.json patch activated with trust_remote_code=True, similar to how Landmark Attention is applied, e.g. at https://huggingface.co/eugenepentland/Minotaur-13b-Landmark?

Then Transformers could load it automatically rather than requiring manual edits to the inference code. That could make it a lot more accessible, if it's possible?
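
For anyone following along, a hedged sketch of what that mechanism looks like from the loading side; the repo id and the `auto_map` module name below are placeholders, not real artifacts:

```python
# Assumes the repo's config.json carries an "auto_map" entry along the lines of:
#   "auto_map": {"AutoModelForCausalLM": "modeling_llama_scaled.LlamaForCausalLM"}
# so Transformers imports the modeling file bundled in the repo instead of its own.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "some-user/model-with-scaled-rope"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,  # opt in to executing the repo's bundled code
)
```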

@TheBloke I will look into it and convert tomorrow if that is ok

That'd be wonderful! I think that will really help to get people using your model. I will provide a quantised GPTQ once that is done, and publicise your work.

Thank you

@TheBloke Since @emozilla has already added the code for trust_remote_code, can you take it from there? https://huggingface.co/emozilla/open_llama_7b-scaled
Since this is a LoRA, I don't think there's much benefit to putting the code here, no? Only in the final merged model repository.
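
For reference, a sketch of producing that merged repository with peft's `merge_and_unload`; the base model and LoRA repo ids here are assumptions:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Assumed base; the LoRA repo id is a placeholder.
base = AutoModelForCausalLM.from_pretrained("openlm-research/open_llama_7b")
model = PeftModel.from_pretrained(base, "some-user/scaled-rope-lora")

merged = model.merge_and_unload()       # bake the LoRA weights into the base model
merged.save_pretrained("merged-model")  # this is the repo that would bundle the custom code
```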
