Cannot run batch on transformers

by DatenlaborBerlin - opened

model = exllama_set_max_input_length(model, 4096)

works in AutoGPTQ but not in transformers, where I get:

'LlamaForCausalLM' object has no attribute 'quantize_config'
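For reference, a minimal self-contained sketch of what triggers this error (the stub class below is illustrative, not the real transformers class): the ExLlama helper reads the AutoGPTQ `quantize_config` attribute, which a model loaded through plain transformers does not carry.

```python
# Illustrative stub; the real LlamaForCausalLM comes from transformers.
class LlamaForCausalLM:
    pass

model = LlamaForCausalLM()

# exllama_set_max_input_length expects model.quantize_config (set by
# AutoGPTQ at load time), so accessing it on a plain transformers model
# raises the AttributeError quoted above:
try:
    _ = model.quantize_config
except AttributeError as err:
    print(err)  # 'LlamaForCausalLM' object has no attribute 'quantize_config'
```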

I am facing the same issue when using the 128g branch.

Please open a GitHub issue so @fxmarty can have a look at it.
I have just opened the issue on GitHub. Please look into it, @fxmarty and @TheBloke.
