Exllama

#1
by ndurkee - opened

I just wanted to confirm that this works with ExLlama v1. I can't comment on v2 at the moment.

Great, thanks for letting us know!

It works with ExLlama v2 (release 0.0.4):

```
c:\AI\exllamav2>call .\venv\Scripts\activate & python examples/chat.py --mode raw --model_dir c:\AI\exllamav2\models\Mistral-7B-Instruct-v0.1-GPTQ-4bit-32g-actorder_True
 -- Model: c:\AI\exllamav2\models\Mistral-7B-Instruct-v0.1-GPTQ-4bit-32g-actorder_True
 -- Options: ['rope_scale 1.0', 'rope_alpha 1.0']
 -- Loading model...
 -- Loading tokenizer...
```

User: Hi

Chatbort: Hello! How can I help you today?

Are you finding it slower in ExLlama v2 than in ExLlama v1? I do.
