
Is the model also prepared for 16-bit?

#3
by SergeyOvchinnikov - opened

Hello Ilya!
First of all, thank you very much for the model you have created. Great job!

Please advise whether your model is prepared to run in 16-bit (i.e. without the flag load_in_8bit=True)?
I see strange behavior: in 8-bit mode, inference is relatively slow (4-5 tokens per second, even on an A100 GPU), but the quality of the text responses is good.
If I switch 8-bit mode off, i.e. turn 16-bit mode on, it works much faster (15-20 tokens per second on an A100), but the response quality is much lower and the responses are shorter than in 8-bit mode.
My prompts are in Russian.
I wonder whether your LoRA layer is only for 8-bit, and whether this layer adds extra time during inference?
Do you see a configuration that runs the model with good-quality responses (as in 8-bit) but as fast as in 16-bit? I have enough hardware.
Thank you!
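
For context, the two loading modes being compared look roughly like this. A minimal sketch using the standard transformers API; the model ID is a placeholder, not the actual repo name:

```python
import torch
from transformers import AutoModelForCausalLM

MODEL_ID = "base-model-id"  # placeholder

# 8-bit mode: weights quantized via bitsandbytes (LLM.int8()).
# Slower per token, but response quality is preserved.
model_8bit = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    load_in_8bit=True,
    device_map="auto",
)

# 16-bit mode: plain float16 weights, faster inference.
model_fp16 = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)
```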

Of course, the base model's precision is float16. You can always merge adapters into it.
There should be almost no difference between 8 and 16 bits.
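
A minimal sketch of the merge, assuming the adapter was trained with PEFT/LoRA; both repo IDs are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "base-model-id",  # placeholder
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(base, "lora-adapter-id")  # placeholder
model = model.merge_and_unload()  # fold the LoRA weights into the base weights
model.save_pretrained("./merged-model")
```

After merging, inference runs at plain float16 speed, without the extra LoRA matmuls at each step.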

Hello again!
Please advise whether your model is prepared to run in native 32-bit mode?
If so, would you be so kind as to give a simple code sample of "from_pretrained" with the parameters needed to run the model in this mode?
Thanks in advance!

(two screenshots attached: изображение.png)
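
For reference, a minimal sketch of loading in native float32 via from_pretrained; the model ID is a placeholder, and this is an assumption-level sketch rather than a transcription of the screenshots:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "model-id",                 # placeholder
    torch_dtype=torch.float32,  # native 32-bit weights
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("model-id")  # placeholder
```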

Thank you for the answer!
Please provide links to the sources of these screenshots.

Is BNB bitsandbytes, or something else?

https://arxiv.org/abs/2208.07339 (LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale)

Yes, it is bitsandbytes.
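
For completeness, the same 8-bit path can also be requested explicitly through a bitsandbytes config in recent transformers versions. A sketch with a placeholder model ID:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_8bit=True)  # LLM.int8() quantization
model = AutoModelForCausalLM.from_pretrained(
    "model-id",  # placeholder
    quantization_config=bnb_config,
    device_map="auto",
)
```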
