How to chat with model via API?

#33
by InsafQ - opened

How can I chat with the model via a Python API?

You can't, not directly. These are delta weights, and you need to apply them to the original LLaMA weights to get the OpenAssistant weights.
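For anyone landing here, a minimal sketch of what "applying the deltas" means, assuming the deltas are simple element-wise additions to the base LLaMA parameters (the paths below are placeholders, and both checkpoints must have identical parameter names):

```python
# Sketch: merge additive delta weights into a base LLaMA checkpoint.
# Assumption: every tensor in the delta is added element-wise to the
# matching tensor in the base model. Paths are placeholders.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-base", torch_dtype=torch.float16
)
delta = AutoModelForCausalLM.from_pretrained(
    "path/to/oasst-delta", torch_dtype=torch.float16
)

delta_sd = delta.state_dict()
with torch.no_grad():
    for name, param in base.state_dict().items():
        # state_dict tensors share storage with the model, so this
        # in-place add updates the base model directly.
        param += delta_sd[name]

base.save_pretrained("path/to/oasst-merged")
```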

So once we apply them to the actual LLaMA weights, can we run it locally to get a kind of chat engine similar to the web UI? Or is that a totally different engineering effort?

I think you can run it with transformers' AutoModel, since transformers supports LLaMA inference.
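Something like this should work once the weights are merged. A minimal sketch, assuming the merged checkpoint lives at a placeholder path and that the model expects OpenAssistant's `<|prompter|>`/`<|assistant|>` prompt format (check the model card to confirm):

```python
# Sketch: single-turn chat with the merged model via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/oasst-merged"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# Prompt format is an assumption based on OpenAssistant's training setup.
prompt = "<|prompter|>How do I chat with this model?<|endoftext|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)

# Strip the prompt tokens and print only the newly generated reply.
reply = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(reply, skip_special_tokens=True))
```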

Try text-generation-webui. It can load this model quite easily if you have about 70 GB of GPU memory. It also lets you load the weights with 8-bit quantization, which roughly halves the memory requirement.
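The same 8-bit trick also works directly in transformers via bitsandbytes, which is what gives the roughly 2x memory saving mentioned above. A sketch, assuming `bitsandbytes` and `accelerate` are installed and the path is a placeholder:

```python
# Sketch: load the merged model in 8-bit to roughly halve VRAM usage.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/oasst-merged",  # placeholder
    load_in_8bit=True,       # store weights as int8, compute in fp16
    device_map="auto",       # shard layers across available GPUs
)
```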

70GB of VRAM? What rig do you have?

My workstation has three GPUs with 24 GB of memory each (72 GB total).
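For reference, fitting a ~70 GB model across three 24 GB cards can be done by letting accelerate shard the layers, with a per-GPU cap to leave headroom for activations. A sketch with illustrative limits (path and caps are assumptions):

```python
# Sketch: shard the model across three 24 GB GPUs with per-device caps.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/oasst-merged",  # placeholder
    device_map="auto",
    max_memory={0: "22GiB", 1: "22GiB", 2: "22GiB"},  # headroom per GPU
)
```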

THREE GPUs? I only have a 1650! It takes so long to train models :sob:
