Running locally

#6
by mikeriess - opened

Can I load this for local inference on an RTX 4090 with 24 GB dedicated memory somehow?

OpenChat org

Currently we do not have a quantized model. However, you can load OpenChat across two RTX 4090s using vLLM with tensor parallelism.
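
A minimal sketch of that setup with the vLLM Python API, assuming the repo ID is `openchat/openchat_3.5` (substitute the actual model name) and that both GPUs are visible to the process:

```python
from vllm import LLM, SamplingParams

# Hypothetical model ID -- replace with the repo name of this model.
# tensor_parallel_size=2 shards the weights across two GPUs.
llm = LLM(model="openchat/openchat_3.5", tensor_parallel_size=2)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```

The same can be done from the command line by passing `--tensor-parallel-size 2` to the vLLM OpenAI-compatible server.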
