Can I load this for local inference on an RTX 4090 with 24 GB of VRAM somehow?
We do not currently have a quantized model. However, you can load OpenChat across two RTX 4090s using vLLM with tensor parallelism.
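A minimal sketch of what that looks like with vLLM's Python API, assuming `openchat/openchat_3.5` as the repo id (swap in this model's actual id):

```python
from vllm import LLM, SamplingParams

# Shard the model weights across two GPUs with tensor parallelism.
llm = LLM(
    model="openchat/openchat_3.5",  # assumed repo id; replace with this model
    tensor_parallel_size=2,         # split across 2x RTX 4090
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```

If you prefer serving instead of in-process inference, vLLM's OpenAI-compatible server accepts the same setting via `--tensor-parallel-size 2`.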