Hello again. I'm wondering what backend you use to serve this model. I've looked at vLLM and llama.cpp, but neither supports tensor parallelism for GGUF at the moment.