Hello again. I'm wondering what backend you use to serve this model. I've looked at vLLM and llama.cpp, but neither supports tensor parallelism for GGUF at the moment.