Is inference with vLLM supported?

#21
by amosxy - opened

mistral_inference is slow when running inference on an H800.

prefix = """def add("""
suffix = """ return sum"""

This completion takes 4 seconds.
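
For anyone trying to reproduce the timing, here is a minimal sketch of how a per-call latency like this could be measured. `time_completion` and `fake_generate` are hypothetical names used only for illustration; `fake_generate` is a stand-in that sleeps briefly and is not part of mistral_inference or vLLM. Swap in whatever function actually performs the fill-in-the-middle call.

```python
import time
from typing import Callable, List


def time_completion(
    generate: Callable[[str, str], str],
    prefix: str,
    suffix: str,
    runs: int = 3,
) -> List[float]:
    """Return the wall-clock latency in seconds of each generate(prefix, suffix) call."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prefix, suffix)
        latencies.append(time.perf_counter() - start)
    return latencies


def fake_generate(prefix: str, suffix: str) -> str:
    # Hypothetical placeholder: replace with the real completion call.
    time.sleep(0.01)
    return "a, b):\n    sum = a + b\n"


if __name__ == "__main__":
    prefix = """def add("""
    suffix = """ return sum"""
    for i, seconds in enumerate(time_completion(fake_generate, prefix, suffix)):
        print(f"run {i}: {seconds:.3f} s")
```

Timing several runs separately can help tell a one-time warm-up cost on the first call apart from the steady-state latency, assuming the backend does some lazy initialization.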
