Inference Speed Benchmark and GPU memory usage

#8
by Yunxz - opened

We tested the GPU memory usage and inference speed of the QwQ-32B-Preview model with Transformers and vLLM, using EvalScope's speed benchmark tool. See Document
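For readers who want a rough idea of what such a measurement involves, here is a minimal sketch in Python. It is not EvalScope's implementation and not the setup used for the numbers in the linked document. The `benchmark` function and the `generate` callable are hypothetical names introduced for illustration; `generate` is assumed to take a prompt and return the list of generated token ids. Peak memory is read from PyTorch's allocator counters, which only cover memory allocated by PyTorch in the current process, so it is reported as `None` when PyTorch or CUDA is unavailable.

```python
import time
from typing import Callable, Dict, List, Optional

try:  # torch is optional in this sketch; without it, memory is not reported
    import torch
except ImportError:
    torch = None


def benchmark(generate: Callable[[str], List[int]], prompt: str, runs: int = 3) -> Dict[str, Optional[float]]:
    """Time a hypothetical `generate` callable and report output tokens/s and peak GPU memory."""
    use_cuda = torch is not None and torch.cuda.is_available()
    if use_cuda:
        # Reset the peak counter so earlier allocations do not skew the result.
        torch.cuda.reset_peak_memory_stats()

    total_tokens = 0
    start = time.perf_counter()
    for _ in range(runs):
        total_tokens += len(generate(prompt))
    if use_cuda:
        # Wait for queued GPU work to finish before stopping the clock.
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    peak_gb = torch.cuda.max_memory_allocated() / 1024**3 if use_cuda else None
    return {
        "output_tokens": total_tokens,
        "elapsed_s": elapsed,
        "tokens_per_second": total_tokens / elapsed if elapsed > 0 else float("inf"),
        "peak_gpu_memory_gb": peak_gb,
    }


if __name__ == "__main__":
    # Stand-in generator that pretends to emit 8 tokens per call.
    print(benchmark(lambda prompt: list(range(8)), "Hello", runs=2))
```

With a real model, `generate` would wrap the model's generation call and return only the newly generated token ids, so that the throughput figure counts output tokens rather than prompt tokens.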


