Inference Speed Benchmark and GPU memory usage
#8, opened by Yunxz
We measured the GPU memory usage and inference speed of the QwQ-32B-Preview model on both the Transformers and vLLM inference backends, using EvalScope's speed benchmark tool. See the documentation for details.
References:
- EvalScope open-source repository
- Speed Benchmark tool usage instructions
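To make the speed metric concrete, here is a minimal, hypothetical sketch of how an output-token throughput figure (tokens/s) is typically computed. It is not EvalScope's implementation or API. The names `measure_speed` and `SpeedResult` are illustrative, and `generate` is an assumed callable that takes a prompt and returns the generated token ids, for example a thin wrapper around a Transformers or vLLM generation call.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpeedResult:
    """Aggregate result of several timed generation runs (hypothetical helper)."""
    output_tokens: int
    seconds: float

    @property
    def tokens_per_second(self) -> float:
        # Guard against a zero elapsed time on extremely fast stubs.
        return self.output_tokens / self.seconds if self.seconds > 0 else float("inf")


def measure_speed(
    generate: Callable[[str], List[int]],
    prompt: str,
    runs: int = 3,
    warmup: int = 1,
) -> SpeedResult:
    """Time `generate` over `runs` calls after `warmup` untimed calls.

    `generate` is an assumed interface: prompt in, list of generated token ids out.
    """
    # Untimed warm-up calls absorb one-off costs such as kernel compilation or cache setup.
    for _ in range(warmup):
        generate(prompt)

    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        total_seconds += time.perf_counter() - start
        total_tokens += len(tokens)
    return SpeedResult(total_tokens, total_seconds)
```

Throughput here is total generated tokens divided by total wall-clock time for the timed runs. Peak GPU memory would be tracked separately, for example with PyTorch's `torch.cuda.reset_peak_memory_stats()` before the runs and `torch.cuda.max_memory_allocated()` after them on the Transformers backend.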
Yunxz changed the discussion title from "Inference Speed Benchmark" to "Inference Speed Benchmark and GPU memory usage".