Inference Speed Benchmark and GPU memory usage

by Yunxz - opened 29 days ago

We measured the GPU memory usage and inference speed of the QwQ-32B-Preview model under both Transformers and vLLM, using EvalScope's speed benchmark tool. See the linked document for the results.

Reference:

EvalScope open-source address
Speed Benchmark tool usage instructions
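
For readers who want a feel for what a speed benchmark of this kind measures, here is a minimal sketch of the timing logic only. It is not EvalScope's implementation: `benchmark_generation` and its `generate` callable are hypothetical names introduced for illustration, and `generate` stands in for whatever model call (Transformers or vLLM) you would actually time.

```python
import time
from typing import Callable, Dict, List


def benchmark_generation(
    generate: Callable[[int], int],
    prompt_lengths: List[int],
    clock: Callable[[], float] = time.perf_counter,
) -> List[Dict[str, float]]:
    """Time one generation call per prompt length and report tokens per second.

    `generate(prompt_len)` is a hypothetical stand-in for a real model call.
    It must return the number of newly generated tokens.
    """
    results = []
    for prompt_len in prompt_lengths:
        start = clock()
        new_tokens = generate(prompt_len)
        elapsed = clock() - start
        results.append(
            {
                "prompt_len": prompt_len,
                "new_tokens": new_tokens,
                "seconds": elapsed,
                # Guard against a zero interval on very coarse clocks.
                "tokens_per_s": new_tokens / elapsed if elapsed > 0 else float("inf"),
            }
        )
    return results
```

If the model runs under PyTorch, peak GPU memory can be captured around each call with `torch.cuda.reset_peak_memory_stats()` before and `torch.cuda.max_memory_allocated()` after. That part is left out of the sketch so it stays runnable without a GPU.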

Yunxz changed discussion title from Inference Speed Benchmark to Inference Speed Benchmark and GPU memory usage 29 days ago
