Which device was the benchmarking done on?

#4
by spanspek - opened

In the README you mention the inference speed as

36.4 tok/s (wall-gen) / 36.8 tok/s (task-mean)

This will vary with memory bandwidth across the different Apple Silicon devices.

Which Apple Silicon variant or device was used for benchmarking? (M4 Max, M3 Ultra, M5 Pro, etc.)

baa.ai org

That one specifically was on an older M2 Pro, so if you have the latest M5 Pro you will certainly do much better.

Thanks. The M2 Pro has 200 GB/s of memory bandwidth, for comparison:

M5 Pro = 307 GB/s
M5 Max = 614 GB/s

The current king is still the M3 Ultra at 819 GB/s, but an M5 Ultra should come out later this year (there is no M4 Ultra).
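Because token generation (decode) on Apple Silicon is largely bound by memory bandwidth, a rough back-of-the-envelope estimate can scale the README's M2 Pro figure by the bandwidth ratio. The sketch that follows assumes perfectly linear scaling from the 36.4 tok/s wall-gen measurement; the function name `estimate_tok_s` is hypothetical, and the bandwidth values are simply the ones quoted in this thread.

```python
# Hypothetical back-of-the-envelope estimate. ASSUMPTION: decode speed scales
# linearly with memory bandwidth. Real results also depend on GPU core count,
# quantization, context length, and the inference runtime.

MEASURED_TOK_S = 36.4     # README wall-gen figure, measured on an M2 Pro
BASELINE_BW_GBPS = 200.0  # M2 Pro memory bandwidth (GB/s)

# Bandwidth figures as quoted in this discussion (GB/s)
BANDWIDTH_GBPS = {
    "M2 Pro": 200.0,
    "M5 Pro": 307.0,
    "M5 Max": 614.0,
    "M3 Ultra": 819.0,
}


def estimate_tok_s(bandwidth_gbps: float,
                   measured_tok_s: float = MEASURED_TOK_S,
                   baseline_bw_gbps: float = BASELINE_BW_GBPS) -> float:
    """Scale a measured decode speed by the ratio of memory bandwidths."""
    return measured_tok_s * bandwidth_gbps / baseline_bw_gbps


if __name__ == "__main__":
    for chip, bw in BANDWIDTH_GBPS.items():
        print(f"{chip:9s} {bw:6.0f} GB/s  ~{estimate_tok_s(bw):6.1f} tok/s")
```

Under that assumption the M5 Pro lands near 56 tok/s and the M3 Ultra near 149 tok/s. Treat these as optimistic ceilings rather than predictions, since other bottlenecks usually keep real-world scaling below linear.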

tomkay changed discussion status to closed
