Time taken is too long

by anujchopra - opened

This model has the same architecture and number of parameters as OpenChat 3.5 0106, but it takes much longer to run (and seems to perform more computation). Can anyone help me understand why?
When I compare the time taken to generate n tokens with this model and with OpenChat, OpenChat is about 10 times faster.
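
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the timing could be done. It is not necessarily how the numbers above were measured. `tokens_per_second` and `make_fake_generate` are hypothetical names, and the fake generator only stands in for a real model call, which you would wrap in a function that takes a token count and returns the generated tokens. Both models should be timed with identical settings for the result to mean anything.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    generate: Callable[[int], Sequence],
    n_tokens: int,
    repeats: int = 3,
) -> float:
    """Return the best throughput (tokens/s) observed over `repeats` runs.

    `generate` is any callable that produces up to `n_tokens` tokens and
    returns them as a sequence. Taking the best run reduces the effect of
    one-off costs such as warm-up on the first call.
    """
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        tokens = generate(n_tokens)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, len(tokens) / elapsed)
    return best


def make_fake_generate(delay_per_token: float) -> Callable[[int], list]:
    """Hypothetical stand-in for a model: sleeps instead of computing."""
    def generate(n_tokens: int) -> list:
        time.sleep(delay_per_token * n_tokens)
        return list(range(n_tokens))
    return generate


# Two fake "models" whose per-token cost differs by a factor of 10.
slow_model = make_fake_generate(0.005)
fast_model = make_fake_generate(0.0005)
```

Calling `tokens_per_second(slow_model, 10)` and `tokens_per_second(fast_model, 10)` gives two throughput figures that can be compared directly; with real models, the same prompt and the same `n_tokens` should be used for both.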
