Performance Evaluation

#52
by fatimakqq - opened

I am working with bloomz-7b1 for a project, and my mentor asked me to verify that my deployment performs as well as the model is supposed to. Right now it runs decently, but not well enough for my project's purposes. I can't tell whether that's because the model itself is too small or limited to give satisfactory responses, or because of an issue with my project and the server it's running on.

It's a pretty general question, but any pointers would be appreciated: how can I compare the performance of my deployment to the reported performance of bloomz-7b1? I see that detailed evaluations were done on the model, but I am not sure how to run them myself.

Thanks!

BigScience Workshop org

Two options:

  1. Reproduce one of the evaluation scores, e.g. XWinograd, which is cheap to run. You can check the scores in the paper: https://arxiv.org/abs/2211.01786 and the instructions for running the evaluation: https://github.com/bigscience-workshop/xmtf#evaluate-models
  2. If you can run bloomz, you can simply check whether your generations match some of the examples in the appendix of the paper: https://arxiv.org/abs/2211.01786. For the 7b1, the outputs will be a bit worse than those.
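
For option 2, a minimal sketch of how such a comparison could be scripted is below. It assumes the Hugging Face `transformers` library (plus `accelerate` for `device_map="auto"`), the `bigscience/bloomz-7b1` checkpoint, and greedy decoding; the helper names `compare_generations` and `make_hf_generate` are hypothetical, and the prompt/expected pairs have to be copied by hand from the paper's appendix. Exact string matching is a deliberately crude check, so treat mismatches as something to inspect by eye rather than as a score.

```python
from typing import Callable, List, Sequence, Tuple


def normalize(text: str) -> str:
    """Collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.strip().split())


def compare_generations(
    generate: Callable[[str], str],
    examples: Sequence[Tuple[str, str]],
) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Run each prompt through `generate` and compare with the expected text.

    Returns the fraction of exact matches (after whitespace normalization)
    and a list of (prompt, expected, actual) tuples for the mismatches.
    """
    if not examples:
        raise ValueError("examples must not be empty")
    mismatches = []
    for prompt, expected in examples:
        actual = generate(prompt)
        if normalize(actual) != normalize(expected):
            mismatches.append((prompt, expected, actual))
    match_rate = 1 - len(mismatches) / len(examples)
    return match_rate, mismatches


def make_hf_generate(model_name: str = "bigscience/bloomz-7b1") -> Callable[[str], str]:
    """Build a greedy-decoding generate function (assumed setup, not the authors')."""
    # Imported lazily so the comparison helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # device_map="auto" requires the `accelerate` package.
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )

    def generate(prompt: str) -> str:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
        # Keep only the newly generated tokens, not the echoed prompt.
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

    return generate
```

To use it, build `generate = make_hf_generate()`, or wrap a call to your own server in a function with the same `str -> str` shape, then pass a list of `(prompt, expected_output)` pairs taken from the appendix to `compare_generations`. If your server's outputs differ from what the local `transformers` run produces for the same prompts, the deployment is the likely culprit rather than the model.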
