How do I reproduce the coherency locally?

#85
by Henk717 - opened

I am trying to reproduce the length and quality of the responses locally, but I don't get anywhere close with this model.
So I'd like to know what is being done behind the scenes to get replies of this quality.

  1. Which settings are being used for inference? Things like temperature, repetition penalty, etc.
  2. Is any special steering or prompt injection being done to increase the output quality of the model?
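For context on what I mean by these settings, here is a minimal sketch of what temperature and repetition penalty do to a model's output distribution. The values (0.7 and 1.1) are placeholders, not the settings HuggingChat actually uses; the repetition-penalty rule follows the common CTRL-style formulation.

```python
import math

def apply_sampling_settings(logits, temperature=0.7, repetition_penalty=1.1,
                            prior_tokens=()):
    """Turn raw logits into sampling probabilities.

    `logits` is a list of floats, one per vocabulary token.
    `prior_tokens` are token ids already generated in this sequence.
    """
    adjusted = list(logits)
    # Repetition penalty (CTRL-style): dampen logits of tokens that have
    # already appeared, making repeats less likely.
    for t in set(prior_tokens):
        if adjusted[t] > 0:
            adjusted[t] /= repetition_penalty
        else:
            adjusted[t] *= repetition_penalty
    # Temperature: values < 1 sharpen the distribution (more deterministic),
    # values > 1 flatten it (more random).
    scaled = [x / temperature for x in adjusted]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Example: token 0 was already generated, so its probability drops
# relative to sampling without the penalty.
probs = apply_sampling_settings([2.0, 1.0, 0.5], prior_tokens=(0,))
```

In practice these correspond to the `temperature` and `repetition_penalty` arguments of `model.generate()` in the `transformers` library; what I'm asking is which values (and which other knobs, like top-p or top-k) are set server-side.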

In the interest of the open ecosystem, I'd also like to suggest a feature: an option in HuggingChat to show these settings to the user, so other developers can learn from them, especially since I understand the plan is to expand to other models.