Feedback: Model new token generation configuration ability.

#337
by dyoung - opened

Hello,

I've been playing with Mixtral from this chat UI. Recently I was testing Mixtral to see how it would fare as a study aid by asking it to write multiple questions based on a block of text. It does pretty well overall, but I've noticed that about 25% of its output, toward the end of each generation/inference round, fabricates important parts. Regardless of how heavily I prompt it not to make up source text quotes, it still consistently does so. It frequently made up facts or text that was not in the source block, even when the source block was in the chat right before the output.
I want to point out that at home I use standalone quantized 7B models that are more accurate at following tasks of this nature. I strongly suspect that the generation configuration used for inference, set to a default middle ground for the chat UI demo, is what causes this behavior in the Mixtral chat experience. (It makes sense to want to demo a model that sits in the middle between accurate and creative.) It also seems to get worse the longer the chat conversation goes.
I figured it wouldn't hurt to ask, so I propose that it would be nice to be able to adjust the temperature, top-k, top-p, repetition penalty, stop words, etc. in the model's inference/generation config, so that those who do not have access to Mixtral elsewhere can get an even broader feel for its capabilities. This would likely help with the accuracy of tasks overall in a chat stream.
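For anyone unfamiliar with what these settings do, here is a rough sketch of how temperature, top-k, top-p, and a repetition penalty can reshape a next-token distribution. It is a simplified toy in plain Python, not the code the chat UI or Mixtral actually runs; the function names and the token-to-logit dictionary are made up for illustration, and the repetition penalty follows the common "divide positive logits, multiply negative ones" convention as an assumption.

```python
import math
import random


def apply_sampling_filters(logits, temperature=1.0, top_k=0, top_p=1.0,
                           repetition_penalty=1.0, previous_tokens=()):
    """Toy illustration: return {token: probability} after the usual filters.

    `logits` is a hypothetical {token: raw score} mapping for the next token.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Repetition penalty (assumed convention): make already-seen tokens less likely.
    adjusted = dict(logits)
    for tok in previous_tokens:
        if tok in adjusted:
            score = adjusted[tok]
            adjusted[tok] = score / repetition_penalty if score > 0 else score * repetition_penalty

    # Temperature: values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = {tok: score / temperature for tok, score in adjusted.items()}

    # Softmax, subtracting the maximum for numerical stability.
    peak = max(scaled.values())
    exps = {tok: math.exp(score - peak) for tok, score in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)

    # Top-k: keep only the k most likely tokens, then renormalize.
    if top_k > 0:
        ranked = ranked[:top_k]
        norm = sum(p for _, p in ranked)
        ranked = [(tok, p / norm) for tok, p in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept

    norm = sum(p for _, p in ranked)
    return {tok: p / norm for tok, p in ranked}


def sample_token(probs, rng=random):
    """Draw one token according to the filtered probabilities."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
```

With a low temperature and a small top-k or top-p, the model is confined to its most likely continuations, which is the direction one would push the settings for quote-faithful tasks like the one described above.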
I'm assuming this wouldn't be hard to add (a UI input that passes the values to the inference backend). This could open up a broader range of experiences and experiments for the community using it for research.
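To make the idea concrete, here is a rough sketch of what "UI input passed to the inference backend" could look like. Everything in it is hypothetical: the `GenerationSettings` class, the default values, and the `inputs`/`parameters` request layout are assumptions for illustration, not what the chat UI or its backend actually uses.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class GenerationSettings:
    """Hypothetical container for values a user might enter in the UI."""
    temperature: float = 0.7          # placeholder default, not the real one
    top_k: int = 50                   # placeholder default
    top_p: float = 0.95               # placeholder default
    repetition_penalty: float = 1.1   # placeholder default
    stop: list = field(default_factory=list)

    def validate(self):
        # Reject values that make no sense before they reach the backend.
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")
        if self.repetition_penalty <= 0:
            raise ValueError("repetition_penalty must be positive")


def build_request(prompt, settings):
    """Bundle the prompt and validated settings into an assumed request body."""
    settings.validate()
    return {"inputs": prompt, "parameters": asdict(settings)}


# Example: conservative settings for a quote-faithful study-aid task.
payload = build_request(
    "Write three questions based on the text above.",
    GenerationSettings(temperature=0.2, top_p=0.9, stop=["</s>"]),
)
```

Validating in one place keeps bad values from reaching the model, and a plain dictionary is easy to serialize to whatever format a real backend expects.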
I've not been through the UI code, which I think is open source. I wouldn't have the time to, and I'm likely not motivated to do so at this time.

Thanks for taking time to read this.

Hugging Chat org

We recently released a feature that lets you tweak generation settings like temp, top k, top p, repetition penalties in assistants!

Feel free to create an assistant if you want to tweak those settings

This is excellent news! Thanks for letting me know.

dyoung changed discussion status to closed
