How HHH?

#4
by tensiondriven - opened

The Alpaca dataset contains instructions that result in models behaving "As an AI" and sometimes refusing certain instructions. I'm wondering if, or how much, this negatively affects Alpachino's generation capabilities. Have you noticed anything in this regard?

I've also heard that 30b models can sometimes produce lower quality work than smaller models like 13b when it comes to sustaining a narrative over time; I'm also just curious whether you've experienced anything like that.

Thanks for doing all the hard work to make this possible!!

No problem, I enjoy the process! So -- I had my initial reservations about using Alpaca for this model; however, ChanSung's LoRA does not seem to exhibit this "as an AI model" behavior, despite the very thorough training on its dataset. I'm not completely sure why. Even raw Alpaca 30b using ChanSung's LoRA seems not to care, so I decided to use that as the foundation of the instruct module of the model merge (I think about 50% a model with his LoRA applied, blended with Story+COT). I am honestly surprised by what it allows.

That said, you raise another valid concern: larger-parameter llama models can produce lower quality work. As counterintuitive as it sounds, I have hit roadblocks with some 30b models painting themselves into a narrative corner, over-reliant on nearly repeating previous information instead of applying it and running with it to generate new information. For this I adjust the repetition penalty and raise the temperature of the model inference a bit to circumvent the eventual repetition. It usually works; it's not perfect, but overall the larger size does seem to add a better understanding of how things work and how to run a narrative.
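The thread doesn't spell out how the merge was done; as a hypothetical illustration only, a weighted merge of two checkpoints is typically a per-tensor linear interpolation. A minimal sketch, with plain floats standing in for real tensors and all key names made up:

```python
def merge_weights(a, b, alpha=0.5):
    """Linearly interpolate two weight dictionaries, key by key:
    alpha * a + (1 - alpha) * b. With alpha=0.5 this is a 50/50 blend."""
    return {k: alpha * a[k] + (1.0 - alpha) * b[k] for k in a}

# Toy "state dicts": plain floats stand in for tensors here.
instruct = {"layer.0.weight": 1.0, "layer.1.weight": -2.0}
story_cot = {"layer.0.weight": 3.0, "layer.1.weight": 0.0}

merged = merge_weights(instruct, story_cot, alpha=0.5)
# layer.0.weight -> 2.0, layer.1.weight -> -1.0
```

In practice the same arithmetic runs over every tensor in two real checkpoints; the point is just that a "50% merge" is an element-wise average, not anything exotic.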
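The repetition-penalty and temperature tweaks above can be sketched in plain Python. This mirrors the CTRL-style penalty that common llama inference front-ends apply (toy logits, no real model involved; the function names are illustrative, not from any particular library):

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adjust_logits(logits, generated_ids, repetition_penalty=1.2, temperature=1.0):
    """Penalize tokens that were already generated, then
    temperature-scale everything before sampling."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= repetition_penalty   # make repeats less likely
        else:
            out[tok] *= repetition_penalty   # push negatives further down
    return [x / temperature for x in out]

logits = [2.0, 1.0, 0.5]          # toy vocabulary of three tokens
base = softmax(logits)

# Penalizing token 0 (already generated) lowers its probability...
penalized = softmax(adjust_logits(logits, [0], repetition_penalty=1.3))

# ...and a higher temperature flattens the whole distribution,
# which is why raising it helps the model break out of loops.
hot = softmax(adjust_logits(logits, [], temperature=2.0))
```

A higher `repetition_penalty` directly discounts tokens the model keeps reusing, while a higher `temperature` spreads probability mass toward less likely continuations; the two together are what break the "narrative corner" loop described above.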

I also find that when using instructions, it can help to put something like "tell the story as if you are a/an X" in persistent memory or context -- if it's a horror story, "as if you are a masterful horror writer and a fanatic for horror fiction," or if it's something spicy, "write it as if you are into X," etc. Being creative with instruction prompting, while keeping the instructions just loose enough not to corner the model into repetitive behavior, works best in my experience.

Brilliant, thank you for the response - I think there's not enough information out there about how lower parameter count models compare to higher ones. I'm content with your answer, closing!

tensiondriven changed discussion status to closed
