What was your thought process on this version? I thought it would be a good idea to open a discussion.

#3
by Joseph717171 - opened

So, I downloaded your newest version and GGUF-quantized it using Bartowski's imatrix. What did you do differently in this version compared to the previous ones? And what was your thought process, and what goal did you have in mind? 😋

This was my attempt at doing a partial train over the SPPO Iter3 model with some of the datasets, and splitting the others over 0.2 (then merging them together with slerp). The motivation was to improve foundational logic and, hopefully, the diversity of prose used by the end model. Honestly, this version feels overbaked and repetitive compared to 0.5 in my own testing, albeit with a small improvement to logic. (Waiting on additional user feedback before I make any decisions on new variations.)
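
For anyone unfamiliar with slerp merging, here is a minimal sketch of spherical linear interpolation between two flattened weight vectors. It is only an illustration, not the merge config used for this model: the function name `slerp`, the `eps` threshold, and the toy vectors are all made up for the example, and real merge tools apply this per tensor.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two flat weight vectors (hypothetical helper).

    t=0 returns v0, t=1 returns v1; values in between follow the arc
    between the two directions instead of the straight line.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    lerp = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Fall back to plain linear interpolation for zero-length vectors.
    if norm0 < eps or norm1 < eps:
        return lerp
    # Angle between the two vectors, clamped for numerical safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    # Nearly parallel vectors: slerp degenerates, so use lerp.
    if abs(sin_theta) < eps:
        return lerp
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Toy usage: blend two orthogonal "weight" vectors halfway.
merged = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

With `t = 0.5` the two toy vectors contribute equally, and the result keeps unit length, which a plain average would not.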

I noticed the model gets confused over whether it is me or it when roleplaying. Consistently the model will RP as me, when it should RP as itself - it's kind of funny in an annoying way. 😂

Interesting, I have several users testing and this is the first I've heard of that issue on this variation. (I've also been testing extensively to try to get the samplers dialed in and have yet to see that.)

I was curious: would you be able to make use of Self-Play Preference Optimization for your model? It would be interesting to see what it learns from the datasets. Keep in mind, I have no idea how big an undertaking this would be. But it's fun to imagine and speculate. 🤔

Training over the base model they provided is super feasible, but setting up the pipeline and doing it myself with one of the models post-training is a whole other order of magnitude of compute.

You are using the L3 presets provided in the 0.72 repo, correct, and not the ChatML ones from 0.5?

That's what I was thinking: continuing Self-Play Preference Optimization off of their UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3. 😋

Yes, I don't use ChatML for LLaMa-3-8B-Instruct fine-tunes. 😂

Just making sure. 0.6 was trained on ChatML (and I provided experimental presets for 0.5 in ChatML at one point), so people were using those presets with the other versions, and it was leading to some problems.

Are you on Discord? I like HuggingFace's discussion forums. However, as a place for dynamic debate and discussion, they are a bit antiquated and limited in their utility. It would be great to communicate there as well, if you're down. 🤔

We could use NeverSleep's Discord. 😋

https://discord.gg/AT5gpexk

I just saw you're already on there. My bad. 😅
