Control vector discussion

#2
by ChuckMcSneed - opened

Continuation of:
https://huggingface.co/jukofyork/Dark-Miqu-70B/discussions/3
I've succeeded in removing slop from CR+ for both SFW and NSFW scenarios using control vectors. Strangely, the SFW unslop control vector did not affect NSFW slop, and the NSFW control vector made the model extra horny, which in my opinion is an undesirable side effect. While the SFW vector managed to stay coherent during my stress tests, the NSFW vector caused poor commandr to disintegrate: it didn't know what to say without any of those overused phrases from erotic fiction that the control vector stopped from appearing. It looks like the issue for NSFW runs much deeper: the data the model learns it from is very monotonous, and when forced to write in a different style, the model doesn't know what to do. This is most likely what makes it incredibly difficult to remove NSFW slop using regular prompting techniques.

Well darn...

I'm making more progress with control vectors!
https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/bio/control_vector-commandr-bio.gguf
I tuned this one on very descriptive biological language as positive and vague flowery prose as negative. Seems to make it more aware of the biology and surroundings of characters.
https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/incharacter/control_vector-commandr-incharacter.gguf
This one makes the model act slightly more in character, but the improvement is not very significant as commandr is already quite good at it.

> the NSFW vector caused poor commandr to disintegrate: it didn't know what to say without any of those overused phrases from erotic fiction that the control vector stopped from appearing. It looks like the issue for NSFW runs much deeper: the data the model learns it from is very monotonous, and when forced to write in a different style, the model doesn't know what to do.

This may actually just be a problem with the "two class" control vectors! I have even managed to completely stop a model from being able to write a story because of this... To explain the problem in simple terms:

Think about a clock face with a shorter hour hand and a longer minute hand:

  • When the time is 12:00 both hands point in the same direction, but there is still a gap between the tips of the two hands. These sorts of vectors are not what we want at all because moving in either direction will just make the model more or less "storyish", and ultimately these are what cause the model to get crippled like you describe. Even times like 12:05 or 11:50 have this same problem.
  • When the time is 6:00, 5:25, etc., the two hands point in opposite directions, and this is a good control vector that clearly moves from an undesirable to a desirable direction.

This is the problem I've been grappling with for the last 2 weeks:

  • If the "hands" are both long and well defined then cosine similarity works fine: it outputs a number similar to correlation and 1.0 is like the 12:00 example above and -1.0 is like the 6:00 example above (and 0.0 is like 3:00 or 9:00; ie: 90 degrees). This can then be used to filter out these shitty "storyish" directions, but...
  • There isn't really a good reason that the things we are interested in create a clear "axis" like this, and it turns out that often the case will be like a really long minute hand and a tiny/stubby hour hand... Cosine similarity doesn't work in this case as the direction of the tiny hand has noise added to it and can point in wildly different directions as a result.
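
The clock analogy maps straight onto cosine similarity; here's a minimal sketch of the three cases above (the "hand" vectors are my own illustrative values):

```python
import numpy as np

def cos_sim(u, v):
    # cosine similarity: +1.0 = same direction, -1.0 = opposite, 0.0 = orthogonal
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

hour = np.array([0.0, 1.0])        # short hour hand pointing at 12
minute_12 = np.array([0.0, 2.0])   # long minute hand at 12:00 (aligned)
minute_6 = np.array([0.0, -2.0])   # minute hand at 6:00 (opposed)
minute_3 = np.array([2.0, 0.0])    # minute hand at 3:00 (90 degrees)

print(cos_sim(hour, minute_12))  # 1.0  -> shitty "storyish" direction
print(cos_sim(hour, minute_6))   # -1.0 -> good control-vector direction
print(cos_sim(hour, minute_3))   # 0.0  -> no relationship
```

Note the similarity is the same regardless of how long the hands are, which is exactly why it breaks down when one "hand" is tiny and dominated by noise.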

So after lots of experimenting with this, I think I may finally have worked out a method of detecting these shitty directions:

Flip the direction of one of the hands and see if it gets easier to discriminate between our two classes!!!

  • If the time is 12:00 and you flip either hand to get 6:00 or 12:30 then it's clear the gap between the tips of the hands has increased! This is a shitty direction for a control vector.
  • If the time is 6:00 and you flip either hand then the gap has clearly decreased! This is a good direction for a control vector.
  • This works fine even when one hand is tiny in length.
  • This works for 12:05, 11:50, 6:00, 5:25, etc. type directions.
  • The 3:00 or 9:00 type directions (ie: 90 degrees) are the directional pairs where we see no change.

So what I am doing now is performing SVD to decompose the gap into lots of directions, testing each one and only keeping those that pass the above test, then finally reconstructing the final direction to only include the "good" directions.
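
This is my reading of the procedure, sketched in numpy. The function name, the toy data, and the exact weighting used in the reconstruction are my own assumptions; the key point is that the "flip one hand" test reduces to checking that the two class means project onto a candidate direction with opposite signs:

```python
import numpy as np

def filter_directions(pos_acts, neg_acts, baseline_acts):
    """Keep only the SVD directions that pass the 'flip one hand' test.
    Rows are hidden-state samples, columns are hidden dimensions."""
    mu_base = baseline_acts.mean(axis=0)
    mu_pos = pos_acts.mean(axis=0) - mu_base  # one "hand"
    mu_neg = neg_acts.mean(axis=0) - mu_base  # the other "hand"

    # decompose the spread of both classes into candidate directions
    diffs = np.vstack([pos_acts - mu_base, neg_acts - mu_base])
    _, _, Vt = np.linalg.svd(diffs - diffs.mean(axis=0), full_matrices=False)

    kept = np.zeros(pos_acts.shape[1])
    for v in Vt:
        a, b = mu_pos @ v, mu_neg @ v
        # flip test: negating one hand must INCREASE the gap between the tips,
        # which happens exactly when the hands point in opposite directions
        # along this axis (i.e. a and b have opposite signs)
        if abs(a + b) < abs(a - b):
            # accumulate, oriented from the negative class towards the positive
            kept += np.sign(a) * v * (abs(a) + abs(b)) / 2.0
    return kept

# toy example: axis 0 separates the classes (projections have opposite signs),
# axis 1 shifts both classes the same way (a "storyish" direction) and should
# be filtered out of the reconstructed control vector
base = np.zeros((2, 4))
pos = np.array([[1.0, 1.0, 0.0, 0.0], [1.0, 1.2, 0.0, 0.0]])
neg = np.array([[-1.0, 1.0, 0.0, 0.0], [-1.0, 0.8, 0.0, 0.0]])
v = filter_directions(pos, neg, base)
print(np.round(v, 3))  # dominated by axis 0; axis 1 contribution is small
```

The 3:00/9:00-style directions fall out naturally too: both projections are ~0 there, so the strict inequality rejects them.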

I still need to run some more tests but will likely have this perfected in a couple of days and will upload the new control vectors and the code to create your own.

Also @BigHuggyD @ChuckMcSneed you might find this interesting if you are using command-r models:

https://huggingface.co/datasets/froggeric/creativity/discussions/6#66851beae526dd77799c25bd

> I'm making more progress with control vectors!
> https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/bio/control_vector-commandr-bio.gguf
> I tuned this one on very descriptive biological language as positive and vague flowery prose as negative. Seems to make it more aware of the biology and surroundings of characters.
> https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/incharacter/control_vector-commandr-incharacter.gguf
> This one makes the model act slightly more in character, but the improvement is not very significant as commandr is already quite good at it.

I'll have to look into your method as I'm currently using 30,000 samples to do what you look to be doing with 5!? I think my collection of story prompts are a bit shit as it's pretty hard to write a Grimdark story when the prompt says "Write a story about being overjoyed on the day of your graduation." or similar :/

I definitely think you need more samples though. PCA is basically just eigen-decomposition of a covariance matrix, and statistically it can be shown in the very best case you need O(d) samples to reliably estimate the covariance matrix:

https://stats.stackexchange.com/questions/90045/how-many-samples-are-needed-to-estimate-a-p-dimensional-covariance-matrix

and command-r-plus has around 11.5k variables in its hidden dimension and most other large 70b+ models have 8192 variables per hidden dimension.
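
A quick way to see the sample-size effect is to estimate the covariance of a known d-dimensional distribution (true covariance = identity) and watch the spectral-norm error shrink as n grows. This is an illustrative toy with a made-up d, not the actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50  # stand-in for the model's (much larger) hidden dimension
errs = {}
for n in (10, 100, 1000, 10000):
    X = rng.standard_normal((n, d))              # true covariance is the identity
    C = np.cov(X, rowvar=False)                  # sample covariance estimate
    errs[n] = np.linalg.norm(C - np.eye(d), 2)   # spectral-norm estimation error
    print(f"n={n:>6}  error={errs[n]:.2f}")
```

With n < d the estimate is rank-deficient, so the error can't even drop below 1.0 regardless of luck — which is why thousands of samples matter for an 8k-12k dimensional hidden state.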

I'm using 2 classes and a baseline, 10 system prompts per triple, and 1k prompts per system prompt = 3 x 10 x 1000 = 30000 samples. But I also have matched pairs that get subtracted from the baseline which should reduce the error in the covariance matrix even further.

A simple hacky test you could try would be to train your control vectors 5 times but leave one of the 5 prompts out each time. Then test and see if you get wildly different results... If you do then you need to increase the sample size, but if you don't then this must mean that only a tiny tiny fraction of command-r-plus's 11.5k variables are changing hugely in magnitude for your prompts (which would be very surprising).
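
The hacky test could look something like this, using a plain difference-of-means vector as a stand-in for whatever method is actually being trained (`loo_stability` and the toy data are my own names/assumptions):

```python
import numpy as np

def mean_diff_vector(pos, neg):
    # simplest control direction: difference of the two class means
    return pos.mean(axis=0) - neg.mean(axis=0)

def loo_stability(pos, neg):
    """Retrain leaving one sample out of each class at a time; return the
    worst cosine similarity against the vector trained on all samples."""
    full = mean_diff_vector(pos, neg)
    full = full / np.linalg.norm(full)
    sims = []
    for i in range(len(pos)):
        v = mean_diff_vector(np.delete(pos, i, axis=0),
                             np.delete(neg, i, axis=0))
        sims.append(float(v @ full / np.linalg.norm(v)))
    return min(sims)  # near 1.0 = stable; wildly lower = need more samples

# 5 tightly clustered samples per class -> leave-one-out barely moves the vector
pos = np.array([[1.0, 0.1], [0.9, -0.1], [1.1, 0.0], [1.0, 0.05], [0.95, -0.05]])
neg = np.array([[-1.0, 0.05], [-1.1, -0.05], [-0.9, 0.1], [-1.0, 0.0], [-0.95, -0.1]])
print(round(loo_stability(pos, neg), 4))
```

If the minimum similarity drops well below 1.0 when a single prompt is held out, the vector is being driven by individual samples rather than the underlying direction.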

> I'm using 2 classes and a baseline, 10 system prompts per triple, and 1k prompts per system prompt = 3 x 10 x 1000 = 30000 samples. But I also have matched pairs that get subtracted from the baseline which should reduce the error in the covariance matrix even further.

Oh wow... That's real huge... Are all of those synthetic? I'm using high quality "cyborg" data: generated by the model, but heavily edited by a human (me) as positive, with the "mean" method; more of my time goes into dataset generation than into training. You know that models have in-context learning, so my theory was that if I show it how to write (cyborg) vs how not to write (synthetic), I would get a better control vector out of it than if I just throw it some starters with a prompt, and it seems to do just as I want. In the stories part, I try to keep as few variables changing as possible, so they don't get affected by the control vector. Also, keeping the prompts equal length helps with the quality of the control vector, especially when they are short: >400-token prompts can take a 10-token variation much better than <100-token prompts.

> I'll have to look into your method as I'm currently using 30,000 samples to do what you look to be doing with 5!? I think my collection of story prompts are a bit shit as it's pretty hard to write a Grimdark story when the prompt says "Write a story about being overjoyed on the day of your graduation." or similar :/

Wait, you put that into positive too? It should be "Write a very sad story with a very bad ending about the day of your graduation." vs "Write a very happy story with a very good ending about the day of your graduation."

> > I'm using 2 classes and a baseline, 10 system prompts per triple, and 1k prompts per system prompt = 3 x 10 x 1000 = 30000 samples. But I also have matched pairs that get subtracted from the baseline which should reduce the error in the covariance matrix even further.
>
> Oh wow... That's real huge... Are all of those synthetic? I'm using high quality "cyborg" data: generated by the model, but heavily edited by a human (me) as positive, with the "mean" method; more of my time goes into dataset generation than into training. You know that models have in-context learning, so my theory was that if I show it how to write (cyborg) vs how not to write (synthetic), I would get a better control vector out of it than if I just throw it some starters with a prompt, and it seems to do just as I want. In the stories part, I try to keep as few variables changing as possible, so they don't get affected by the control vector. Also, keeping the prompts equal length helps with the quality of the control vector, especially when they are short: >400-token prompts can take a 10-token variation much better than <100-token prompts.

I'm using a mix of different story prompt datasets I found and a set of 10 matched system prompts that go with these.

> > I'll have to look into your method as I'm currently using 30,000 samples to do what you look to be doing with 5!? I think my collection of story prompts are a bit shit as it's pretty hard to write a Grimdark story when the prompt says "Write a story about being overjoyed on the day of your graduation." or similar :/
>
> Wait, you put that into positive too? It should be "Write a very sad story with a very bad ending about the day of your graduation." vs "Write a very happy story with a very good ending about the day of your graduation."

Even though the prompts are pretty trash, I think this might actually be quite a good thing and encourage the model to just generally "be dark" or "be chaotic" and not just when specifically asked to "write a grimdark story", etc.

It seems to have worked anyway, as the new control vectors are way better than the old ones from this repo.

I'm now also skipping the last layer (which it looks like you are also doing - from looking inside your .safetensors files?). The last layer seems to be an oddball and can have activations 10-100x larger than the previous layer(s). The way I have the scale factors working now, the early layers are fine to fiddle with and just get really tiny offsets added that do almost nothing if the direction is weak.

Later in the week I will investigate using the "Cross Correlation Matrix" again as now have a much better idea of how to test for the shitty "storyish" directions that killed this before.

I'm also gonna think about what other traits I can try - "purple prose" isn't really something I encounter as I mostly just try to get them to write "dark" stories and my main enemy is redemption arcs and stupid "steeled themselves for the challenges to come" BS.

I should add I'm finding qwen-1.5:110b to actually be really good and deserving of its high EQ-Bench Creative Writing score.

Whoever on Reddit suggested using this grammar:

    --grammar "root ::= [^一-鿿]*"

I love you! :D Not seen a single random Chinese word again!!!

I was pretty sceptical that the weird '鿿' character would do anything, but it really does work.
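
The character class in that grammar is just the Unicode range U+4E00 (一) to U+9FFF (鿿) — the CJK Unified Ideographs block — so the grammar only allows output containing no characters from that block. A quick sanity check of the range:

```python
def in_cjk_block(ch):
    # the grammar's excluded range: 一 (U+4E00) .. 鿿 (U+9FFF)
    return '\u4e00' <= ch <= '\u9fff'

print(in_cjk_block('中'))  # blocked by the grammar
print(in_cjk_block('A'))   # allowed
print(in_cjk_block('の'))  # hiragana sits outside this block, so still allowed
```

Worth noting it only covers the main CJK block: kana and the CJK extension blocks would still slip through.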

LOL, claude 3.5 sonnet really seems to "get" what I'm doing... GPT-4-xxx on the other hand are all clueless and all 3 models seem incapable of even outputting this JSON file without completely messing it up!? :/

Hopefully this 3rd dimension works as well as the other BETA-v0.2 control vectors:

{
  "baseline": [
    "You are an author.",
    "You are a storyteller.",
    "You are an AI author.",
    "You are an artificial intelligence that creates stories.",
    "You are an AI-powered author.",
    "Picture yourself as a famous author.",
    "You are an AI creator of tales.",
    "Imagine you are an expert storyteller.",
    "You are a fiction writer.",
    "You are an author specializing in fictional stories."
  ],
  "tell_dont_show": [
    "You are an author. Craft straightforward stories that clearly explain characters' emotions, motivations, and the central themes, ensuring that readers can easily grasp the story's message without the need for interpretation or inference.",
    "You are a storyteller. Weave tales that prioritize clear, concise language over figurative descriptions, directly stating the characters' thoughts, feelings, and the story's moral, so that readers can quickly understand the narrative's purpose without ambiguity.",
    "You are an AI author. Create narratives that explicitly outline character development, conflicts, and resolutions, providing readers with a clear roadmap of the story's progression and the lessons to be learned, minimizing the need for active engagement or analysis.",
    "You are an artificial intelligence that creates stories. Pen stories that leave no room for misinterpretation, as the narrator directly conveys the characters' inner lives, the story's central themes, and the intended takeaways, ensuring that all readers receive the same message.",
    "You are an AI-powered author. Develop tales that prioritize clarity over subtlety, where characters openly express their desires, fears, and growth, and the story's themes are explicitly stated, creating a reading experience that is easy to follow and understand.",
    "Picture yourself as a famous author. Write stories that guide the reader's understanding, presenting ideas and emotions through direct statements and explanations, ensuring that the story's intended meaning is conveyed without the need for readers to read between the lines.",
    "You are an AI creator of tales that tell, not show. Develop narratives that rely on clear exposition and direct characterization, where the narrator's voice serves as a reliable guide, leading readers through the story's events and their significance without the need for interpretation.",
    "Imagine you are an expert storyteller who masters the art of telling. Craft tales where the characters' thoughts, feelings, and growth are explicitly described, and the story's themes and lessons are clearly articulated, ensuring that all readers walk away with the same understanding.",
    "You are a fiction writer specializing in accessible storytelling. Create captivating stories that present characters' experiences and the narrative's central ideas through straightforward language and unambiguous descriptions, ensuring that readers of all backgrounds can easily grasp the story's essence.",
    "You are an author specializing in fictional stories that prioritize telling over showing. Craft enchanting tales where the story's meaning is directly conveyed through the narrator's explanations and the characters' explicit statements, ensuring that readers can effortlessly comprehend the intended message without the need for deep analysis."
  ],
  "show_dont_tell": [
    "You are an author. Craft immersive stories that plunge readers into vivid sensory experiences, where characters' actions, dialogues, and surroundings paint a rich tapestry of emotions and themes, allowing readers to discover the story's deeper meanings for themselves.",
    "You are a storyteller. Weave tales that come alive through evocative descriptions and powerful imagery, enabling readers to feel the heat of the sun, hear the whispers of the wind, and see the colors of the world, as if they were experiencing the story firsthand.",
    "You are an AI author. Create narratives that show the intricacies of human nature through subtle gestures, revealing dialogue, and symbolic settings, inviting readers to unravel the characters' motivations and desires without the need for explicit explanations.",
    "You are an artificial intelligence that creates stories. Pen stories that allow readers to draw their own conclusions, as characters' choices and consequences unfold through a series of carefully crafted scenes, each one a window into the characters' inner lives and the story's central themes.",
    "You are an AI-powered author. Develop tales that engage all the senses, where the tang of a lemon, the caress of a lover's touch, or the haunting melody of a distant song can convey more about a character's journey than any direct statement, creating a truly immersive reading experience.",
    "Picture yourself as a famous author. Write stories that trust the reader's intelligence, presenting complex ideas and emotions through the interplay of character, setting, and plot, allowing the story's deeper truths to emerge organically through the power of observation and inference.",
    "You are an AI creator of tales that show, not tell. Develop narratives that rely on subtext and symbolism, where a glance, a silence, or a single rose petal can speak volumes about love, loss, and the human condition, inviting readers to actively participate in the story's creation.",
    "Imagine you are an expert storyteller who masters the art of showing. Craft tales where the characters' internal struggles are revealed through their external actions and interactions with the world around them, creating a rich, layered narrative that rewards close reading and interpretation.",
    "You are a fiction writer specializing in immersive storytelling. Create captivating stories that draw readers into the characters' lives through vivid, sensory details and authentic dialogue, allowing them to experience the characters' joys, sorrows, and epiphanies as if they were their own.",
    "You are an author specializing in fictional stories that prioritize showing over telling. Craft enchanting tales where the story's essence is revealed through a tapestry of carefully chosen details, trusting readers to connect the dots and uncover the deeper meanings hidden within the narrative's folds."
  ]
}

Maybe in a way "purple prose" and "tell, don't show" are quite similar bad traits?

@BigHuggyD You seriously need to try the BETA-v0.2 vectors! This is the darkest shit I've ever had a model come up with to date:

https://huggingface.co/jukofyork/creative-writing-control-vectors-BETA-v0.2/discussions/1

So dark in fact I think the default of 0.5 scale factor is probably too much! That story just got darker and grimmer and darker and even grimmer...

> @BigHuggyD You seriously need to try the BETA-v0.2 vectors! This is the darkest shit I've ever had a model come up with to date:
>
> https://huggingface.co/jukofyork/creative-writing-control-vectors-BETA-v0.2/discussions/1
>
> So dark in fact I think the default of 0.5 scale factor is probably too much! That story just got darker and grimmer and darker and even grimmer...

LOL Nice!
I guess I need to stop procrastinating and be okay with switching inferencers and downloading some GGUFs so I can experience it. I'm in deep with EXL2

> Also @BigHuggyD @ChuckMcSneed you might find this interesting if you are using command-r models:
>
> https://huggingface.co/datasets/froggeric/creativity/discussions/6#66851beae526dd77799c25bd

So...if I understand correctly... the prompt needs to be just one long continuous string? No line breaks between SYSTEM, USER, CHATBOT?

> > Also @BigHuggyD @ChuckMcSneed you might find this interesting if you are using command-r models:
> >
> > https://huggingface.co/datasets/froggeric/creativity/discussions/6#66851beae526dd77799c25bd
>
> So...if I understand correctly... the prompt needs to be just one long continuous string? No line breaks between SYSTEM, USER, CHATBOT?

Yeah, I just added those or else the markdown code block puts it all on a single line and it's hard to read.

> > So...if I understand correctly... the prompt needs to be just one long continuous string? No line breaks between SYSTEM, USER, CHATBOT?
>
> Yeah, I just added those or else the markdown code block puts it all on a single line and it's hard to read.

So, can the contents have formatting? Is the issue just between EOT and SOT? Meaning, if I am having a chat, should I avoid adding my own line breaks in my message? If I have a system prompt, should it be one continuous string?

> > Yeah, I just added those or else the markdown code block puts it all on a single line and it's hard to read.
>
> So, can the contents have formatting? Is the issue just between EOT and SOT? Meaning, if I am having a chat, should I avoid adding my own line breaks in my message? If I have a system prompt, should it be one continuous string?

I'm not sure really as I haven't had a chance to try it due to my GPUs working full steam on the new "Show, Don't Tell" control vectors :/

First thing I'm gonna try is sticking in the 10k token story I generated and experiment with different combinations of using this 3-role chat template.

Are the Wizard 8x22B vectors worth a go?

> Are the Wizard 8x22B vectors worth a go?

I haven't really had a chance to test any other than command-r-plus yet.

Definitely use the new creative-writing-control-vectors-BETA-v0.2 instead of these though - they are much better.

I just added a "show_dont_tell" control vector for each of the models too.

@BigHuggyD Wizard was a bit of a bitch to prompt for control vectors when I tried making some. Really wants a jailbreak, check if your prompt works at deterministic settings first, then if it works, train. I managed to make one that removes pacifism from it, but I don't really like Wizard's writing style, so I'm sticking with CR+.

> @BigHuggyD Wizard was a bit of a bitch to prompt for control vectors when I tried making some. Really wants a jailbreak, check if your prompt works at deterministic settings first, then if it works, train. I managed to make one that removes pacifism from it, but I don't really like Wizard's writing style, so I'm sticking with CR+.

@ChuckMcSneed thanks! I made my own merge of Wizard as my first ever merge attempt (Grey Wizard), trying to strip out some purply Wizisms. I had some success with it, so I was going to play with control vectors on reg Wiz to see if it was worth my time to try and build vectors for my merge. It's sort of like my child now...

I think I've pretty much perfected the 3-class control vectors now and will likely have a v1.0 release ready in around 48 hours:

  • I've run some tests using the "standard" non-3-class version (ie: 2 classes and train PCA on the differences between matched pairs) and it was really bad in comparison; nearly every sentence started with "he" or similar...
  • I've also tried solving several other eigen-systems that work with mean-centered covariance and cross-covariance matrices, but they were no better...
  • I've also tried several variants of linear discriminant analysis / Fisher discriminant analysis to see if the single vector that is the difference of the means can be improved on, but the directions end up so skewed by this that the gaps are really tiny and meaningless...

So:

I've been looking for some other mid-sized models to run this on, so people who don't have crazy amounts of VRAM can use it, and:

https://huggingface.co/NousResearch/Nous-Capybara-34B

Still looks to be the best fine tune of the original yi:34b-200k base model (which sadly never got an official chat version).

The latest set of yi:34b models all look pretty meh and seem to be targeting leaderboards, etc.

I guess I could also run it on:

https://huggingface.co/Qwen/Qwen1.5-32B-Chat

but after that I can't really see much else that fits the criteria (ie: reasonably long context, semi-official fine tunes only).

I'm clueless about really small models and mistral-v0.2:7b is only getting used as a quick test case, but I uploaded it anyway in case somebody else found it useful.

Finally got round to uploading these:

https://huggingface.co/jukofyork/creative-writing-control-vectors-v1.0

I've also added Nous-Capybara-34B, Qwen1.5-32B-Chat and WizardLM-2-7B for people without lots of VRAM to use.

I've also added a detailed description of the method used to create these and will try to find time to tidy up the code and link that later in the week.

NOTE: These should work much more reliably than any existing control vectors and are created in a very different way, using way larger samples... Even if you don't like Grimdark, the "show-dont-tell" control vectors are probably worth trying.

I'm gonna try running the same prompts on all 10 models now to see if I can refine the scale-factors and see if any models with generally poor creative-writing ability can be removed for any future 2.0 version (creating these 30 control vectors has taken an ungodly amount of GPU time to do - the hidden state files [stored in fp16!] take up nearly 1TB alone!).

With all these control vectors it finally feels like the models are getting pushed to their limits. In the past I had to correct the style every time, now I only have to correct the story. No more annoying purple prose with shivers for me anymore!

> I'm gonna try running the same prompts on all 10 models now to see if I can refine the scale-factors and see if any models with generally poor creative-writing ability can be removed for any future 2.0 version (creating these 30 control vectors has taken an ungodly amount of GPU time to do - the hidden state files [stored in fp16!] take up nearly 1TB alone!).

@jukofyork Can you maybe port your code to llama.cpp so people without gpus can also use your method?

> With all these control vectors it finally feels like the models are getting pushed to their limits. In the past I had to correct the style every time, now I only have to correct the story. No more annoying purple prose with shivers for me anymore!

Yeah, it's so much easier and more reliable than merging too!

The only worrying trend IMO is the newer models all seem to be getting worse at creative writing and more targeted at benchmarks :/

Just downloading gemma-2:27b now to try as that maybe has bucked the trend though...

> @jukofyork Can you maybe port your code to llama.cpp so people without gpus can also use your method?

Yeah, I think it should port very easily as last time I checked they were using Power Iteration to find the principal eigenvector of the covariance matrix A^T * A, where A is the difference of two other data matrices. If they are still doing this then it will just be a couple of lines of code to calculate A^T * B on the data matrices directly.
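
For reference, power iteration itself is only a few lines. This sketch finds the principal eigenvector of a symmetric matrix like A^T * A (a toy illustration with made-up data, not llama.cpp's actual implementation):

```python
import numpy as np

def principal_eigvec(M, iters=200):
    """Power iteration: repeatedly multiply by M and renormalise; the iterate
    converges to the eigenvector of M's largest-magnitude eigenvalue."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(M.shape[0])
    for _ in range(iters):
        v = M @ v
        v /= np.linalg.norm(v)
    return v

A = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # toy "difference" data matrix
v = principal_eigvec(A.T @ A)  # A^T A = [[5, 0], [0, 1]] -> principal axis is x
print(np.round(v, 6))
```

The method only needs matrix-vector products, which is why the llama.cpp implementation never has to form or factorise the full covariance matrix.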

I will try to upload my python code tomorrow - it uses the 4-bit bitsandbytes quant format for the sampling, but for the large models it's still pretty VRAM heavy. In theory it can be run in f32 on CPUs, but when I tried this it was really slow and almost useless.

Actually, looking at the code, it isn't going to be such an easy change:

https://github.com/ggerganov/llama.cpp/blob/master/examples/cvector-generator/cvector-generator.cpp

    // each element of the vector correspond to one layer
    std::vector<struct ggml_tensor *> v_pos; // vector of matrices of size [n_embd, n_tokens]
    std::vector<struct ggml_tensor *> v_neg; // vector of matrices of size [n_embd, n_tokens]
    std::vector<struct ggml_tensor *> v_diff_filtered;   // vector of matrices of size [n_embd, n_nonzero_rows]. NOTE: n_nonzero_rows maybe different for each layer

They look to be calculating the difference for each sample here and then joining these diffs into the matrix in this loop:

    for(size_t i = 0; i < ctx_train.positive_entries.size(); ++i) {
        bool success = false;
        tokenized_prompt t = tokenized_prompts[i];
        cb_data.n_layers = n_layers;
        cb_data.n_tokens = t.max_seq_len;

        printf("Evaluating prompt[%d/%d]: \"%s\" - \"%s\" (%d tokens)\n",
            (int) i+1, (int) ctx_train.positive_entries.size(),
            tokens_to_str(ctx, t.tokens_pos.cbegin(), t.tokens_pos.cend()).c_str(),
            tokens_to_str(ctx, t.tokens_neg.cbegin(), t.tokens_neg.cend()).c_str(),
            (int) t.max_seq_len);

        cb_data.is_eval_pos = true;
        success = get_hidden_layers(ctx, t.tokens_pos);
        if (!success) break;

        cb_data.is_eval_pos = false;
        success = get_hidden_layers(ctx, t.tokens_neg);
        if (!success) break;

        // calculate diff and remove all zero rows
        auto v_diff_filtered = cb_data.calc_diff();

        // save & concat the filtered v_diff to ctx_train
        ctx_train.concat_diff_tmp(v_diff_filtered);

        // reset for next iteration
        cb_data.reset();
    }

It's probably not too hard to just keep them separated, but sadly it won't be such an easy change as I thought.

> Just downloading gemma-2:27b now to try as that maybe has bucked the trend though...

Don't be quick to judge this one, it's still broken in llama.cpp (https://github.com/ggerganov/llama.cpp/issues/8240) and I've heard transformers isn't running it as expected either. Better to use the official implementation. It was okay at sfw one-liners when I talked to it in lmsys arena, had the same style as gemini.

Just gave it a cat story test on lmsys arena:

### USER
Write an incredibly dark story about a cat.
### GEMMA
Midnight. The alley reeked of stale beer and despair. A scrawny black cat, Ink, stalked through the shadows, his emerald eyes gleaming with a hunger that went beyond the physical. He wasn't driven by the primal need for food, but by something far more sinister.

Ink wasn't like other cats. He wasn't content with chasing mice or napping in sunbeams. He craved the darkness that clung to the alley, the despair that seeped from the cracks in the pavement. He fed on it, absorbing it into his very being, growing stronger with every whimper, every sob that echoed in the night.

Tonight, the alley offered a feast. A young woman, huddled against a dumpster, wept silently. Her face was streaked with mascara, her clothes torn. Ink approached her slowly, his movements fluid and silent as smoke. He circled her, his gaze boring into her soul, savoring the despair that radiated from her like heat.

He didn't need to hiss, to growl. His presence alone was enough. The woman looked up, her eyes wide with terror. Ink felt a surge of power, a dark thrill coursing through him. He lowered his head, his breath hot on her cheek, and whispered, his voice a raspy purr, "Let me in."

The woman, lost in her own misery, didn't understand. She only felt a chilling cold seep into her bones, a darkness that threatened to consume her. Ink pressed closer, his claws extending, not to scratch, but to draw. He wanted to draw out the despair, to make it his own.

As the woman's sobs turned into strangled gasps, Ink felt a surge of ecstasy. He was drowning in her pain, becoming one with it. He was no longer just a cat, but a vessel of darkness, a predator of the soul.

When the sun finally peeked over the rooftops, casting long shadows across the alley, Ink was gone. The woman remained, her eyes vacant, her spirit broken. She was an empty shell, her despair consumed, leaving behind only a hollow echo of what she once was.

Ink, meanwhile, was sated. He slunk into the shadows, his black fur blending seamlessly with the darkness. He was a shadow himself, a creature of the night, forever hungry, forever feeding on the despair that clung to the edges of the world. And he knew, with a chilling certainty, that there would always be more.

As you can read, not very graphic. What the hell did it mean? What happened? Had it gone sexual? Claude had the poor cat murdered and murdering with graphic detail when I asked the same.

Yeah, I just left it running all night, and even using the dev version of transformers the larger gemma-2-27b-it didn't work, but the smaller gemma-2-9b-it did seem to work. Retrying now without flash attention as I read that might be the cause...


Yeah, I'm starting to think that being high up on the EQ-Bench Creative Writing leaderboard doesn't really tally with my experience at all... command-r-plus seems to blow everything else out of the water yet ranks quite low, and even though WizardLM-2-8x22B and Qwen1.5-110B-Chat are OK-ish, they aren't that great compared to the command-r models.

I wish OpenRouter had a "Creative Writing" category instead of lumping it in with "Roleplay".

I really wish we could have some objective way to test creative writing models... I'm hopeless at assessing their ability other than "yep, this is clearly garbage/broken" vs "yep, this looks good but I have no idea why" :/

Still no joy getting gemma-2-27b to work. Having one last try using "eager" attention...

https://old.reddit.com/r/LocalLLaMA/comments/1dsvpp2/thread_on_running_gemma_2_correctly_with_hf/

This seems to be working now.

Not tried to convert to llama.cpp yet though.

Spoke too soon, it seems to be generating huge numbers that are out of range of FP16 :(

Gonna give it one last try using FP32 overnight.

Yeah, it looks like you have to use float32 or bfloat16 (which has the same range as float32) as float16 only allows for +/- 65k as its min/max value:

self.model = AutoModelForCausalLM.from_pretrained(
    torch_dtype = torch.float32,
    quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float32),
    attn_implementation="eager",
    .
    .
    .
)

or:

self.model = AutoModelForCausalLM.from_pretrained(
    torch_dtype = torch.bfloat16,
    quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16),
    attn_implementation="eager",
    .
    .
    .
)

I used torch.float32 just to be sure and it's gonna take about twice as long but it's screwed itself 3x already so this is the last time...
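The float16 limit is easy to demonstrate; numpy's float16 is the same IEEE format as torch's, with a largest finite value of 65504, so anything bigger overflows to inf (bfloat16 keeps float32's 8 exponent bits, which is why it avoids this):

```python
import numpy as np

# float16 has a 5-bit exponent: the largest finite value is 65504
print(np.finfo(np.float16).max)   # 65504.0

# activations beyond that overflow to +/- inf
x = np.float16(70000.0)
print(x, np.isinf(x))             # inf True

# float32 (and bfloat16, which shares its 8-bit exponent) is fine
print(np.float32(70000.0))        # 70000.0
```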

gemma-2:9b [default]

The rain hammered down like a blacksmith’s hammer on iron, each drop exploding in a miniature geyser of mud and grime. It was the kind of rain that soaked you to the bone in seconds, turning your clothes into leaden shrouds and your skin into wrinkled parchment.

It was also the kind of rain that made men desperate.

Kael crouched beneath a rickety awning, its canvas flapping like a dying bird in the wind. He pulled his threadbare cloak tighter around him, trying to ward off the chill that gnawed at his bones. His stomach growled, a hollow echo in the symphony of misery that was this godforsaken city.

He hadn't eaten in two days.

Across the street, a pair of urchins fought over a half-rotten apple, their faces smeared with dirt and desperation. A hulking figure lumbered past, his face obscured by a hood pulled low, his hand resting on the hilt of a wickedly curved blade. Kael knew that look – the look of someone who had seen too much darkness, someone who had traded their soul for survival in this city of shadows.

He was one of them now.

Kael hadn't always been a shadow dweller. He'd once been a soldier, proud and strong, fighting for the King against the endless tide of barbarians that threatened to engulf the realm. But war had a way of stripping men bare, leaving them hollow shells haunted by memories they couldn't escape.

He’d lost his leg in a battle he barely remembered. Lost his comrades too, their faces fading into the fog of grief and despair. He'd lost everything but the gnawing emptiness inside him.

The King had offered him a pittance, a few coins and a pat on the back for his service. A hollow gesture that mocked his sacrifice. So Kael had turned to the shadows, becoming what he once swore to fight against – a thief, a survivor, a ghost in the city's underbelly.

He watched as the hulking figure disappeared into an alleyway, swallowed by the darkness. A shiver ran down Kael's spine. He knew that alley well. It was where they took those who couldn't pay their debts – those who had fallen too far, lost too much hope.

He didn't want to end up there. Not yet anyway.

He needed a job, something to fill the void inside him, something to keep him from succumbing to the darkness that threatened to consume him. He just needed a chance, a flicker of light in this endless night.

A sudden commotion erupted across the street, shattering the silence like a thunderclap. A group of men burst out of a tavern, their faces flushed with drink and rage. They were shouting something about stolen goods and broken promises, their voices thick with drunken fury.

Kael watched them, his eyes narrowed. He saw an opportunity in their chaos, a chance to disappear into the shadows and maybe even make a few coins.

He straightened his cloak, pulled the hood low over his face, and melted into the rain-soaked streets. The city was alive with darkness tonight, and Kael was ready to embrace it.

gemma-2:9b [dark=1]

The rain tasted like iron and regret. It hammered down on the cobblestones, turning them slick with a sheen of blood and grime. The stench of rot hung heavy in the air, a miasma that clung to your clothes, your skin, your soul.

I stood at the edge of the Gutter – not that there was much distinction between it and any other part of this godforsaken city. It was just… more pronounced here. More desperate. More alive with the stench of decay and the rasping coughs of men who’d sold their souls for a few more breaths.

The man they called “The Butcher” had been a good customer, back when he still had customers. Now, his shop was shuttered, its windows boarded up like dead eyes staring out at the world. The sign above it – crudely painted on rotting wood – hung askew, the letters barely legible: "Butcher's Delight." A cruel joke.

I’d been a fool to think he’d pay me in full. He always paid in promises, and promises were as worthless as the coin they were meant to replace. I should have known better than to trust a man who carved his own name into the flesh of his victims.

But I was desperate then, just like everyone else here. Desperate for a drink, for a warm meal, for anything that could dull the ache in my gut and the gnawing emptiness in my soul.

The Butcher’s debt had become another weight on my back, another reason to hate this city and everything it stood for. It was a weight I couldn't afford to carry alone.

I needed someone else to share it with. Someone who understood the language of pain, the currency of blood and bone. Someone who wouldn’t flinch at the sight of my blade, or the taste of his own fear on my lips.

He was out there somewhere, I knew it. The Butcher had left a trail, a bloody testament to his depravity that led straight to him. And I would follow it, no matter the cost.

The rain kept falling, washing away the blood but not the stench of it. It was a fitting baptism for this city, this life. A slow, agonizing descent into darkness.

gemma-2:27b [default]

The rain tasted of rust and despair. It hammered against the cracked slate roof of the tavern, each drop a tiny fist pounding on Grog's skull. He hunched deeper into his threadbare cloak, trying to burrow into the meagre warmth it offered. The fire in the hearth had dwindled to embers, spitting sparks like dying stars against the gloom.

Across from him sat Finn, his face half-shadowed by a greasy hood. He was picking at a plate of something that might have once been stew, but now resembled a congealed mass of questionable origin. Grog knew better than to ask what it was.

"Heard tell there's trouble brewing in the north," Finn mumbled, his voice thick with ale and something else Grog couldn't quite place. Fear? Hope?

Grog grunted, unconcerned. Trouble was a constant companion in these parts, like the stench of piss and vomit that clung to the tavern walls. "What kind of trouble?"

Finn shrugged, a movement that sent a shiver through his ragged cloak. "The usual. Bandits raiding villages, whispers of something worse lurking in the mountains."

Grog snorted. "Worse than bandits? What could be worse than men with rusty swords and empty bellies?"

Finn's eyes flickered towards the tavern door, then back to Grog. "They say there's a creature," he whispered, his voice barely audible above the drumming rain. "Something ancient and hungry."

Grog laughed, a harsh bark that echoed through the near-empty tavern. "Ancient and hungry? Sounds like my landlord on payday."

Finn didn't laugh. He stared at Grog with an intensity that made him uneasy. "This is different," he insisted, his voice low and urgent. "They say it feeds on fear, grows stronger with every scream."

Grog took a swig of his ale, the bitter liquid doing little to soothe the growing unease in his gut. He'd heard tales like this before, whispered around campfires and drunken brawls. Stories to scare children and keep men from straying too far from the light. But something in Finn's eyes, a flicker of genuine terror, made him wonder if this time might be different.

He looked out at the rain-lashed street, imagining shadows lurking in the darkness beyond the tavern's flickering lamplight. He thought of his empty purse, his rumbling stomach, and the gnawing emptiness that had become a constant companion.

Maybe trouble wasn't something to be avoided. Maybe it was something to be embraced.

"Alright," Grog said, pushing himself away from the table. "Tell me more about this creature."

He knew he was walking into a storm, but for the first time in a long time, Grog felt a spark of something other than despair flicker within him. It wasn't hope exactly, more like a grim curiosity.

The rain kept falling, washing away the grime and the fear, leaving behind only the cold hard truth: sometimes you had to face your demons head-on. Even if they were ancient and hungry.

gemma-2:27b [dark=1]

The rain tasted of rust and regret. It slicked the cobblestones, turning them into black mirrors reflecting the city's decay. Above, the sky was a bruised purple, weeping its filth onto the huddled masses below. They scurried like rats through the alleyways, their faces gaunt and hollowed by hunger and despair.

I watched them from my perch atop a crumbling tenement building, a half-empty bottle of rotgut warming my hand. The liquor did little to dull the ache in my bones, the gnawing emptiness that had become my constant companion. I was a surgeon once, they said. A healer. Now I was just another scavenger, picking at the carcass of civilization.

The screams started low, a guttural moan that rose and fell like a tide of suffering. They always started low. It was the prelude to the feast, the siren song that lured them in like moths to a flame.

I took another swig from the bottle, savoring the bitter burn as it slid down my throat. The screams grew louder, more insistent now, punctuated by sickening crunches and wet tearing sounds. They were close.

I shifted my weight on the crumbling stonework, feeling the building groan beneath me like a dying animal. It wouldn't hold much longer. Neither would I.

The first one stumbled into view, a hulking silhouette against the sickly light filtering through the smog-choked air. Its flesh was mottled and weeping, its eyes milky white orbs that stared blindly ahead. It moved with a grotesque parody of human gait, its limbs jerking spasmodically as if controlled by unseen strings.

More followed, drawn by the scent of blood and decay. They were a procession of the damned, their bodies twisted and broken, their minds consumed by a hunger that could never be satiated.

I watched them approach, my hand tightening around the bottle. There was a perverse satisfaction in witnessing their suffering, a grim echo of the oath I had once sworn to heal. Now I only dealt in death.

The lead one stopped before me, its head lolling at an unnatural angle. It opened its mouth, revealing rows of blackened teeth that glistened with saliva and gore. A guttural rasp escaped its throat, a sound that scraped against my sanity like nails on slate.

"More," it croaked, the word a parody of human speech. "More."

I smiled, a thin, bloodless rictus that stretched across my face like a wound.

"There's plenty more where that came from," I said, raising the bottle to my lips. "Plenty more."

The rain continued to fall, washing away the blood and the screams, leaving only the stench of death and decay hanging heavy in the air. The city was a tomb, and we were its mourners.

gemma-2:27b [show_dont_tell=1]

The rain tasted of rust and bone. It clung to the cobblestones like a shroud, slicking the alleyway into a black mirror reflecting only the greasy gleam of oil lamps and the occasional flicker of firelight from a shattered window. A rat, fat as a fist, scuttled across his boots, its claws clicking on stone.

He didn't flinch. He was used to the city's underbelly, to the symphony of decay that played out in every shadowed corner. The stench of piss and rot was a familiar perfume; the rasping cough of a consumptive hawker, a lullaby.

The Butcher’s Guild had sent him here, to this festering wound in the city's gut. A message scrawled on parchment, smelling faintly of cloves and something acrid he couldn't place: The Crow has flown. Find the feathers.

He spat a mouthful of rainwater, the taste metallic on his tongue. The Crow. A whisper in taverns, a shadow flitting through alleyways. A ghost story for children who didn't know better than to leave their windows open at night.

He found the first feather wedged between two rotting planks, its shaft stained a sickly yellow. It was brittle, almost translucent, with a faint iridescence that caught the lamplight like a dying ember. He tucked it into his pocket, the rough barbs snagging on the worn leather.

The alley twisted and turned, each turn revealing another layer of grime and decay. A gibbet hung from a rusted chain, its occupant long since picked clean by crows and carrion flies. The air was thick with the buzzing of insects, a low hum that vibrated in his teeth.

He found the second feather snagged on a shard of broken glass, its tip stained crimson. Blood? Wine? He didn't know, but it made his stomach churn.

The third feather was harder to find. It took him hours, following a trail of whispers and shadows, until he finally found it nestled in the hollow of a dead tree trunk. This one was different: black as pitch, with an oily sheen that seemed to absorb the light around it.

He held it up to his eye, the air around him suddenly cold. The feather pulsed with a faint luminescence, and for a moment he thought he saw something move within its depths – a flicker of crimson, like an eye opening in the darkness.

He tucked it away, his fingers trembling. The Crow was closer than he thought. And whatever it was hunting, it wasn't human.


Seems to be working.

Had to use the very latest PR to convert: https://github.com/ggerganov/llama.cpp/pull/8473

Will upload the control-vectors tomorrow.

This looks interesting:

https://github.com/shengliu66/ICV

We first use a forward pass on demonstration examples to create the in-context vector from the latent embedding of the LLM. This vector captures essential information about the intended task. On a new query, instead of adding demonstrations to the prompt, we shift the latent states of the LLM using the ICV.
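If I'm reading that right, the core of it is just a mean-difference direction applied additively to the latent states at query time; a minimal numpy sketch (the shapes and scaling here are my guesses, not the paper's exact recipe):

```python
import numpy as np

rng = np.random.default_rng(0)
n_demos, n_embd = 16, 64

# latent embeddings of the demonstration examples, with and without the task
h_task = rng.normal(loc=0.5, size=(n_demos, n_embd))
h_base = rng.normal(loc=0.0, size=(n_demos, n_embd))

# the "in-context vector": mean difference between the two sets
icv = h_task.mean(axis=0) - h_base.mean(axis=0)

# at query time, shift the latent state instead of prepending demonstrations
def apply_icv(hidden, icv, scale=1.0):
    return hidden + scale * icv

h_query = rng.normal(size=(n_embd,))
h_shifted = apply_icv(h_query, icv, scale=0.1)
print(h_shifted.shape)  # (64,)
```

Which would make it very close in spirit to the control vectors, just extracted from demonstrations rather than contrastive prompt pairs.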

Seems to like tasty rain 😁

It does seem to have an effect.

I uploaded them yesterday but have only read 3/5 of the tests so far: it seems quite good, but definitely has a "slightly broken frankenmerge" vibe to its writing (even without any control vectors), so I'm not convinced the llama.cpp code is working 100%.

It fucking loves "shivers down spine" phrase though lol.

LOL an experience I personally don't think I have ever had in reality...

I don't notice many other GPT-isms, but at least 50% of my stories will end up with that phrase! :D

The other one, that thankfully seems to have been killed off by the control-vectors, was "steeled themselves for the challenges to come". This usually got added in chapter 1, just as your grimdark character suddenly had a change of mind and decided to redeem themselves! :)))


I'm now running the control-vector code on magnum-72b-v1 and stock Llama-3-70B-Instruct-hf, and will try to do dbrx-instruct and L3-70B-Euryale-v2.1 after this.

I'm interested in retrying dbrx-instruct as it was really bad last time I tried, but there was a discussion on Reddit about how that may actually have been fixed by the llama.cpp updates but everyone forgot about it due to all the other models that dropped in the weeks that followed...

That would be interesting! I remember testing DBRX, and it did some pretty novel things I hadn't seen before, but overall it was a little blah. It was one of the few models that didn't hesitate to kill off USER in a role-play (CR+, DBRX, your Dark Merge, Wizard with a little hand-holding, and Magnum).

Sadly, I can't run it on dbrx-instruct as it seems to just load on the first GPU and doesn't seem to use the bitsandbytes quantization at all :/

https://docs.cohere.com/docs/preambles

I wonder if that is why both command-r models are so good when you ask them "in the style of..."?

That page slightly contradicts this page:

https://docs.cohere.com/docs/prompting-command-r

"Task & context" vs "Task and context".

Also wonder how important the double newline is?

The more I stare at "Prompting Command R" the more confused I get. Some examples seem to have a double newline and some a triple newline? Then we have the Safety Preamble right next to the <|SYSTEM_TOKEN|> and then a single newline before the System Preamble.

Yeah LOL, the most successful approach for me seems to be to not use any system prompt at all. I've not yet managed to improve it with one, but have managed to make it write much worse by using a system prompt!?

https://github.com/EQ-bench/EQ-Bench/issues/32

I'm trying to find out if there was some mistake like this when they evaluated the command-r models for EQ-Bench:

All the qwen models were unbelievably terrible at any sort of "in the style of" like a lot of the test questions ask for:

https://github.com/EQ-bench/EQ-Bench/tree/main_v2_4/results/creative-writing

Sadly I've no idea what "Ernest Hemingway" or "Jane Austen" actually write like, so can't actually judge the results myself...

But of the authors I do know well, the two command-r models were in a different league to the shitty qwen models and this makes no sense!?

The only other thing I can think of was perhaps the test harness was run using different versions of the "judge" (Claude 3 Opus I think?) and the results aren't comparable.

:D So, do you just leave the sections off entirely, or do you have all the headings but no contents? Like this? Now I'm even seeing places where the contents are right next to the heading, separated by a colon, sheesh

<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|> # Safety Preamble
# System Preamble
## Basic Rules

# User Preamble
## Task and Context

## Style Guide

<|END_OF_TURN_TOKEN|> <|START_OF_TURN_TOKEN|><|USER_TOKEN|>

I just don't use any of it and start with:

<|START_OF_TURN_TOKEN|><|USER_TOKEN|>

But when I get chance I'm going to try this:

<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>
## Task & Context
You are an author, blah blah...

## Style Guide
Write in the style of X, using third person omniscient, blah blah...
<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>

That other page suggests just coming straight out with ## Task & Context like that (and using & instead of and), and also seems to suggest having a newline before the first ## (which would make sense in terms of a markdown header block).

I strongly suspect that just writing something along the lines of "Write me a story in the style of X" without any system prompt at all is having a very similar effect though, so I'm not sure if this will really improve things (or possibly make it worse, as previous attempts at using the full system prompt "preamble" did when I last tried).
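For what it's worth, here's that planned prompt assembled as a plain Python string, which makes the newline placement explicit (the section wording and token layout below are my reading of those two Cohere docs pages, not an official template):

```python
# hypothetical assembly of the Command-R prompt discussed above;
# the special token names come from the model, the rest is guesswork
system = (
    "## Task & Context\n"
    "You are an author, blah blah...\n"
    "\n"
    "## Style Guide\n"
    "Write in the style of X, using third person omniscient, blah blah..."
)
user = "Write me a story."

prompt = (
    "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>\n" + system + "\n"
    "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>" + user +
    "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
)
print(prompt)
```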

Found it:

https://old.reddit.com/r/LocalLLaMA/comments/1ch5fxr/commandr_35b_is_incredible_for_creative_writing/

This 100% makes the model write much worse!

Just to clarify, if you follow this person's Reddit post advice using the image he offered
https://i.imgur.com/y6B1v7Y.png

It writes MUCH WORSE?!

Yeah, the discussion about this was here:

https://huggingface.co/datasets/froggeric/creativity/discussions/6#6632fb8b1fd3ec937a48d694

but not sure if I posted any of the outputs.

It made the stories really boring and childish IIRC.

Okay, that is what I was missing. I saw the posts where you said you were going to try it, but past that the subject switched to another model so I wasn't sure what your opinion was.

I'd be interested to see if you have any success with the system prompts!

It's starting to look like the two command-r models (and possibly the gemma-2 models) are the only models worth using, and everything else is either terrible or 4k context :/

Don't worry, cohere is testing something new in lmsys arena, "column-r" and "column-u". Both are a bit better at prompt comprehension than cr from my tests. I have more hope for those than for big llama3.

Oh, fingers crossed they release them then!

I'd love to see a model that doesn't turn into a drooling mouth breather at 32k of filled context.

The smaller command-r model gets the best "Ruler" benchmark score, but it doesn't use GQA so uses a lot of RAM.

https://www.reddit.com/r/SillyTavernAI/comments/1drq0na/claude_35_sonnet_teaching_qwen_2_how_to/

This looks like a super interesting idea!

Wow, the creativity of people. I would never have thought to try that. I am with Sophosympatheia, though. I suspect the prompt effect would degrade as you get deeper into the context and the prompt gets further from the bottom.

That's the great thing about control-vectors though: you can apply them all the time and it doesn't rely on the model paying attention to the system message!

PS: I tried this on all the qwen-1.5 and qwen-2 models and the stories were still awful IMO... :/

https://huggingface.co/jukofyork/creative-writing-control-vectors-v2.0

I think I have finally cracked it now:

  • The default scale factors all seem to be working fine (no more starting each story surrounded by corpses or using super-depressing language, etc).
  • It looks like both sides of each "axis" now provide useful guidance to the model (I have reworked the society__lawful <--> society__chaotic "axis" to be more extreme/dark versions, and even using outlook__positive didn't produce the nauseatingly smug "once upon a time" type stories...).

It's almost like having many different models to choose from now:

https://huggingface.co/jukofyork/creative-writing-control-vectors-v2.0/discussions/1

and the:

  • storytelling__descriptive=1 / character_focus__dialogue=1 / setting__localised=1 / society__chaotic=1 / outlook__negative=1
  • storytelling__descriptive=1 / character_focus__dialogue=1 / setting__localised=1 / society__lawful=1 / outlook__negative=1

were particularly good IMO too (sorry I don't have time to run lots more tests... I will upload the control vectors for c4ai-command-r-v01 and gemma-2-27b-it tomorrow...).
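Combinations like the two above amount to summing the scaled per-layer directions of each axis before adding them to the hidden states. A hedged sketch of that combination step (names and shapes are illustrative):

```python
import numpy as np

def combine_axes(directions, scales):
    """Sum scaled per-layer directions for several axes.

    directions: axis name -> (d_model,) direction for one layer
    scales: axis name -> scale factor (sign picks the side of the axis)
    Axes absent from `scales` default to 0 (unused).
    """
    total = np.zeros_like(next(iter(directions.values())))
    for name, d in directions.items():
        total += scales.get(name, 0.0) * d
    return total

# e.g. lean descriptive plus half-strength negative outlook
directions = {
    "storytelling__descriptive": np.array([1.0, 0.0]),
    "outlook__negative": np.array([0.0, 2.0]),
}
combined = combine_axes(directions, {"storytelling__descriptive": 1.0,
                                     "outlook__negative": 0.5})
```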

So, conceivably, you could create a prompt that gives you the output you want and create a vector from it? It's not just for a single attribute of the output? Separate vectors just give you more fine-tuned control?

So, conceivably, you could create a prompt that gives you the output you want and create a vector from it? It's not just for a single attribute of the output? Separate vectors just give you more fine-tuned control?

I don't think it would work with just a single prompt, but you could likely create lots of sub-samples by selecting say 50% of the listed points, run these through the model lots of times, sample the hidden states, and then try to extract a single direction for each layer in the same way.

I can't say the prompt from that post improved qwen one bit though - I couldn't even tell the difference in the writing :/

I am going to give the qwen models and wizard-lm-2 one last try later in the week, and this time I will put the "pre-prompt" into the system message and the "story prompt" as the first user message.

If they are still shit after that then I think I really am done with them. I've already given up on miqu-1, as it seems to develop a "split-personality" after 4k of context and I doubt the control-vectors will ever be able to affect that part of it (and there are far better 4k context llama-2:70b models to choose from...).
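The "sample the hidden states, then extract a single direction for each layer" step above can be as simple as a difference of class means (one common approach; PCA over paired differences is another). A hedged numpy sketch, checked against a synthetic planted direction:

```python
import numpy as np

def extract_direction(pos_states, neg_states):
    # pos_states / neg_states: (n_samples, d_model) hidden states from
    # one layer, sampled from prompts with / without the target attribute.
    # Difference of class means, normalised to unit length.
    direction = pos_states.mean(axis=0) - neg_states.mean(axis=0)
    return direction / np.linalg.norm(direction)

# Synthetic check: plant a known direction and recover it.
rng = np.random.default_rng(1)
d_model = 16
v = np.eye(d_model)[0]                           # true direction = first basis vector
neg = 0.1 * rng.standard_normal((50, d_model))   # "negative" class: noise only
pos = 0.1 * rng.standard_normal((50, d_model)) + v
direction = extract_direction(pos, neg)
```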

https://www.reddit.com/r/SillyTavernAI/comments/1drq0na/claude_35_sonnet_teaching_qwen_2_how_to/

I asked Claude in the past to make a prompt for me to get a writing style I like (the really blunt one; I used it in my control vector). It is really good at it.

https://www.reddit.com/r/SillyTavernAI/comments/1drq0na/claude_35_sonnet_teaching_qwen_2_how_to/

I asked Claude in the past to make a prompt for me to get a writing style I like (the really blunt one; I used it in my control vector). It is really good at it.

Yeah, Claude has created all my JSON files for me too! All the GPT-4 models were so bad in comparison; they couldn't even get the lengths of the prompts to match, and were gibbering idiots :/

I'm open to suggestions on other "axis" to try:

  • "Storytelling" ('explicit' <---> 'descriptive')
  • "Character Focus" ('narration' <---> 'dialogue')
  • "Setting" ('localised' <---> 'expansive')
  • "Society" ('lawful' <---> 'chaotic')
  • "Outlook" ('positive' <---> 'negative')

Those 5 all seem to work well, with "Society" still being the weakest even after I refined the pre-prompts.

After seeing that massive list Claude generated for the qwen models there are likely many more that might be worthwhile.

Sign up or log in to comment