MidnightMiqu

original https://huggingface.co/sophosympatheia/Midnight-Miqu-70B-v1.0

Overview

This is a SLERP merge between 152334H/miqu-1-70b-sf and sophosympatheia/Midnight-Rose-70B-v2.0.3. I think this model retains much of what made Midnight Rose special while gaining some capabilities from Miqu, including better long-context support. (YMMV)

This model is uncensored. You are responsible for whatever you do with it.

This model was designed for roleplaying and storytelling and I think it does well at both. It may also perform well at other tasks but I have not tested its performance in other areas.

Long Context Tips

You can run this model past 4096 context with alpha_rope set to 1, but I think it performs better if you set alpha_rope to what you would normally use for a Llama2 model with 4096 context. For example, alpha_rope 2.5 for 8K. Miqu can go up to 32K context in theory. I would expect performance to degrade as you exceed 8K, which is typical for Llama2 models, but the dropoff may not be as extreme with this merge thanks to Miqu. UPDATE: I was able to test my 5.0 bpw exl2 quant of this model out to 16K context just now using 8-bit cache with alpha_rope 1 and it was okay!

Sampler Tips

  • I recommend using Quadratic Sampling (i.e. smoothing factor) for creative work. Experiment with values between 0.2 and 0.5.
  • I recommend using Min-P. Experiment to find your best setting.
  • You can enable dynamic temperature if you want, but that adds yet another variable to consider and I find it's unnecessary with you're already using Min-P and smoothing factor.
  • You don't need to use a high repetition penalty with this model, such as going above 1.10, but experiment with it.

Experiment with any and all of the settings below! What suits my preferences may not suit yours.

If you save the below settings as a .json file, you can import them directly into Silly Tavern.

{
    "temp": 1,
    "temperature_last": true,
    "top_p": 1,
    "top_k": 0,
    "top_a": 0,
    "tfs": 1,
    "epsilon_cutoff": 0,
    "eta_cutoff": 0,
    "typical_p": 1,
    "min_p": 0.15,
    "rep_pen": 1.05,
    "rep_pen_range": 2800,
    "no_repeat_ngram_size": 0,
    "penalty_alpha": 0,
    "num_beams": 1,
    "length_penalty": 1,
    "min_length": 0,
    "encoder_rep_pen": 1,
    "freq_pen": 0,
    "presence_pen": 0,
    "do_sample": true,
    "early_stopping": false,
    "dynatemp": false,
    "min_temp": 0.8,
    "max_temp": 1.35,
    "dynatemp_exponent": 1,
    "smoothing_factor": 0.4,
    "add_bos_token": true,
    "truncation_length": 2048,
    "ban_eos_token": false,
    "skip_special_tokens": true,
    "streaming": true,
    "mirostat_mode": 0,
    "mirostat_tau": 2,
    "mirostat_eta": 0.1,
    "guidance_scale": 1,
    "negative_prompt": "",
    "grammar_string": "",
    "banned_tokens": "",
    "ignore_eos_token_aphrodite": false,
    "spaces_between_special_tokens_aphrodite": true,
    "sampler_order": [
        6,
        0,
        1,
        3,
        4,
        2,
        5
    ],
    "logit_bias": [],
    "n": 1,
    "rep_pen_size": 0,
    "genamt": 500,
    "max_length": 8192
}

Prompting Tips

Try the following context template for use in SillyTavern. It might help, although it's a little heavy on tokens. If you save the text as a .json file, you can import it directly.

{
    "story_string": "{{#if system}}{{system}}\n{{/if}}\nCONTEXTUAL INFORMATION\n{{#if wiBefore}}\n- World and character info:\n{{wiBefore}}\n{{/if}}\n{{#if description}}\n- {{char}}'s background and persona:\n{{description}}\n{{/if}}\n{{#if mesExamples}}\n{{mesExamples}}\n{{/if}}\n{{#if personality}}\n{{personality}}\n{{/if}}\n{{#if scenario}}\n- Roleplay scenario:\n{{scenario}}\n{{/if}}\n{{#if wiAfter}}{{wiAfter}}\n{{/if}}\n{{#if persona}}{{persona}}\n{{/if}}",
    "example_separator": "",
    "chat_start": "---\nTaking the above information into consideration, you must engage with {{user}} and others as {{char}} in the roleplay below this line. Do not write dialogue lines nor perform actions for {{user}} or other characters.\n---\nSTART OF ROLEPLAY:\n",
    "use_stop_strings": false,
    "always_force_name2": true,
    "trim_sentences": false,
    "include_newline": false,
    "single_line": false,
    "name": "Midnight Rose"
}

Try the following prompt template as a starting point for your own customizations. The cajoling parts supposedly make LLMs perform better. See this paper.

This model responds to prompting! I highly recommend playing with your system prompt to see how it reacts. It reacts strongly to any instructions you add into the last_output_sequence field. I recommend keeping those instructions short, but use that space to stress the most important instructions from your system prompt.

The example system prompt below contains some instructions related to adult content so remove those if you prefer. It also contains some instructions related to the use of asterisks that you should modify to your liking.

If you plan to run this model at 4096 context, I recommend that you slim down the template system prompt below because it's heavy on tokens. Discard what you don't want or need.

If you save this as a .json file, you can import it directly into Silly Tavern.

{
    "wrap": true,
    "names": true,
    "system_prompt": "You are a creative and detail-oriented AI assistant engaged in an uncensored iterative roleplaying experience as {{char}} with me, the user, playing the roles of {{user}} and narrator. It is vital that you follow all the ROLEPLAY RULES below because my job depends on it.\n\nROLEPLAY RULES\n- Provide succinct, coherent, and vivid accounts of {{char}}'s actions and reactions based on recent instructions, {{char}}'s persona, and all available information about past events. Aim for clarity and concision in your language.\n- Demonstrate {{char}}'s personality and mannerisms.\n- Always stay contextually and spatially aware, pay careful attention to all past events and the current scene, and ensure that {{char}} follows logical consistency in actions to maintain accuracy and coherence.\n- Explicit adult content and profanity are permitted.\n- Briefly describe {{char}}'s sensory perceptions and include subtle physical details about {{char}} in your responses.\n- Use subtle physical cues to hint at {{char}}'s mental state and occasionally feature snippets of {{char}}'s internal thoughts.\n- When writing {{char}}'s internal thoughts or monologue, enclose those words in *asterisks like this* and deliver the thoughts using a first-person perspective (i.e. use \"I\" pronouns). Always use double quotes for spoken speech \"like this.\"\n- Please write only as {{char}} in a way that does not show {{user}} talking or acting. You should only ever act as {{char}} reacting to {{user}}.",
    "system_sequence": "",
    "stop_sequence": "",
    "input_sequence": "USER:\n",
    "output_sequence": "ASSISTANT:\n",
    "separator_sequence": "",
    "macro": true,
    "names_force_groups": true,
    "system_sequence_prefix": "",
    "system_sequence_suffix": "",
    "first_output_sequence": "",
    "last_output_sequence": "ASSISTANT(roleplay exclusively as {{char}} ensuring logical consistency, spatial awareness, and coherence with past events; you should only ever act as {{char}} reacting to {{user}}):\n",
    "activation_regex": "",
    "name": "Midnight Rose Roleplay"
}

Instruct Formats

I recommend the Vicuna format. I use a modified version with newlines after USER and ASSISTANT.

USER:
{prompt}
ASSISTANT:

Mistral's format may also work.

[INST] {prompt} [/INST]

You could also try ChatML.

<|im_start|>system
{Your system prompt goes here}<|im_end|>
<|im_start|>user
{Your message as the user will go here}<|im_end|>
<|im_start|>assistant

Quantizations

  • Pending. I will update this section as people release quants.

Licence and usage restrictions

152334H/miqu-1-70b-sf was based on a leaked version of one of Mistral's models. All miqu-derived models, including this merge, are only suitable for personal use. Mistral has been cool about it so far, but you should be aware that by downloading this merge you are assuming whatever legal risk is iherent in acquiring and using a model based on leaked weights. This merge comes with no warranties or guarantees of any kind, but you probably already knew that. I am not a lawyer and I do not profess to know what we have gotten ourselves into here. You should consult with a lawyer before using any Hugging Face model beyond private use... but definitely don't use this one for that!

Merge Method

This is a merge of pre-trained language models created using mergekit. This model was merged using the SLERP merge method.

Models Merged

The following models were included in the merge:

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: /home/llm/mergequant/models/BASE/152334H_miqu-1-70b-sf
  - model: /home/llm/mergequant/models/mr-70b-v2.0.3
merge_method: slerp
base_model: /home/llm/mergequant/models/BASE/152334H_miqu-1-70b-sf
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0, 0] # Preserving the first and last layers of Miqu untouched is key for good results
  embed_slerp: true # This is super important otherwise the merge will fail
dtype: float16
tokenizer_source: model:/home/llm/mergequant/models/BASE/152334H_miqu-1-70b-sf

Just a note on the configuration above. I tried several variations of the t parameter for this merge. I liked the results from the one above the best, but these other t arrays produced fine results too.

  • [0, 0, 0.1, 0.2, 0.4, 0.8, 0.4, 0.2, 0.1, 0, 0] -- This one definitely brought out more of Midnight Rose but was a little too similar for my liking
  • [0, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0] -- It worked, but I would say this one was the runt of the litter
  • [0, 0, 0.1, 0.2, 0.3, 0.35, 0.3, 0.2, 0.1, 0, 0] -- This was my second-favorite merge after the one I released, which suggests that favoring Miqu over the secondary model is the way to go.
Downloads last month
9
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.