
Limamono-7B (Mistral) v0.50

This is an early version (50% complete) of a strongly NSFW roleplaying model trained on an extremely limited amount of almost entirely synthetic data, hopefully of higher quality than typical human conversations. The intended target audience is straight men and lesbians.

Limamono tries to address the main issues and limitations of the previously released LimaRP. It is composed of extensively modified conversations written with the help of the base Yi-34B model by 01.AI.

A defining characteristic of Limamono is mind reading: characters may (though not necessarily always) seamlessly include their thoughts inside their utterances.

The prose style of this model is a somewhat extended book/novel format (further detailed below). Other formats are not supported and may conflict with the special features of this model.

Note: there is currently no plan to release the dataset.

Known issues and quirks

  • The model may feel somewhat "overbaked". Use a temperature of 1.
  • Characters may occasionally exhibit strange (unintended) speech quirks. Please report if found.
  • The model will often hallucinate facts when generating character cards in text completion mode from an empty context.
  • Impersonation may sometimes occur early in the chat, in particular when trying to force a very long character message length or regenerating the greeting message.

Prompt format

Limamono uses a slight variation of the extended Alpaca format, with ### Input: immediately preceding user inputs and ### Response: immediately preceding model outputs. It has been trained with a fixed "trigger phrase", similar to that of the original Alpaca, placed just before the ### Instruction: sequence, following the template below.

Below is an instruction that describes background information for a story-rich chat. Write an appropriate response for both the instruction and user input.

### Instruction:
{{char}}
{{description}}

Scenario: {{scenario}}

### Response:
{{char}}: [utterance]

### Input:
{{user}}: [utterance]

### Response:
{{char}}: [utterance]

[etc...]
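Outside SillyTavern, the template above can be assembled as a plain string. The following is a minimal, unofficial sketch in Python; the function name, argument names and exact blank-line spacing are assumptions based on the template shown here, not part of the model card.

```python
# Minimal sketch of how the template above could be assembled as a plain string.
# The function name, argument names and exact blank-line spacing are assumptions.

TRIGGER = (
    "Below is an instruction that describes background information for a "
    "story-rich chat. Write an appropriate response for both the instruction "
    "and user input."
)

def build_prompt(char_name: str, description: str, scenario: str,
                 turns: list[tuple[str, str]]) -> str:
    """Build the Alpaca-variant prompt; `turns` is a list of
    (speaker, utterance) pairs in chat order."""
    parts = [TRIGGER, "", "### Instruction:", char_name, description, "",
             f"Scenario: {scenario}", ""]
    for speaker, utterance in turns:
        header = "### Response:" if speaker == char_name else "### Input:"
        parts += [header, f"{speaker}: {utterance}", ""]
    # End with an open response header so the model continues as the character.
    parts += ["### Response:", f"{char_name}:"]
    return "\n".join(parts)
```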

In more detail, the instruction should preferably include a moderately long character description (a few hundred tokens) written in the style of the various fandom wikis found on the Internet, with the character name as the first line.

You can refer to the included Charlotte model card for an example of how character descriptions can be formatted (important note: the provided SillyTavern story context settings must also be used at the same time). Another option is to take a hint from the model's output when completing a nearly empty context in text-generation-webui or other text completion UIs; you will likely need to add the trigger phrase for the model to generate text as intended from scratch, and the model will generally output wiki-style character sheets in this way. Changing details at the beginning of the sheet will affect the rest of the generation. There is no fixed format, but the training data generally follows a pattern similar to this example:

{{char}}
Attribute name 1: brief text
Attribute name 2: brief text
Attribute name n: brief text

Description paragraph 1

Description paragraph 2

Description paragraph n

- Trivia and misc info 1
- Trivia and misc info 2
- Trivia and misc info n

Scenario: {{scenario}}

Although the number of attributes, paragraphs and trivia entries may vary, it is strongly advised to always include a Scenario line at the end in order to guide the character's behavior at the beginning of the chat.
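Purely as an illustration, a sheet in the layout above could be rendered from structured fields with a short helper; the field names below are hypothetical and not part of the training format.

```python
# Illustrative only: render a wiki-style character sheet in the layout above.
def render_sheet(name: str, attributes: dict[str, str], paragraphs: list[str],
                 trivia: list[str], scenario: str) -> str:
    lines = [name]
    lines += [f"{key}: {value}" for key, value in attributes.items()]
    lines.append("")
    for paragraph in paragraphs:
        lines += [paragraph, ""]
    lines += [f"- {item}" for item in trivia]
    lines += ["", f"Scenario: {scenario}"]
    return "\n".join(lines)
```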

Message length control

Inspired by the preset previously named "Roleplay" in SillyTavern, and as with LimaRP, it is possible to append a length modifier to the instruction sequences, as shown below. Note that the modifier should be separated from the colon by a single space:

### Response: (length = long)
{{char}}: [utterance]

### Input: (length = tiny)
{{user}}: [utterance]

This affects bot response length, but as of now it may not always work reliably. The lengths used during training are: micro, tiny, short, medium, long, massive, huge.

From extended testing, a long length was found to work reasonably well. In the training data, bot messages are usually long, massive or huge, with the largest size generally used only for greeting messages.

It is also suggested to add (length = tiny) or (length = short) to the ### Input: sequence, in order to help the model follow its training data more closely.
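As a small, hypothetical helper building on the prompt sketch earlier, the modifier can be appended to a sequence header like this:

```python
# Hypothetical helper: append a length modifier to a sequence header.
# Lengths seen during training: micro, tiny, short, medium, long, massive, huge.
def with_length(header: str, length: str) -> str:
    return f"{header} (length = {length})"

# with_length("### Response:", "long")  ->  "### Response: (length = long)"
# with_length("### Input:", "tiny")     ->  "### Input: (length = tiny)"
```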

Prose style

Only the Novel/Forum RP prose style is supported, meaning that narration should always be in third person and past tense, and that dialogue lines should always be wrapped with quotation marks.

Style details

  • Narration does not have any delimiter.
    • Jessica looked at Mark with disdain.
  • Dialogue is wrapped in ASCII double quotation marks. Fancy (curly) quotes are not supported.
    • "I say this."
  • Onomatopoeias are wrapped with asterisks.
    • *thud*
  • Character thoughts are wrapped with underscores. This may often spontaneously occur with Limamono.
    • _What is he doing?_
  • Non-dialogue quotes are wrapped with two apostrophes on each side. This avoids conflicts with quotation marks in SillyTavern.
    • ''The Jungle Book''
  • Punctuation has been normalized and tries to follow standard conventions in book/novel writing.
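Purely to illustrate the delimiters listed above (this is not an official tool), a generated message could be split into its components with a few regular expressions:

```python
import re

# Illustrative only: pull the differently-delimited spans out of a message.
message = 'Jessica glared at Mark. _What is he doing?_ "Stop that," she said. *thud*'

dialogue      = re.findall(r'"([^"]*)"', message)    # ASCII double quotes
thoughts      = re.findall(r'_([^_]*)_', message)    # underscores
onomatopoeias = re.findall(r'\*([^*]*)\*', message)  # asterisks
titles        = re.findall(r"''(.*?)''", message)    # two apostrophes per side

print(dialogue)       # ['Stop that,']
print(thoughts)       # ['What is he doing?']
print(onomatopoeias)  # ['thud']
print(titles)         # []
```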

SillyTavern settings

Try to follow these settings. Appropriate files for replicating them are included in the model repository:

[Screenshot: SillyTavern settings]

Example

This is how a typical RP chat may take place with this model. Notice the presence of character thoughts. These may not always be present, but once generated they will appear more frequently.

[Screenshot: example chat]

You can try chatting with Charlotte by downloading her SillyTavern character card in the repository.

Text generation settings

For testing I use these settings:

  • Temperature: 1.0
  • Tail-Free Sampling: 0.85
  • Repetition Penalty: 1.11
  • Repetition Penalty range: 2048
  • Top-p: 1 (disabled), Top-k: 0 (disabled)
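As a hedged example, these settings could be reproduced with llama-cpp-python on a GGUF quantization of the model; the file name below is a placeholder, and parameter names may differ in other backends:

```python
from llama_cpp import Llama

# Placeholder path: use whichever GGUF quantization of the model you downloaded.
llm = Llama(
    model_path="Limamono-Mistral-7B-v0.50.Q5_K_M.gguf",
    n_ctx=4096,                # training sequence length
    last_n_tokens_size=2048,   # repetition penalty range
)

# The assembled Alpaca-variant prompt (see the "Prompt format" section above).
prompt = "Below is an instruction that describes background information ..."

output = llm.create_completion(
    prompt,
    max_tokens=512,
    temperature=1.0,
    tfs_z=0.85,          # Tail-Free Sampling
    repeat_penalty=1.11,
    top_p=1.0,           # disabled
    top_k=0,             # disabled
    stop=["### Input:", "### Instruction:"],
)
print(output["choices"][0]["text"])
```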

Training procedure

Axolotl was used for training on a single NVIDIA RTX 3090.

The training data consisted of 50 conversations (199k tokens / 1117 messages), each roughly 4k tokens long. The learning rate was chosen to approximately minimize the evaluation loss after one epoch with a constant learning-rate schedule. Over the following two epochs, what would normally be considered overfitting occurs, but at the same time output quality also improves.

Training hyperparameters

  • load_in_8bit: true
  • adapter: lora
  • sequence_len: 4096
  • sample_packing: false
  • pad_to_sequence_len: true
  • lora_r: 8
  • lora_alpha: 16
  • lora_dropout: 0.5
  • gradient_accumulation_steps: 1
  • micro_batch_size: 1
  • num_epochs: 3
  • optimizer: adamw_torch
  • lr_scheduler: cosine
  • learning_rate: 0.0002
  • weight_decay: 0.1
  • train_on_inputs: false
  • group_by_length: false
  • bf16: true
  • fp16: false
  • tf32: true

Train loss graph

[Figure: training loss curve]
