---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- not-for-all-audiences
---
# Limamono-7B (Mistral) v0.50
This is an **early version** (50% completed) of a strongly NSFW roleplaying model trained on an
_extremely limited_ amount of almost entirely synthetic data, hopefully of higher quality than typical
human conversations. The intended target audience is straight men and lesbians.
Limamono tries to address the main issues and limitations of the previously released [LimaRP](https://huggingface.co/datasets/lemonilia/LimaRP)
and is composed of extensively modified conversations written with the help of the base [Yi-34B](https://huggingface.co/01-ai/Yi-34B)
model by 01.ai.
A defining characteristic of Limamono is _mind reading_: characters may (though not necessarily always)
seamlessly include their thoughts inside their utterances.
The prose style of this model is a somewhat extended book/novel format (further detailed below).
Other formats are not supported and may conflict with the special features of this model.
**Note**: there is currently no plan to release the dataset.
## Known issues and quirks
- The model may feel somewhat "overbaked". Use a temperature of 1.
- Characters may occasionally exhibit strange (unintended) speech quirks. Please report these if found.
- The model will often hallucinate facts when generating character cards in text completion mode
from an empty context.
- Impersonation may sometimes occur early in the chat, in particular when trying to force a very
long character message length or when regenerating the greeting message.
## Prompt format
Limamono uses a slight variation of the [extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
with `### Input:` immediately preceding user inputs and `### Response:` immediately preceding
model outputs. It has been trained with a fixed "trigger phrase" similar to that of the original
Alpaca, placed just before the `### Instruction:` sequence, following the template below.
```
Below is an instruction that describes background information for a story-rich chat. Write an appropriate response for both the instruction and user input.

### Instruction:
{{char}}
{{description}}

Scenario: {{scenario}}

### Response:
{{char}}: [utterance]

### Input:
{{user}}: [utterance]

### Response:
{{char}}: [utterance]

[etc...]
```
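For illustration, here is a minimal Python sketch of how such a prompt could be assembled
programmatically. The function and variable names are illustrative only and not part of any
official tooling; the exact spacing between blocks is an assumption based on the template above:

```python
TRIGGER = (
    "Below is an instruction that describes background information for a "
    "story-rich chat. Write an appropriate response for both the instruction "
    "and user input."
)

def build_prompt(char: str, user: str, description: str,
                 scenario: str, history: list[tuple[str, str]]) -> str:
    """Assemble an extended-Alpaca prompt in the Limamono style.

    `history` holds (speaker, utterance) pairs, with speaker equal to
    either `char` or `user`.
    """
    parts = [
        TRIGGER,
        f"\n\n### Instruction:\n{char}\n{description}\n\nScenario: {scenario}\n",
    ]
    for speaker, utterance in history:
        header = "### Input:" if speaker == user else "### Response:"
        parts.append(f"\n{header}\n{speaker}: {utterance}\n")
    # End with an open response header so the model continues as the character.
    parts.append(f"\n### Response:\n{char}:")
    return "".join(parts)
```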
In more detail, the instruction should _preferably_ include a moderately long (a few hundred tokens)
character description written in the style of the various fandom wikis on the Internet, **with the
character name as the first line**.

You can refer to the included [Charlotte character card](https://huggingface.co/lemonilia/Limamono-Mistral-7B-v0.3/blob/main/Charlotte.png)
for an example of how character descriptions can be formatted (important note: the provided SillyTavern
story context settings must also be used at the same time). Another option is letting the model
generate a character sheet from a nearly empty context in `text-generation-webui` or other
text-completion UIs (you will likely need to add the _trigger phrase_ for the model to generate text
as intended from scratch); the model will generally output **wiki-style** character sheets in this way.
Changing details at the beginning of the sheet will affect the rest of the generation. There is no
fixed format for it, but the training data generally follows a pattern similar to this example:
```
{{char}}

Attribute name 1: brief text
Attribute name 2: brief text
Attribute name n: brief text

Description paragraph 1

Description paragraph 2

Description paragraph n

- Trivia and misc info 1
- Trivia and misc info 2
- Trivia and misc info n

Scenario: {{scenario}}
```
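As a rough illustration of the from-scratch generation mentioned above, a bare text-completion
prompt like the following sketch (reusing `TRIGGER` from the earlier example) can be used to
elicit a wiki-style sheet; the seeded character name is hypothetical:

```python
# Seeding the first line after "### Instruction:" with a name steers
# the rest of the generated character sheet.
sheet_prompt = TRIGGER + "\n\n### Instruction:\nCharlotte\n"
```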
Although the number of attributes, paragraphs and trivia items may vary, it is **strongly advised**
to always include a Scenario line at the end of the sheet to guide the character's behavior at the
beginning of the chat.
### Message length control
Inspired by the previously named "Roleplay" preset in SillyTavern, as with LimaRP it is possible to
append a length modifier to the instruction sequences, as shown below. Note that the length modifier
must be placed _after_ the colon, separated by a space:
```
### Response: (length = long)
{{char}}: [utterance]

### Input: (length = tiny)
{{user}}: [utterance]
```
This has an effect on bot responses, but as of now it might not always work reliably. The lengths
used during training are: `micro`, `tiny`, `short`, `medium`, `long`, `massive`, `huge`.
From extended testing, a **long** length was found to work reasonably well. In the training data,
bot messages are usually `long`, `massive` or `huge`, with the largest size generally reserved for
greeting messages.
It is also suggested to add `(length = tiny)` or `(length = short)` to the
`### Input:` sequence, in order to help the model follow its training data more closely.
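A small sketch of how the headers with length modifiers could be formatted, following the
spacing rule above (the helper name is illustrative, not part of any official tooling):

```python
def sequence_header(kind: str, length: str | None = None) -> str:
    """Format a sequence header, optionally with a length modifier.

    Note the single space after the colon, as used during training.
    """
    base = "### Input:" if kind == "user" else "### Response:"
    return f"{base} (length = {length})" if length else base

print(sequence_header("user", "tiny"))  # -> ### Input: (length = tiny)
print(sequence_header("char", "long"))  # -> ### Response: (length = long)
```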
## Prose style
Only the Novel/Forum RP prose style is supported, meaning that narration should always be in third
person and past tense, and that dialogue lines should always be wrapped with quotation marks.
### Style details
- Narration does not have any delimiter.
- `Jessica looked at Mark with disdain.`
- Dialogue is wrapped with ASCII double quotation marks. Fancy quotes are not supported.
- `"I say this."`
- Onomatopoeias are wrapped with asterisks.
- `*thud*`
- Character thoughts are wrapped with underscores. **This may often spontaneously occur with Limamono.**
- `_What is he doing?_`
- Non-dialogue quotes are wrapped with two apostrophes on each side. This avoids conflicts with quotation marks in SillyTavern.
- `''The Jungle Book''`
- Punctuation has been normalized and tries to follow standard conventions in book/novel writing.
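To make the delimiters concrete, here is a hedged sketch of regular expressions matching each
style element; these patterns are purely illustrative and not part of the model or SillyTavern:

```python
import re

STYLE_PATTERNS = {
    "dialogue":     re.compile(r'"[^"]+"'),    # "I say this."
    "onomatopoeia": re.compile(r"\*[^*]+\*"),  # *thud*
    "thought":      re.compile(r"_[^_]+_"),    # _What is he doing?_
    "title_quote":  re.compile(r"''[^']+''"),  # ''The Jungle Book''
}

sample = 'Jessica looked at Mark with disdain. _What is he doing?_ "Stop it."'
for name, pattern in STYLE_PATTERNS.items():
    for match in pattern.findall(sample):
        print(f"{name}: {match}")
```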
## SillyTavern settings
Try to follow these settings. Appropriate files for replicating them are included in the model
repository:
![ST settings](https://files.catbox.moe/nsfaxe.png)
## Example
This is how a typical RP chat may take place with this model. Notice the presence of
character thoughts. These may not always be present, but once generated they will
appear more frequently.
![example](https://files.catbox.moe/ch2bo2.png)
You can try chatting with Charlotte by downloading her [SillyTavern character card](https://huggingface.co/lemonilia/Limamono-Mistral-7B-v0.3/blob/main/Charlotte.png)
from the repository.
## Text generation settings
For testing I use these settings:
- Temperature: 1.0
- Tail-Free Sampling: 0.85
- Repetition Penalty: 1.11
- Repetition Penalty range: 2048
- Top-p: 1 (disabled), Top-k: 0 (disabled)
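As a hedged sketch, these settings map to `llama-cpp-python` parameters roughly as follows,
assuming a GGUF conversion of the model. The file name below is hypothetical, and
`last_n_tokens_size` only approximates the repetition penalty range:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="limamono-mistral-7b-v0.50.Q8_0.gguf",  # hypothetical file name
    n_ctx=4096,
    last_n_tokens_size=2048,  # approximates the repetition penalty range
)

prompt = "..."  # assemble as shown in the prompt format section above
output = llm(
    prompt,
    max_tokens=400,
    temperature=1.0,
    tfs_z=0.85,           # Tail-Free Sampling
    repeat_penalty=1.11,
    top_p=1.0,            # disabled
    top_k=0,              # disabled
    stop=["### Input:", "### Instruction:"],  # assumed stop sequences
)
print(output["choices"][0]["text"])
```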
## Training procedure
[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
on one NVIDIA RTX 3090.

The training data consisted of **50** conversations (199k tokens / 1117 messages),
each roughly 4k tokens long. The learning rate is approximately the one that minimizes
the eval loss over one epoch with a constant learning-rate schedule. During the following
two epochs, what would normally be considered overfitting occurs, but at the same time
output quality also improves.
### Training hyperparameters
- load_in_8bit: true
- adapter: lora
- sequence_len: 4096
- sample_packing: false
- pad_to_sequence_len: true
- lora_r: 8
- lora_alpha: 16
- lora_dropout: 0.5
- gradient_accumulation_steps: 1
- micro_batch_size: 1
- num_epochs: 3
- optimizer: adamw_torch
- lr_scheduler: cosine
- learning_rate: 0.0002
- weight_decay: 0.1
- train_on_inputs: false
- group_by_length: false
- bf16: true
- fp16: false
- tf32: true
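These items mirror Axolotl's YAML configuration fields. As a rough illustration, the
adapter-specific fields correspond to a PEFT configuration along these lines (a sketch, not
the exact object Axolotl builds; target modules are not listed in this card, so none are set):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                    # lora_r
    lora_alpha=16,
    lora_dropout=0.5,
    bias="none",
    task_type="CAUSAL_LM",
)
```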
### Train loss graph
![Train loss](https://files.catbox.moe/dg4qww.png)