lemonilia committed
Commit
f6382fd
1 Parent(s): df8cd05

Update README.md

Files changed (1)
  1. README.md +23 -14
README.md CHANGED
@@ -7,8 +7,8 @@ tags:
  - not-for-all-audiences
  ---
 
- # Limamono-7B (Mistral) v0.3
- This is a **very early version** (30% completed) of a strongly NSFW roleplaying model trained with
+ # Limamono-7B (Mistral) v0.43
+ This is a **very early version** (43% completed) of a strongly NSFW roleplaying model trained with
  _extremely limited_ amounts of almost entirely synthetic data of hopefully higher quality than typical
  human conversations. The intended model audience is straight men and lesbians.
 
@@ -26,9 +26,9 @@ Other formats are not supported and may conflict with the special features of the model.
 
  ## Known issues and quirks
  - The model may feel somewhat "overbaked".
- - Characters may occasionally exhibit strange (unintended) speech quirks.
- - Impersonation may still sometimes occur early in the chat, in particular when trying to force a long
- character message length.
+ - Characters may occasionally exhibit strange (unintended) speech quirks. Please report them if found.
+ - Impersonation may sometimes occur early in the chat, in particular when trying to force a very
+ long character message length or regenerating the greeting message.
 
  ## Prompt format
  Limamono uses a slight variation of the [extended Alpaca format](https://github.com/tatsu-lab/stanford_alpaca),
@@ -58,8 +58,8 @@ Scenario: {{scenario}}
  ```
 
  In more detail, the instruction should _preferably_ include a moderately long (a few hundred tokens
- long) character description made in the style of the various fandom wikis on the Internet, with the
- character name as the first line.
+ long) character description made in the style of the various fandom wikis on the Internet, **with the
+ character name as the first line**.
 
  You can refer to the included [Charlotte model card](https://huggingface.co/lemonilia/Limamono-Mistral-7B-v0.3/blob/main/Charlotte.png)
  for an example on how character descriptions can be formatted, but another option would be taking a
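
For illustration, a minimal sketch of that layout, assuming the extended-Alpaca `### Instruction:` sequence and reusing the `{{placeholder}}` style from the format example above; the description fields shown are hypothetical, not a fixed schema:

```
### Instruction:
{{character name}}
{{a few hundred tokens of fandom-wiki-style description: appearance,
personality, background, way of speaking}}

Scenario: {{scenario}}
```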
@@ -108,10 +108,9 @@ should be placed with a space _after_ the colon:
  This has an effect on bot responses, but as of now it might not always reliably work. The lengths
  used during training are: `micro`, `tiny`, `short`, `medium`, `long`, `massive`, `huge`.
 
- However, it is recommended **not** to use a length modifier at all for character responses, except as a
- _temporary measure_ for increasing message length in case they become too short.
+ From extended testing, a **long** length was found to work reasonably well.
 
- On the other hand, it is suggested to add `(length = tiny)` or `(length = short)` to the
+ It is also suggested to add `(length = tiny)` or `(length = short)` to the
  `### Input:` sequence, in order to help the model follow more closely its training data.
 
  ## Prose style
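
To make the placement concrete, here is a hypothetical exchange combining both suggestions, with the modifier one space after the colon as described above; the message text is placeholder only:

```
### Input: (length = short)
{{user message}}

### Response: (length = long)
{{character reply}}
```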
@@ -151,8 +150,11 @@ in the repository.
  [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
  on one NVidia RTX3090.
 
- The training data consisted of **33** conversations (131k tokens / 740 messages)
- of roughly 4k tokens length.
+ The training data consisted of **43** conversations (171k tokens / 763 messages)
+ of roughly 4k tokens in length. The learning rate is the one that approximately minimizes
+ the eval loss over one epoch with a constant learning-rate schedule. During the following
+ two epochs, what would normally be considered overfitting occurs, but at the same time
+ output quality also improves.
 
  ### Training hyperparameters
  - load_in_8bit: true
@@ -168,10 +170,17 @@ of roughly 4k tokens length.
  - num_epochs: 3
  - optimizer: adamw_torch
  - lr_scheduler: constant
- - learning_rate: 0.00018
+ - learning_rate: 0.0002
  - weight_decay: 0.1
  - train_on_inputs: false
  - group_by_length: false
  - bf16: true
  - fp16: false
- - tf32: true
+ - tf32: true
+
+ ### Train loss graph
+ This graph was obtained by experimentally repeating the data 3 times and finetuning for 1 epoch,
+ which gave similar end results but a smoother curve, without the sudden jumps seen when
+ finetuning on unique data for 3 epochs.
+
+ ![Train loss](https://files.catbox.moe/hiu9ah.png)
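
The hyperparameter list above maps directly onto Axolotl's YAML config keys; a minimal sketch of such a fragment follows. The `base_model` value is an assumption (the card only says Mistral 7B), and the dataset and adapter sections from the actual repository are omitted:

```yaml
# Hypothetical Axolotl config fragment rebuilt from the hyperparameters above.
# base_model is assumed; dataset and adapter sections are intentionally omitted.
base_model: mistralai/Mistral-7B-v0.1  # assumption: Mistral 7B base
load_in_8bit: true
num_epochs: 3            # or repeat the data 3x with num_epochs: 1, per the graph note
optimizer: adamw_torch
lr_scheduler: constant
learning_rate: 0.0002
weight_decay: 0.1
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: true
```

As the train loss note above reports, repeating the data 3 times and training for a single epoch gave similar end results with a smoother loss curve than 3 epochs over unique data.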