
Ashhwriter-Mistral-7B (v1)

A text completion model trained on about 315 MB of relatively narrow-focused amateur erotica, broken into chunks of roughly 8k tokens in length, and based on Mistral-7B-v0.1.

Please note that this is not a chat model nor an instruct model, and it is even less suitable for chatting than regular base models! A notable issue is that the model can get confused by consecutive or multi-paragraph dialogue lines from the same character when chatting. This is most likely due to the strong storywriting bias resulting from the relatively large amount of story data the model was trained on.

Further finetuning will be needed to make the model suitable for chat purposes.

Available versions

Uses

The model can be used as-is, or merged into other models (preferably via the provided LoRA adapter) to give them a very strong NSFW and potentially shocking bias. It could also be used as a finetuning base for chat models.
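As an illustration, applying the LoRA adapter to another Mistral-7B-based model with transformers and peft could look roughly like the sketch below. The adapter path is an assumption; point it at the adapter files actually provided with this model.

```python
# Minimal sketch: merge the Ashhwriter LoRA adapter into a Mistral-7B-based model.
# The adapter path is an assumption; replace it with the adapter actually provided here.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Load the adapter on top of the base weights, then bake it in and save the result.
model = PeftModel.from_pretrained(base, "path/to/ashhwriter-lora-adapter")
merged = model.merge_and_unload()
merged.save_pretrained("./ashhwriter-merged")
tokenizer.save_pretrained("./ashhwriter-merged")
```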

It has no specific prompting format as it's been trained in a form close to true unsupervised finetuning.

Some prompting tips for text autocompletion

  • In KoboldAI/KoboldLite, using <s> (BOS token) as the first token in the context window may improve generation quality. This does not work with the GGUF version.
  • The separator ----- may steer the model toward completing text in a style closer to that used during training (see the sketch after this list).
  • It appears that, at least when testing in text-generation-webui, leaving an empty line at the beginning of the context can make the outputs more aligned with the finetune.
  • Enclosing dialogue in quotation marks makes the model draw more on its storywriting background.
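Putting these tips together, a completion prompt could be assembled as in the sketch below. This is purely illustrative; whether the literal <s> string is usable depends on the frontend, and the example combines tips reported for different frontends.

```python
# Illustrative prompt assembly for plain text completion (not an official format).
story_so_far = "The rain had not stopped for three days."

prompt = (
    "<s>"      # BOS token as text; reported to help in KoboldAI/KoboldLite, not the GGUF version
    "\n"       # empty line at the start of the context, reported to help in text-generation-webui
    "-----\n"  # separator seen during training
    + story_so_far
)
```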

Unusually effective prompting templates

Although the model hasn't been trained on instructions or chat data, formatting questions as a signed letter can work well for eliciting a long, reasoned response.

{your question here}

{signature}
-----

{model response}

Using a Question/Answer format may also work. It can be beneficial to add indications regarding the style.

Q: {your question here}. Please be detailed and informative.

A: {model response}
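For example, completing the Question/Answer template with plain transformers generation might look like the following minimal sketch. The sampling settings are arbitrary placeholders rather than tuned recommendations.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lemonilia/Ashhwriter-Mistral-7B"  # or a local path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Question/Answer completion format described above.
prompt = "Q: {your question here}. Please be detailed and informative.\n\nA:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.8, top_p=0.95)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```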

Bias, Risks, and Limitations

Bias and Risks

From practical observations, the model will very easily produce offensive, harmful and even dangerous hallucinations of any kind, surprisingly not limited to what was included in the source dataset.

The source dataset includes fictional content that many could find shocking and disturbing. The model outputs will often reflect this bias.

Limitations

The model was not designed for the following uses:

  • Question answering
  • Instruction following
  • Adventure mode
  • Chatting / Roleplaying

Recommendations

Due to the nature of the content it can generate, it is strongly advised not to use the model in public-facing environments where a general audience is expected, nor to make it available to minors.

Training Details

Training Data

The data used for training is available from public sources and will not be provided on or via Huggingface. In total, about 315 MB of data have been used.

A basic cleaning procedure was performed to normalize punctuation (at least to some extent) and convert HTML tags when present, but no particular effort was made to remove story milestones, trailers and acknowledgments when they were not clearly separated from the story. In some cases chapter titles have been surrounded by === or =====, but not all of them follow this format.

Excessively short stories or story chunks (500 tokens or less) have been rejected, and a very basic check was performed to reject most non-English stories as well.
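As an illustration of that kind of length filter (not the actual preprocessing script used for this dataset), rejecting chunks of 500 tokens or less could look like this:

```python
# Illustrative length filter; not the actual script used to prepare the dataset.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

def keep_chunk(text: str, min_tokens: int = 500) -> bool:
    """Keep only stories or story chunks longer than min_tokens tokens."""
    return len(tokenizer(text)["input_ids"]) > min_tokens

chunks = ["...story text...", "Too short to keep."]
kept = [chunk for chunk in chunks if keep_chunk(chunk)]
```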

Training procedure

Axolotl was used for training on 2x NVIDIA A40 GPUs.

The A40 GPUs have been graciously provided by Arc Compute.

Training hyperparameters

A lower learning rate than usual seemed necessary for this dataset.

Noisy embeddings (NEFT) were used during training; their actual impact is unknown (a sketch of the idea follows the hyperparameter list below).

  • learning_rate: 0.000055
  • lr_scheduler: constant_with_warmup
  • noisy_embedding_alpha: 5
  • num_epochs: 2
  • sequence_len: 8192
  • lora_r: 256
  • lora_alpha: 16
  • lora_dropout: 0.0
  • lora_target_linear: true
  • bf16: true
  • fp16: false
  • tf32: true
  • load_in_8bit: true
  • adapter: lora
  • micro_batch_size: 2
  • gradient_accumulation_steps: 1
  • warmup_steps: 100
  • optimizer: adamw_bnb_8bit
  • flash_attention: true
  • sample_packing: true
  • pad_to_sequence_len: true
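For context, noisy embeddings (NEFTune) perturb the input embeddings during training with uniform noise whose magnitude is controlled by the alpha value above. The following is a minimal sketch of the idea, not Axolotl's actual implementation:

```python
import torch

def neftune_noise(embeddings: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """Add NEFTune-style uniform noise to input embeddings (batch, seq_len, hidden_dim)."""
    seq_len, hidden_dim = embeddings.shape[1], embeddings.shape[2]
    scale = alpha / (seq_len * hidden_dim) ** 0.5  # noise magnitude per the NEFTune formulation
    noise = torch.empty_like(embeddings).uniform_(-scale, scale)
    return embeddings + noise
```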

With 2 GPUs, the effective global batch size was 4 (micro_batch_size 2 × gradient_accumulation_steps 1 × 2 GPUs).

Loss graphs

Since the model has been trained with entire samples, loss values are higher than typical for instruct models or chat models trained on just the model response.

Training loss
(plot not reproduced here)

Eval loss
(plot not reproduced here)
