tl;dr: This is Phi 3 Medium finetuned for (mainly SFW) roleplaying. It was a promising release candidate that fell flat when things got moist. I'm publishing all the details for anyone else interested in finetuning Phi 3.

Training Details:
- 8x H100 80GB SXM GPUs
- 1 hour training time

Results for Roleplay Mode (i.e., not Instruct format):
- Strong RP formatting.
- Tends to output short, straightforward replies to the player character.
- Starts to break down when things get moist.
- Important: My testing is lazy and flawed. Take it with a grain of salt and test the GGUFs yourself before drawing conclusions.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65f2fd1c25b848bd061b5c2e/WxdrI9dDHm4nuFHAe8FzZ.png)

Axolotl Config (some fields omitted):

```yaml
base_model: failspy/Phi-3-medium-4k-instruct-abliterated-v3
load_in_4bit: true

bf16: auto
fp16:
tf32: false
flash_attention: true

sequence_len: 4096
datasets:
  - path: Undi95/andrijdavid_roleplay-conversation-sharegpt
    type: customphi3

num_epochs: 2
warmup_steps: 30
weight_decay: 0.1

adapter: lora
lora_r: 128
lora_alpha: 16
lora_dropout: 0.1
lora_target_linear: true

gradient_accumulation_steps: 2
micro_batch_size: 2
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
sample_packing: true
pad_to_sequence_len: true

optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0001
max_grad_norm: 1.0

val_set_size: 0.01
evals_per_epoch: 3
eval_max_new_tokens: 128
eval_batch_size: 1
```
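
If you want to poke at the result yourself, here's a minimal sketch of loading a LoRA adapter like this one on top of the base model with transformers + PEFT. The adapter repo name and the prompt template below are placeholders, not the exact ones from my setup; swap in your own RP prompt format.

```python
# Minimal sketch: load the abliterated base model, apply a LoRA adapter, and generate.
# "your-username/phi3-medium-rp-lora" is a placeholder adapter repo, not this model's actual name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "failspy/Phi-3-medium-4k-instruct-abliterated-v3"
adapter_id = "your-username/phi3-medium-rp-lora"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model = model.merge_and_unload()  # bake the adapter weights in for faster inference

# Phi-3 instruct-style prompt shown only as an example; substitute your RP template here.
prompt = "<|user|>\nHello!<|end|>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```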