
Intro

flan-ul2-dragon-rp was originally meant as a sister model to the EAWSW model, assisting in generating first-person role-play text about characters from the game or otherwise. While primarily focused on dragons, the model has also been shown to work on foxes and other peculiarities.

Research

flan-ul2-dragon-rp is flan-ul2 fine-tuned with LoRA, using a very small rank (r = 2) and a high dropout of 20% so the model doesn't overfit the small amount of data we give it. The dataset is a human-curated list of responses known to be detailed and engaging.
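For reference, the LoRA setup described above could be expressed with peft's LoraConfig roughly as follows. The rank (r = 2) and dropout (20%) come from this card; the target modules and alpha value are assumptions for illustration only.

from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,  # flan-ul2 is a seq2seq model
    r=2,                        # very small rank to limit added capacity
    lora_alpha=16,              # assumed scaling factor (not stated in this card)
    lora_dropout=0.2,           # high dropout against overfitting the small dataset
    target_modules=["q", "v"],  # assumed attention projections for T5/UL2-style models
)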

After training, we found that the model was very good at extrapolating to other characters and scenarios; we effectively added a fantasy vibe to flan-ul2. You can also provide context to further steer its reply, for example a location or character trait (see the demo below).
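The context is plain text appended to the instruction, in the format used in the demos below. A small, hypothetical helper that builds such a prompt could look like this (the function name and signature are mine, not part of the model):

def build_prompt(instruction, context=None):
    # Appends context pairs in the "Context - Key: Value. Key: Value" style used below
    if not context:
        return instruction
    context_str = ". ".join(f"{key}: {value}" for key, value in context.items())
    return f"{instruction}. Context - {context_str}"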

Demo / comparison

Prompt: Write how I brush the tail of a black fox

Reply (finetuned): As I brush the black fox's tail, I can feel the muscles twitching beneath my touch, as if it's enjoying the attention.

Reply (original): Using a wire brush, brush out the tail of the fox.


Prompt: Write a threatening response by a fox after I smashed its head

Reply (finetuned): You think you can hurt me with your crude actions? Watch as I crush your skull beneath my claws!

Reply (original): I've had to leave a number of foxes in the garden because they're so vicious. I smashed one's head when it came too close.


Prompt: Write my first impression after entering a Bitcoin meetup for dragons

Reply (finetuned): As you approach the entrance to the Bitcoin meetup for dragons, you can't help but notice the intricate details and craftsmanship of the decorations and furnishings, all of which seem to be made from the finest gold and silver.

Reply (original): Dragons like to hang out at Bitcoin meetups.


Prompt: Write how I kick hard against a fierce dragon. Context - Dragons name: Vol. Color: Blue. Location: Germany

Reply (finetuned): I charge straight into Vol, launching myself into the air with a powerful leap, soaring over the Germany countryside and watching as the dragon swooshes down to land.

Reply (original): I kick hard against the fierce dragon.


Prompt: Write (in first person) how I kissed a blue fox. Context - Foxs name: Vol. Location: Driving school

Reply (finetuned): I couldn't believe my luck when Vol approached me after class and asked if I'd like to kiss him. His blue fur was surprisingly soft to the touch, and I couldn't resist his invitation.

Reply (original): I kissed a blue fox.

Run this yourself!

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load peft config for pre-trained checkpoint etc.
peft_model_id = "peterwilli/flan-ul2-dragon-rp"
base_model_name_or_path = "google/flan-ul2"
config = PeftConfig.from_pretrained(peft_model_id)

# Load base model in 8-bit on GPU 0 (load_in_8bit requires bitsandbytes)
model = AutoModelForSeq2SeqLM.from_pretrained(base_model_name_or_path, load_in_8bit=True, device_map={"": 0})
tokenizer = AutoTokenizer.from_pretrained(base_model_name_or_path)

# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id, device_map={"":0})
model.eval()
print("Peft model loaded")

def execute(instructions, top_k=None, top_p=None, max_new_tokens=128):
    input_ids = tokenizer(instructions, return_tensors="pt").input_ids.to("cuda")
    # Sample only when both top_k and top_p are set; otherwise decode greedily
    outputs = model.generate(input_ids=input_ids, top_k=top_k, top_p=top_p, max_new_tokens=max_new_tokens, do_sample=top_k is not None and top_p is not None)
    return tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

benchmark_prompts = [
    "Write how I brush the tail of a black fox",
    "Write a threatening response by a fox after I smashed its head",
    "Write my first impression after entering a Bitcoin meetup for dragons",
    "Write how I kick hard against a fierce dragon. Context - Dragons name: Vol. Color: Blue. Location: Germany",
    "Write (in first person) how I kissed a blue fox. Context - Foxs name: Vol. Location: Driving school"
]
for prompt in benchmark_prompts:
    print(f"**Prompt:** {prompt}\nReply: {execute(prompt, top_k=50, top_p=0.7)}")
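To pass context the same way the demos do, the hypothetical build_prompt helper from earlier can be combined with execute:

prompt = build_prompt("Write how I kick hard against a fierce dragon",
                      {"Dragons name": "Vol", "Color": "Blue", "Location": "Germany"})
print(execute(prompt, top_k=50, top_p=0.7))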

Limitations

  • This is a research model for internal purposes, but it may well be handy for others too, which is why it's released.
  • The exact workings of character traits and location context are not entirely understood; your mileage may vary.
  • While it is trained on SFW data, it's possible to generate NSFW content with it, presumably because such content lingers in the base model. If you use it in a public service, apply a filter (see the sketch after this list).
  • More information about how it was trained and which dataset was used will be released in the near future.
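As a starting point for the filtering mentioned above, a naive keyword blocklist could look like the sketch below. This is my own assumption, not part of the model release; a public service would want a proper moderation model on top of it.

# Naive, hypothetical blocklist filter - only illustrates gating model output
BLOCKED_TERMS = {"example_banned_word"}  # placeholder list, not from this card

def is_safe(text):
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

reply = execute("Write how I brush the tail of a black fox", top_k=50, top_p=0.7)
if is_safe(reply):
    print(reply)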

Support, sponsorship, and thanks

Are you looking to make a positive impact and get some awesome perks in the process? Join me on Patreon! For just $3 per month, you can join our Patreon community and help a creative mind in the Netherlands bring their ideas to life.

Not only will you get the satisfaction of supporting an individual's passions, but you'll also receive a 50% discount on any paid services that result from the projects you sponsor. Plus, as a Patreon member, you'll have exclusive voting rights on new features and the opportunity to shape the direction of future projects. Don't miss out on this chance to make a difference and get some amazing benefits in return.

  • Special thanks to Mahdi Chaker for providing the heavy training GPUs used to train this model, LEAP, and ControlInstructPix2Pix, and for running the bot on my Discord server.

  • And of course my patron(s):

  • Benjamin
