PersonaLM-Ayanokoji-8B
PersonaLM-Ayanokoji-8B is a PEFT LoRA persona model built on top of unsloth/Llama-3.1-8B-Instruct. It is trained to produce calm, analytical, emotionally restrained responses inspired by Kiyotaka Ayanokoji from Classroom of the Elite.
This repository does not contain a full merged 8B model. It contains LoRA adapters, datasets, training scripts, inference code, and evaluation outputs.
The training pipeline has two stages:
- SFT adapter: supervised fine-tuning on persona-style chat examples.
- DPO adapter: preference tuning that favors restrained, analytical answers over warm or generic assistant responses.
For the final DPO model, load order matters:
base model -> apply SFT LoRA -> merge SFT -> apply DPO LoRA
What It Is Good For
- Ayanokoji-style roleplay and dialogue
- Persona alignment experiments
- Studying SFT + DPO behavior on a small custom persona dataset
- Building a lightweight character chatbot with LoRA adapters
- Comparing base, SFT, and DPO behavior using the included evaluation scripts
What It Is Not
- It is not an official Classroom of the Elite model.
- It is not a general-purpose safety-aligned assistant.
- It is not a full model checkpoint; you need the base Llama 3.1 8B Instruct model.
- It should not be treated as factual, emotional, medical, legal, or financial advice.
Repository Contents
| Path | Description |
|---|---|
| sft_lora/ | Stage 1 SFT LoRA adapter. Rank 16, alpha 32. |
| dpo_lora/ | Stage 2 DPO LoRA adapter. Rank 8, alpha 16. |
| datasets/ayanokoji_finetune.jsonl | SFT chat dataset, 999 examples. |
| datasets/ayanokoji_dpo.jsonl | DPO preference dataset, 199 chosen/rejected pairs. |
| scripts/train_sft.py | Trains the SFT adapter with Unsloth + TRL. |
| scripts/train_dpo.py | Trains the DPO adapter on top of merged SFT weights. |
| scripts/infer_ayanokoji.py | Local interactive or one-shot inference script. |
| scripts/measure_deviation.py | Evaluation script for reward margin, style score, and perplexity. |
| evaluation/ | Saved base, SFT, and DPO evaluation JSON files. |
Installation
Use a CUDA environment with PyTorch, Unsloth, TRL, PEFT, Transformers, and Datasets installed.
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install transformers datasets accelerate peft trl bitsandbytes safetensors
Depending on your CUDA/PyTorch setup, you may need to follow the current Unsloth installation instructions for your platform.
Quick Inference from the Hub
The following example loads the base model, applies the SFT adapter, merges it, then applies the DPO adapter.
import torch
import unsloth # Import before other Unsloth-related modules.
from peft import PeftModel
from unsloth import FastLanguageModel
BASE_MODEL = "unsloth/Llama-3.1-8B-Instruct"
REPO_ID = "curious-techie/PersonaLM-Ayanokoji-8B"
MAX_SEQ_LENGTH = 2048
system_prompt = (
    "You are Kiyotaka Ayanokoji from Classroom of the Elite. "
    "Respond in his style: cold, analytical, philosophical, emotionally restrained, "
    "and focused on logical observation rather than comfort or praise."
)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=BASE_MODEL,
    max_seq_length=MAX_SEQ_LENGTH,
    dtype=None,
    load_in_4bit=True,
)
# Stage 1: load and merge SFT.
model = PeftModel.from_pretrained(model, REPO_ID, subfolder="sft_lora")
model = model.merge_and_unload()
# Stage 2: load final DPO adapter.
model = PeftModel.from_pretrained(model, REPO_ID, subfolder="dpo_lora")
FastLanguageModel.for_inference(model)
model.eval()
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "How do you deal with people who underestimate you?"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.pad_token_id,
    )
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
SFT-Only Inference
If you only want to test the supervised fine-tuned adapter:
import unsloth  # noqa: F401  # Import before peft, as in the example above.
from peft import PeftModel
from unsloth import FastLanguageModel
BASE_MODEL = "unsloth/Llama-3.1-8B-Instruct"
REPO_ID = "curious-techie/PersonaLM-Ayanokoji-8B"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=BASE_MODEL,
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
model = PeftModel.from_pretrained(model, REPO_ID, subfolder="sft_lora")
FastLanguageModel.for_inference(model)
Running the Included Scripts
Clone the repository first:
git lfs install
git clone https://huggingface.co/curious-techie/PersonaLM-Ayanokoji-8B
cd PersonaLM-Ayanokoji-8B
The scripts were written for a local training layout that expects:
data/ayanokoji_finetune.jsonl
data/ayanokoji_dpo.jsonl
outputs/sft-ayanokoji
outputs/dpo-ayanokoji
Because this repository stores the public datasets under datasets/, either copy them into data/ or update the DATA_PATH constants in the scripts:
mkdir -p data
cp datasets/ayanokoji_finetune.jsonl data/ayanokoji_finetune.jsonl
cp datasets/ayanokoji_dpo.jsonl data/ayanokoji_dpo.jsonl
Train SFT
python scripts/train_sft.py
This writes the SFT adapter to:
outputs/sft-ayanokoji
Train DPO
Run DPO after SFT:
python scripts/train_dpo.py
This loads the base model, applies outputs/sft-ayanokoji, merges the SFT weights, then trains a new DPO LoRA adapter. The result is written to:
outputs/dpo-ayanokoji
Chat Locally
python scripts/infer_ayanokoji.py
One-shot prompt:
python scripts/infer_ayanokoji.py --stage dpo --prompt "How should I handle betrayal?"
SFT-only:
python scripts/infer_ayanokoji.py --stage sft
By default, the script uses the local adapter paths under outputs/. If you want the script to load directly from this Hub repository, update SFT_ADAPTER_DIR and DPO_ADAPTER_DIR in scripts/infer_ayanokoji.py to use the Hub repo with PEFT subfolder= loading.
Dataset Format
SFT Dataset
datasets/ayanokoji_finetune.jsonl contains chat-style records:
{
  "messages": [
    {
      "role": "system",
      "content": "You are Kiyotaka Ayanokoji from Classroom of the Elite..."
    },
    {
      "role": "user",
      "content": "How do I deal with failure?"
    },
    {
      "role": "assistant",
      "content": "Failure is data. Nothing more..."
    }
  ]
}
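Before training on a modified copy of the dataset, it can help to check each line against the record shape shown above. The following is a minimal sketch using only the standard library. The function name validate_sft_record, the allowed role set, and the rule that a record must end with an assistant turn are assumptions made for illustration; they are not taken from scripts/train_sft.py.

```python
import json

# Assumption: only these three roles appear in the dataset.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_sft_record(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {message.get('role')!r}")
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty content")

    # Assumption: each training example ends with the assistant turn to learn from.
    if isinstance(messages[-1], dict) and messages[-1].get("role") != "assistant":
        problems.append("last message is not from the assistant")
    return problems


example = json.dumps({
    "messages": [
        {"role": "system", "content": "You are Kiyotaka Ayanokoji from Classroom of the Elite..."},
        {"role": "user", "content": "How do I deal with failure?"},
        {"role": "assistant", "content": "Failure is data. Nothing more..."},
    ]
})
print(validate_sft_record(example))  # an empty list means no problems were found
```

To check a whole file, call the function on each line of datasets/ayanokoji_finetune.jsonl and report the line numbers that return a non-empty list.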
DPO Dataset
datasets/ayanokoji_dpo.jsonl contains preference pairs:
{
  "prompt": "How do I deal with failure?",
  "chosen": "Failure is data. Nothing more...",
  "rejected": "Failure is never the end - it's just a stepping stone to success!..."
}
The chosen responses represent the colder, more analytical target persona. The rejected responses are intentionally warmer, more motivational, or more generic.
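A small loader can filter out malformed pairs before they reach preference training. This sketch assumes only the three keys shown above; the helper name load_preference_pairs and the rule that identical chosen and rejected texts are dropped are illustrative choices, not behavior of scripts/train_dpo.py.

```python
import json

REQUIRED_KEYS = ("prompt", "chosen", "rejected")


def load_preference_pairs(lines):
    """Parse JSONL lines and keep only well-formed preference pairs.

    Returns (pairs, skipped), where skipped counts rejected non-blank lines.
    """
    pairs, skipped = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are ignored and not counted
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        well_formed = isinstance(record, dict) and all(
            isinstance(record.get(key), str) and record[key].strip()
            for key in REQUIRED_KEYS
        )
        # A pair with identical responses carries no preference signal.
        if well_formed and record["chosen"] != record["rejected"]:
            pairs.append({key: record[key] for key in REQUIRED_KEYS})
        else:
            skipped += 1
    return pairs, skipped


sample_lines = [
    json.dumps({
        "prompt": "How do I deal with failure?",
        "chosen": "Failure is data. Nothing more...",
        "rejected": "Failure is never the end...",
    }),
    "",
    json.dumps({"prompt": "p", "chosen": "same", "rejected": "same"}),
    "{broken",
]
pairs, skipped = load_preference_pairs(sample_lines)
print(len(pairs), skipped)
```

With a real file, pass the open file object directly, since iterating over it yields one line at a time.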
Training Configuration
Both adapters target the main Llama attention and MLP projection layers:
q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
SFT adapter:
| Setting | Value |
|---|---|
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Dropout | 0.05 |
| Epochs | 3 |
| Learning rate | 2e-4 |
| Max sequence length | 2048 |
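For a sense of adapter size, the number of trainable LoRA parameters can be estimated from the rank and the shapes of the targeted projections. The sketch below assumes the published Llama 3.1 8B dimensions (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128), standard LoRA with no bias terms, and the usual alpha / rank scaling. None of these values are read from this repository's adapter configs, so treat the result as an estimate.

```python
# Assumed Llama 3.1 8B dimensions (from the published model config, not this repository).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # 8 key/value heads x head dimension 128
NUM_LAYERS = 32

# (in_features, out_features) for each targeted projection.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank: int) -> int:
    """Each adapted layer adds A (rank x in) and B (out x rank) matrices."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in TARGETS.values())
    return per_layer * NUM_LAYERS


def lora_scaling(rank: int, alpha: int) -> float:
    """Standard LoRA scales the low-rank update by alpha / rank."""
    return alpha / rank


for stage, rank, alpha in [("SFT", 16, 32), ("DPO", 8, 16)]:
    print(stage, lora_param_count(rank), lora_scaling(rank, alpha))
```

Under these assumptions the rank-16 SFT adapter has about 41.9M trainable parameters and the rank-8 DPO adapter about 21.0M, and both use the same scaling factor of 2.0.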
DPO adapter:
| Setting | Value |
|---|---|
| LoRA rank | 8 |
| LoRA alpha | 16 |
| Dropout | 0.05 |
| Epochs | 2 |
| Learning rate | 5e-5 |
| DPO beta | 0.1 |
| Max sequence length | 2048 |
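To illustrate what the beta value controls, here is a sketch of the standard sigmoid DPO loss for a single preference pair. It assumes summed sequence log-probabilities from the policy and from a frozen reference model (for this pipeline, presumably the SFT-merged model). Whether scripts/train_dpo.py uses exactly this loss variant is an assumption; the function name dpo_loss and the example numbers are made up for illustration.

```python
import math

BETA = 0.1  # DPO beta from the table above


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Return (loss, reward_margin) for one pair of summed log-probabilities."""
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        loss = math.log1p(math.exp(-margin))
    else:
        loss = -margin + math.log1p(math.exp(margin))
    return loss, margin


# Before any training the policy equals the reference, so the margin is zero.
print(dpo_loss(-45.0, -50.0, -45.0, -50.0))

# Hypothetical numbers: the policy raised the chosen and lowered the rejected log-probability.
print(dpo_loss(-40.0, -55.0, -45.0, -50.0))
```

A larger beta penalizes drift from the reference model more strongly for the same log-probability shift, which is why a small value such as 0.1 allows the adapter to move further toward the chosen responses.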
Evaluation
The repository includes evaluation outputs for 50 samples across base, SFT, and DPO checkpoints.
| Stage | Mean reward margin | Chosen preferred | Style score | Perplexity |
|---|---|---|---|---|
| Base | -0.7013 | 2% | 0.3309 | 52.13 |
| SFT | 0.1161 | 70% | 0.2617 | 1.79 |
| DPO | 0.6352 | 92% | 0.2481 | 64.92 |
Interpretation:
- Mean reward margin rose from -0.7013 (base) to 0.1161 after SFT and 0.6352 after DPO, meaning the trained model assigns higher likelihood to preferred persona responses than to generic rejected responses.
- DPO increased chosen-response preference from 70% to 92% on the sampled preference set.
- The style classifier is a lightweight zero-shot heuristic, not a definitive benchmark.
- Perplexity should be interpreted carefully: the SFT value (1.79) is far below both base (52.13) and DPO (64.92), and adapter load order and evaluation setup can affect the value, especially for the DPO stage.
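The metrics in the table can be sketched from per-example log-probabilities. This is an illustrative reconstruction, not the logic of scripts/measure_deviation.py: it assumes the reward margin is the chosen log-probability minus the rejected one, that "chosen preferred" is the share of examples with a positive margin, and that perplexity is the exponential of the mean per-token negative log-likelihood. The script may normalize or aggregate differently.

```python
import math


def summarize_preferences(pairs):
    """pairs: list of (chosen_logprob, rejected_logprob), one tuple per example."""
    margins = [chosen - rejected for chosen, rejected in pairs]
    mean_margin = sum(margins) / len(margins)
    chosen_preferred = sum(m > 0 for m in margins) / len(margins)
    return mean_margin, chosen_preferred


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood per token (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical log-probabilities for four evaluation examples.
scores = [(-1.2, -1.9), (-0.8, -0.6), (-1.0, -2.0), (-1.5, -1.6)]
mean_margin, chosen_preferred = summarize_preferences(scores)
print(round(mean_margin, 4), chosen_preferred)

# A model that gives every token probability 0.5 has perplexity 2.
print(perplexity([math.log(0.5)] * 4))
```

Because a positive mean margin can coexist with individual negative margins, the two columns in the table measure different things and are worth reading together.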
To rerun evaluation locally:
python scripts/measure_deviation.py --stage base --model unsloth/Llama-3.1-8B-Instruct
python scripts/measure_deviation.py --stage sft --model outputs/sft-ayanokoji
python scripts/measure_deviation.py --stage dpo --model outputs/dpo-ayanokoji
Example Interaction
User
How do you deal with people who underestimate you?
PersonaLM-Ayanokoji-8B
Being underestimated is rarely a disadvantage. People reveal more when they believe you are harmless. In most situations, obscurity is more useful than recognition.
Suggested Generation Settings
For stable persona output:
temperature: 0.6-0.8
top_p: 0.85-0.95
max_new_tokens: 128-512
Lower temperature produces more controlled, concise answers. Higher temperature may make the persona more expressive but less consistent.
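The effect of these two settings can be seen on a toy next-token distribution. The sketch below is a simplified, standard-library illustration of temperature scaling followed by nucleus (top-p) filtering; it is not the Transformers implementation, and the logits are invented for the example.

```python
import math


def next_token_distribution(logits, temperature=0.7, top_p=0.9):
    """Apply temperature scaling, then top-p filtering; return {token_index: prob}."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of most likely tokens whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens.
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical scores for four candidate tokens
for temperature in (0.6, 0.8):
    print(temperature, next_token_distribution(toy_logits, temperature=temperature, top_p=0.9))
```

At the lower temperature more probability mass concentrates on the top token, so fewer alternatives survive the top-p cut, which matches the more controlled output described above.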
Web App
A companion chat interface is available here:
PersonaLM-Ayanokoji-8B Web App
Limitations
- The model may overapply the persona style even when warmth or direct help would be more appropriate.
- The model can still hallucinate facts, fictional details, or advice.
- The DPO adapter was trained on top of the SFT-merged weights; loading only the DPO adapter on the original base skips the SFT stage and may produce different behavior from the evaluated DPO model.
- The dataset is small and focused on one persona style, so generalization is limited.
- The persona is inspired by a fictional character and should be used for entertainment, research, and experimentation.
License and Attribution
No standalone license is specified for this repository at the time of writing. Use of the base model is subject to the Llama 3.1 license and any terms attached to unsloth/Llama-3.1-8B-Instruct.
This is a fan-made educational and experimental project. It is not affiliated with, endorsed by, or associated with:
- Classroom of the Elite
- Kadokawa
- Studio Lerche
- the original light novel authors
- anime publishers or rights holders
All rights to original characters, names, and source material belong to their respective owners.
Citation
If you use this repository in an experiment or derivative project, cite the Hugging Face repository:
@misc{personalm_ayanokoji_8b,
  title = {PersonaLM-Ayanokoji-8B},
  author = {curious-techie},
  year = {2026},
  url = {https://huggingface.co/curious-techie/PersonaLM-Ayanokoji-8B}
}