TinyLlama-1.1B-Roleplay-Custom-Kernels
This repository contains a fine-tuned TinyLlama 1.1B model conditioned on various anime character archetypes (Tsundere, Yandere, Genki, etc.).
However, this is not a standard Hugging Face LoRA.
This model was fine-tuned using a handwritten, bare-metal LoRA implementation built from scratch, without PEFT or any other adapter library. It sounds very techy, but it's actually very simple once you strip it down to the actual code. I could have used peft, but it was kinda fun doing it this way.
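For a sense of how little code that is, here is a minimal sketch of a from-scratch LoRA layer and an injection helper in plain PyTorch. It is not the custom_lora.py / inject.py shipped in this repo: the names LoRALinearSketch and inject_lora_sketch, the init scheme, and the module-walking logic are assumptions made for illustration. Only the rank (8), alpha (16.0) and target names (q_proj, v_proj) come from this card.

```python
import torch
import torch.nn as nn


class LoRALinearSketch(nn.Module):
    """Hypothetical wrapper: a frozen nn.Linear plus a trainable low-rank bypass."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the original weights stay frozen
        self.scale = alpha / rank  # 16.0 / 8 = 2.0 with this card's settings
        # A is small and random, B is zero, so the bypass adds nothing at init.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = (x @ self.lora_A.T) @ self.lora_B.T
        return self.base(x) + self.scale * delta


def inject_lora_sketch(model: nn.Module, targets, rank: int = 8, alpha: float = 16.0) -> int:
    """Replace every nn.Linear whose attribute name is in `targets`; return the count."""
    replaced = 0
    for parent in list(model.modules()):  # snapshot, since we mutate while walking
        for name, child in list(parent.named_children()):
            if name in targets and isinstance(child, nn.Linear):
                setattr(parent, name, LoRALinearSketch(child, rank, alpha))
                replaced += 1
    return replaced
```

Calling inject_lora_sketch(model, ["q_proj", "v_proj"]) on a module tree wraps only the layers with those names and leaves everything else alone. Because B starts at zero, the wrapped model produces the same outputs as the base model until training updates the bypass. The sketch runs in float32; a real float16 setup would also need to handle dtype casting.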
Architecture & Training
- Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
- Technique: Bare-Metal LoRA Injection (Rank=8, Alpha=16.0)
- Targets: q_proj, v_proj
- Dataset: 744 rows of custom conversational data (https://huggingface.co/datasets/maomao88/anime-waifu-personality-chat-with-questions)
- Hardware Limit: Trained entirely locally on a single NVIDIA GTX 1650 Ti (4GB VRAM) using custom 32-bit gradient scaling to prevent NaNs from float16 overflow.
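This card does not show how the 32-bit gradient scaling was implemented. One common way to keep float16 training numerically stable is static loss scaling with float32 trainable parameters, skipping any step whose gradients contain inf or NaN. The sketch below shows that general idea only; the helper name scaled_step and the loss_scale value are hypothetical, and the scheme actually used for this model may differ.

```python
import torch


def scaled_step(loss, params, optimizer, loss_scale=1024.0):
    """One optimizer step with static loss scaling (hypothetical helper).

    Returns True if the step was applied, False if it was skipped
    because a gradient contained inf or NaN.
    """
    params = list(params)
    optimizer.zero_grad(set_to_none=True)
    (loss * loss_scale).backward()  # scale up so small gradients survive fp16

    grads = [p.grad for p in params if p.grad is not None]
    if not all(torch.isfinite(g).all() for g in grads):
        optimizer.zero_grad(set_to_none=True)  # drop the bad gradients
        return False

    for g in grads:
        g.div_(loss_scale)  # undo the scaling before the update
    optimizer.step()
    return True
```

In a training loop this would replace the usual loss.backward() / optimizer.step() pair, with params being just the trainable LoRA tensors.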
How to Run (READ THIS)
Because this model does not use the PEFT library, you cannot load it using AutoPeftModel. You must download the provided surgery scripts in this repository and manually wire the bypasses before loading the .pt weights.
1. Download the Surgery Tools
Ensure you have custom_lora.py and inject.py in your working directory alongside the weights.
2. The Injection Script
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from inject import inject_lora  # Your downloaded surgery script

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# 1. Load the frozen base model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map="cuda",
)

# 2. Wire the custom LoRA bypasses (Must match training Rank/Alpha!)
targets = ["q_proj", "v_proj"]
inject_lora(model, targets, rank=8, alpha=16.0)

# 3. Load the custom weights (strict=False is mandatory, presumably because
#    the .pt file holds only the LoRA tensors and not the base weights)
state_dict = torch.load("waifu_lora_weights.pt", map_location="cpu")
model.load_state_dict(state_dict, strict=False)
model.eval()
print("Custom LoRA injected successfully.")
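Since strict=False silences key mismatches, it can also hide a failed load: if the injected module names do not match the checkpoint, the LoRA tensors are simply ignored. Here is a small sketch of a check built on the missing_keys / unexpected_keys result that load_state_dict returns. The helper name load_adapter_checked is hypothetical, and the toy model only stands in for the real one.

```python
import torch
import torch.nn as nn


def load_adapter_checked(model: nn.Module, state_dict: dict) -> list:
    """Load a partial state dict and fail loudly if any of its keys were ignored."""
    result = model.load_state_dict(state_dict, strict=False)
    if result.unexpected_keys:
        raise RuntimeError(f"Keys not found in model: {result.unexpected_keys}")
    # Missing keys are expected here: they are the weights the file never contained.
    return list(result.missing_keys)


# Toy stand-in: a checkpoint that carries only one tensor of a two-layer model.
toy = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
missing = load_adapter_checked(toy, {"1.weight": torch.zeros(2, 4)})
print(missing)
```

With the real model, an error from this check would most likely mean the injection step was skipped or used different target names than training did.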