🌸 TinyLlama-1.1B-Roleplay-Custom-Kernels 🍥

This repository contains a fine-tuned TinyLlama 1.1B model conditioned on various anime character archetypes (Tsundere, Yandere, Genki, etc.).

However, this is not a standard Hugging Face LoRA.

This model was fine-tuned using a handwritten, bare-metal LoRA implementation built from scratch in plain PyTorch, with no adapter library such as PEFT involved. It sounds very techy, but it's actually very simple once you strip it down to the actual code. I could have used peft, but it was kind of fun doing it this way.

🧠 Architecture & Training

Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
Technique: Bare-Metal LoRA Injection (Rank=8, Alpha=16.0)
Targets: q_proj, v_proj
Dataset: 744 rows of custom conversational data (https://huggingface.co/datasets/maomao88/anime-waifu-personality-chat-with-questions).
Hardware Limit: Trained entirely locally on a single NVIDIA GTX 1650 Ti (4GB VRAM), using custom 32-bit gradient scaling to keep float16 overflow from turning gradients into NaNs.
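
To put Rank=8 and Alpha=16.0 in perspective, here is a rough count of the trainable adapter parameters. It assumes the standard LoRA formulation (two low-rank matrices per targeted projection, with the update scaled by alpha / rank). The layer dimensions below (hidden size 2048, 22 layers, a 256-wide v_proj output) are assumptions based on the public TinyLlama config, not values read from this repository, so check them against config.json.

```python
# Assumed TinyLlama-1.1B dimensions (verify against the model's config.json).
HIDDEN_SIZE = 2048
NUM_LAYERS = 22
V_PROJ_OUT = 256  # assumed: 4 KV heads * 64 head dim (grouped-query attention)

RANK = 8
ALPHA = 16.0


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one adapter: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank


per_layer = (
    lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)   # q_proj
    + lora_params(HIDDEN_SIZE, V_PROJ_OUT, RANK)  # v_proj
)
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"trainable adapter parameters: {total:,}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions the adapters come to roughly 1.1 million trainable parameters, on the order of 0.1% of the base model, which is a big part of why training can fit on a 4GB card.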

⚠️ How to Run (READ THIS)

Because this model does not use the PEFT library, you cannot load it with PEFT's AutoPeftModel classes (such as AutoPeftModelForCausalLM). You must download the provided surgery scripts in this repository and manually wire the bypasses before loading the .pt weights.
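
For a feel of what "wiring the bypasses" means, here is a minimal sketch of the general technique. It is not the code from custom_lora.py or inject.py: the names LoRALinear and inject_lora_sketch, the parameter names, and the initialization are all hypothetical, and the sketch ignores the float16/float32 handling a real run on this model would need. Use the repository's own scripts to load the actual weights, since the state-dict keys here will not match.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical wrapper: a frozen base layer plus a trainable low-rank bypass."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.scaling = alpha / rank
        # A starts small and random, B starts at zero, so the bypass is a no-op at init.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        update = (x @ self.lora_A.T) @ self.lora_B.T
        return self.base(x) + self.scaling * update


def inject_lora_sketch(model: nn.Module, targets, rank: int = 8, alpha: float = 16.0) -> int:
    """Wrap every nn.Linear whose attribute name is in `targets`; return how many were wrapped."""
    replaced = 0
    for module in list(model.modules()):
        for name, child in list(module.named_children()):
            if name in targets and isinstance(child, nn.Linear):
                setattr(module, name, LoRALinear(child, rank, alpha))
                replaced += 1
    return replaced


class ToyAttention(nn.Module):
    """Tiny stand-in for an attention block, just to exercise the injection."""

    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(16, 16, bias=False)
        self.k_proj = nn.Linear(16, 16, bias=False)
        self.v_proj = nn.Linear(16, 4, bias=False)


toy = ToyAttention()
count = inject_lora_sketch(toy, ["q_proj", "v_proj"])
print(f"wrapped {count} layers")
```

Because B starts at zero in this sketch, a freshly injected layer produces the same output as the base layer until training (or loading weights) changes the adapter.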

1. Download the Surgery Tools

Ensure you have custom_lora.py and inject.py in your working directory alongside the weights file (waifu_lora_weights.pt).

2. The Injection Script

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from inject import inject_lora # Your downloaded surgery script

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# 🌸1. Load the frozen base model
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    trust_remote_code=True, 
    device_map="cuda"
)

# 🌸2. Wire the custom LoRA bypasses (Must match training Rank/Alpha!)
targets = ["q_proj", "v_proj"]
inject_lora(model, targets, rank=8, alpha=16.0)

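# 🌸3. Load the custom weights (strict=False is mandatory!)
# map_location="cpu" keeps the checkpoint out of VRAM; load_state_dict copies it onto the model's device.
state_dict = torch.load("waifu_lora_weights.pt", map_location="cpu")
model.load_state_dict(state_dict, strict=False)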
model.eval()
print("Custom LoRA injected successfully.")
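
One caveat with strict=False: it also hides naming mismatches, so the success message prints even if nothing was loaded. The toy example below shows how the return value of load_state_dict can be checked. The module and key names (ToyBlock, lora_A, lora_B) are hypothetical stand-ins, and it assumes the .pt file holds only adapter tensors, which the strict=False requirement suggests but this README does not state outright.

```python
import torch
import torch.nn as nn


class ToyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4, bias=False)        # stands in for a frozen base weight
        self.lora_A = nn.Parameter(torch.zeros(2, 4))  # hypothetical adapter tensors
        self.lora_B = nn.Parameter(torch.zeros(4, 2))


toy_model = ToyBlock()

# A checkpoint holding only adapter tensors, plus one deliberately misnamed key.
checkpoint = {
    "lora_A": torch.ones(2, 4),
    "lora_b": torch.ones(4, 2),  # typo on purpose: should be "lora_B"
}

result = toy_model.load_state_dict(checkpoint, strict=False)

# Missing base weights are expected for an adapter-only file...
print("missing:", result.missing_keys)
# ...but unexpected keys mean tensors in the file were silently ignored.
print("unexpected:", result.unexpected_keys)
```

With the real model, an empty unexpected_keys list after loading waifu_lora_weights.pt is a reasonable sign that every tensor in the file found its slot.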
