🌸 TinyLlama-1.1B-Roleplay-Custom-Kernels 🍥

This repository contains a fine-tuned TinyLlama 1.1B model conditioned on various anime character archetypes (Tsundere, Yandere, Genki, etc.).

However, this is not a standard Hugging Face LoRA.

This model was fine-tuned using a handwritten, bare-metal LoRA implementation built from scratch, without relying on PEFT or any other adapter library. It sounds very techy, but it's actually very simple once you strip it down to the code. I could have used peft, but it was kinda fun doing it this way.

🧠 Architecture & Training

  • Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • Technique: Bare-Metal LoRA Injection (Rank=8, Alpha=16.0)
  • Targets: q_proj, v_proj
  • Dataset: 744 rows of custom conversational data (https://huggingface.co/datasets/maomao88/anime-waifu-personality-chat-with-questions)
  • Hardware Limit: Trained entirely locally on a single NVIDIA GTX 1650 Ti (4 GB VRAM), using custom 32-bit gradient scaling so that float16 overflow does not turn the gradients into NaNs.
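To get a feel for how small this adapter is, here is a rough back-of-the-envelope count of the trainable parameters for Rank=8 on q_proj and v_proj. The layer shapes below are assumptions taken from the usual TinyLlama-1.1B configuration (they are not stated in this card, so check the base model's config.json), and the alpha / rank scaling is the common LoRA convention, which the custom implementation here may or may not follow exactly.

```python
# Assumed TinyLlama-1.1B shapes (NOT stated in this card; verify against config.json).
HIDDEN_SIZE = 2048
NUM_LAYERS = 22
NUM_HEADS = 32
NUM_KV_HEADS = 4  # grouped-query attention: v_proj is narrower than q_proj

RANK = 8
ALPHA = 16.0


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one bypass: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank


head_dim = HIDDEN_SIZE // NUM_HEADS
q_out = HIDDEN_SIZE                # q_proj: 2048 -> 2048
v_out = NUM_KV_HEADS * head_dim    # v_proj: 2048 -> 256

per_layer = (
    lora_param_count(HIDDEN_SIZE, q_out, RANK)
    + lora_param_count(HIDDEN_SIZE, v_out, RANK)
)
total = per_layer * NUM_LAYERS

# Conventional LoRA scaling applied to the bypass output.
scaling = ALPHA / RANK

print(f"trainable LoRA parameters per layer: {per_layer:,}")
print(f"trainable LoRA parameters in total:  {total:,}")
print(f"bypass scaling (alpha / rank):       {scaling}")
```

Under these assumptions the adapter is roughly a million parameters, about 0.1% of the base model, which is part of why training fits on a 4 GB card.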

⚠️ How to Run (READ THIS)

Because this model does not use the PEFT library, you cannot load it using AutoPeftModel. You must download the provided surgery scripts in this repository and manually wire the bypasses before loading the .pt weights.
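If you are wondering what "wiring the bypasses" means in practice, here is a minimal sketch of the general idea on a toy module. This is NOT the code from custom_lora.py or inject.py: the names LoRALinear and inject_lora_sketch, the fp32 adapter weights, and the initialisation are all assumptions made for illustration. Use the real scripts from this repository when loading the actual weights.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical wrapper: a frozen base layer plus a trainable low-rank bypass."""

    def __init__(self, base: nn.Linear, rank: int, alpha: float):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the base weights stay frozen
        device = base.weight.device
        # Assumption: adapter weights kept in fp32; B starts at zero so the
        # wrapped layer initially behaves exactly like the base layer.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features, device=device) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank, device=device))
        self.scaling = alpha / rank

    def forward(self, x):
        bypass = (x.to(self.lora_A.dtype) @ self.lora_A.T) @ self.lora_B.T
        return self.base(x) + (self.scaling * bypass).to(x.dtype)


def inject_lora_sketch(model: nn.Module, targets, rank: int, alpha: float) -> int:
    """Replace every nn.Linear child whose name is in `targets`; return the count."""
    replaced = 0
    for parent in list(model.modules()):
        for child_name, child in list(parent.named_children()):
            if child_name in targets and isinstance(child, nn.Linear):
                setattr(parent, child_name, LoRALinear(child, rank, alpha))
                replaced += 1
    return replaced


class ToyAttention(nn.Module):
    """Stand-in for an attention block, just to have something to patch."""

    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(16, 16, bias=False)
        self.k_proj = nn.Linear(16, 4, bias=False)
        self.v_proj = nn.Linear(16, 4, bias=False)


toy = nn.Sequential(ToyAttention(), ToyAttention())
count = inject_lora_sketch(toy, ["q_proj", "v_proj"], rank=8, alpha=16.0)
print(f"wrapped {count} projection layers")
```

The point of the sketch is that the module tree (and therefore the state dict key names) changes after injection, which is why the injection has to happen before the .pt weights are loaded and why Rank and Alpha must match training.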

1. Download the Surgery Tools

Ensure you have custom_lora.py and inject.py in your working directory alongside the weights.

2. The Injection Script

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from inject import inject_lora # Your downloaded surgery script

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# 🌸 1. Load the frozen base model
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    trust_remote_code=True, 
    device_map="cuda"
)

# 🌸 2. Wire the custom LoRA bypasses (Must match training Rank/Alpha!)
targets = ["q_proj", "v_proj"]
inject_lora(model, targets, rank=8, alpha=16.0)

# 🌸 3. Load the custom weights (strict=False is mandatory!)
model.load_state_dict(torch.load("waifu_lora_weights.pt"), strict=False)
model.eval()
print("Custom LoRA injected successfully.")
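A note on why strict=False matters, and how to check that the load actually did something. Presumably the .pt file holds only the adapter tensors, so every base weight shows up as a "missing" key, which strict=True would treat as an error. The toy example below shows the pattern with a made-up module; the parameter names lora_A and lora_B are assumptions and may not match the key names used by custom_lora.py.

```python
import torch
import torch.nn as nn


class ToyAdapterModel(nn.Module):
    """Stand-in for a model that has base weights plus adapter weights."""

    def __init__(self):
        super().__init__()
        self.base = nn.Linear(4, 4)
        self.lora_A = nn.Parameter(torch.zeros(2, 4))
        self.lora_B = nn.Parameter(torch.zeros(4, 2))


toy_model = ToyAdapterModel()

# A checkpoint that contains only the adapter tensors, not the base weights.
adapter_only = {"lora_A": torch.ones(2, 4), "lora_B": torch.ones(4, 2)}

# strict=False tolerates the absent base weights and reports what did not match.
result = toy_model.load_state_dict(adapter_only, strict=False)
print("missing keys:   ", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)

# Sanity checks worth running after a real load as well:
# 1) nothing in the file was ignored, 2) no adapter tensor was left unloaded.
ignored = list(result.unexpected_keys)
unloaded_adapters = [k for k in result.missing_keys if "lora" in k]
print("ignored tensors:", ignored, "| unloaded adapters:", unloaded_adapters)
```

If unexpected keys are reported on the real model, the names in the file did not line up with the injected modules (for example because the targets or rank differ), and the model is silently running as the plain base model.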