🤖 Tiny Chatbot: LoRA Fine-Tuned on Alpaca

A conversational assistant produced by fine-tuning TinyLlama-1.1B-Chat-v1.0 on the tatsu-lab/alpaca instruction dataset (52 K English instruction–response pairs) using LoRA (rank 16) via TRL's SFTTrainer on a Kaggle Dual T4 GPU environment.


🚀 Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Havoc999/tiny-chatbot",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Havoc999/tiny-chatbot")

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "Explain the water cycle in simple terms.\n\n"
    "### Response:\n"
)

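# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,  # sampling must be enabled for temperature/top_p to take effect
    repetition_penalty=1.15,
)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(output[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)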
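
The prompt above follows the Alpaca layout used during fine-tuning. As a minimal sketch, the same string can be built by a small helper that also covers the optional `### Input:` section. The helper name `build_prompt` and the constant `PREAMBLE` are hypothetical and not part of the published model code; the layout mirrors the `format_alpaca` function in the Reproduce section.

```python
# Hypothetical helper: builds an Alpaca-style prompt that stops at the
# response marker, so the model generates the answer that follows it.
PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
)


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Return an Alpaca-format prompt ending at '### Response:'."""
    # The input section is only included when extra context is supplied.
    input_section = f"### Input:\n{input_text}\n\n" if input_text.strip() else ""
    return (
        f"{PREAMBLE}"
        f"### Instruction:\n{instruction}\n\n"
        f"{input_section}"
        "### Response:\n"
    )


prompt = build_prompt("Explain the water cycle in simple terms.")
print(prompt)
```

Passing a second argument, for example `build_prompt("Summarise the text.", "Water evaporates, condenses, and falls as rain.")`, adds the `### Input:` block between the instruction and the response marker.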

Multi-turn (Chat Template)

from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": "What is photosynthesis?"},
]

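# TinyLlama-Chat ships a built-in chat template; note that the fine-tuning data
# used the Alpaca prompt format shown in the Quick Start above
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# return_full_text=False prints only the reply instead of prompt + reply
print(pipe(prompt, max_new_tokens=200, return_full_text=False)[0]["generated_text"])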
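
The example above sends a single user message. For an actual multi-turn exchange, the message list has to carry earlier turns forward. The following is a minimal sketch of that bookkeeping; `chat_turn` and `echo_reply` are hypothetical names, and `echo_reply` is a stand-in so the sketch runs without loading a model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    history: List[Message],
    user_text: str,
    reply_fn: Callable[[List[Message]], str],
) -> List[Message]:
    """Append a user turn, obtain a reply for the full history, append it."""
    history = history + [{"role": "user", "content": user_text}]
    reply = reply_fn(history)
    return history + [{"role": "assistant", "content": reply}]


def echo_reply(messages: List[Message]) -> str:
    """Stand-in for the model: echoes the last user message."""
    return f"(reply to: {messages[-1]['content']})"


history: List[Message] = []
history = chat_turn(history, "What is photosynthesis?", echo_reply)
history = chat_turn(history, "Which organisms do it?", echo_reply)
for message in history:
    print(f"{message['role']}: {message['content']}")
```

With the real model, `reply_fn` would render the history with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` and pass the result to `pipe`, as in the example above.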

📊 Benchmark Results

All benchmarks were evaluated after fine-tuning, using greedy decoding unless otherwise noted.

MMLU: Elementary Mathematics

| Metric | Value |
| --- | --- |
| Samples evaluated | 50 |
| Correct | 15 |
| Invalid outputs | 4 |
| Accuracy | 30.00% |
| Random baseline (4-way) | 25.00% |

15 of 50 correct is 5 pp above the 25% random baseline, a margin that lies within sampling noise at this sample size. The result is consistent with marginal elementary-math ability, as expected from a 1.1 B-parameter model fine-tuned on an English instruction dataset with limited mathematical content.
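
To put the 50-sample result in perspective, a confidence interval can be computed from the counts in the table. The sketch below uses the Wilson score interval, which is one common choice and not something the evaluation itself reports; the function name `wilson_interval` is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the MMLU table: 15 correct out of 50 samples.
low, high = wilson_interval(correct=15, total=50)
print(f"accuracy = {15 / 50:.2%}, approx. 95% interval = [{low:.1%}, {high:.1%}]")
```

The interval spans roughly 19% to 44% and therefore contains the 25% random baseline, so a larger sample would be needed to separate this score from chance.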


HellaSwag (commonsense NLI)

| Metric | Score | Samples |
| --- | --- | --- |
| Accuracy | 0.4550 | 200 |
| Accuracy (normalised) | 0.5600 | 200 |

Both scores are well above the 0.25 random baseline for HellaSwag's four-way multiple choice, indicating better-than-chance commonsense sentence completion. HellaSwag is often used as a proxy for general language understanding.
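
The tables report both plain and normalised accuracy. The card does not say which evaluation harness was used, so the following is a sketch under the assumption that "normalised" follows the common convention of dividing each candidate's log-likelihood by its length before picking the best one. The function name `pick_choice`, the character-length normalisation, and all numbers are illustrative only.

```python
from typing import List, Tuple


def pick_choice(log_likelihoods: List[float], endings: List[str]) -> Tuple[int, int]:
    """Return (raw_pick, normalised_pick) as indices into `endings`."""
    indices = range(len(endings))
    # Plain accuracy: choose the ending with the highest total log-likelihood.
    raw = max(indices, key=lambda i: log_likelihoods[i])
    # Normalised accuracy: divide by length so long endings are not penalised
    # merely for containing more tokens (character length used as a stand-in).
    norm = max(indices, key=lambda i: log_likelihoods[i] / len(endings[i]))
    return raw, norm


# Made-up log-likelihoods for four candidate endings.
endings = [
    "sits.",
    "carefully pours the batter into the pan.",
    "runs far away.",
    "sings a song loudly.",
]
log_likelihoods = [-6.0, -20.0, -12.0, -25.0]

raw_pick, norm_pick = pick_choice(log_likelihoods, endings)
print(f"raw pick: {raw_pick}, normalised pick: {norm_pick}")
```

In this toy case the short ending wins on raw log-likelihood while the longer ending wins after normalisation, which is why the two accuracy columns can differ on the same samples.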


PIQA (physical intuition QA)

| Metric | Score | Samples |
| --- | --- | --- |
| Accuracy | 0.7450 | 200 |
| Accuracy (normalised) | 0.7400 | 200 |

PIQA tests physical intuition and everyday procedural knowledge with two-choice questions (random baseline 0.50). A score of 0.74 is solid for a 1.1 B model and suggests that the world knowledge acquired in pre-training survives instruction fine-tuning.


ARC Challenge (grade-school science)

| Metric | Score | Samples |
| --- | --- | --- |
| Accuracy | 0.3050 | 200 |
| Accuracy (normalised) | 0.3500 | 200 |

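ARC-Challenge targets questions that require reasoning beyond simple retrieval. A normalised accuracy of 0.35, only modestly above the roughly 0.25 chance level for its mostly four-option questions, reflects the model's limitations on multi-step reasoning at this scale.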


Summary

| Benchmark | Metric | Score |
| --- | --- | --- |
| MMLU Elem. Math | Accuracy | 30.00% |
| HellaSwag | Acc (norm) | 56.00% |
| PIQA | Acc (norm) | 74.00% |
| ARC Challenge | Acc (norm) | 35.00% |

📋 Training Details

| Setting | Value |
| --- | --- |
| Base model | TinyLlama/TinyLlama-1.1B-Chat-v1.0 |
| Dataset | tatsu-lab/alpaca |
| Train split | 45,000 examples |
| Eval split | 2,000 examples |
| Fine-tuning method | LoRA (PEFT) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable parameters | 17 M / 1.1 B (1.55%) |
| Precision | float16 (AMP) |
| Epochs | 3 |
| Per-GPU batch size | 4 |
| Gradient accumulation | 4 steps |
| Effective global batch | 32 (4 × 2 GPUs × 4 accum) |
| Peak learning rate | 2e-4 |
| LR scheduler | Cosine annealing |
| Warmup ratio | 3% |
| Gradient checkpointing | Enabled |
| NEFTune noise alpha | 5 |
| Hardware | Kaggle Dual T4 (2 × 16 GiB VRAM) |
| Loss masking | Completion-only (response tokens only) |
| Early stopping patience | 3 evaluations |
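
The batch and schedule settings above imply a rough number of optimizer steps. The sketch below works through that arithmetic from the table values. It is an approximation only: the function name `training_plan` is hypothetical, the trainer's own step count can differ by a step or two depending on how it rounds partial batches, and early stopping may end the run sooner.

```python
import math
from typing import Dict


def training_plan(
    n_examples: int,
    per_gpu_batch: int,
    n_gpus: int,
    grad_accum: int,
    epochs: int,
    warmup_ratio: float,
) -> Dict[str, int]:
    """Estimate optimizer steps and warmup steps from the training settings."""
    # One optimizer step consumes this many examples across all GPUs.
    effective_batch = per_gpu_batch * n_gpus * grad_accum
    steps_per_epoch = math.ceil(n_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Values from the Training Details table.
plan = training_plan(
    n_examples=45_000,
    per_gpu_batch=4,
    n_gpus=2,
    grad_accum=4,
    epochs=3,
    warmup_ratio=0.03,
)
for name, value in plan.items():
    print(f"{name}: {value}")
```

Under these settings the learning rate would ramp up over roughly the first hundred or so steps and then follow the cosine schedule for the remainder of the run.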

⚙️ Reproduce

# Install dependencies
# pip install transformers datasets peft trl accelerate bitsandbytes huggingface_hub

from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
from datasets import load_dataset

# 1. Load dataset
dataset = load_dataset("tatsu-lab/alpaca", split="train")

# 2. Format examples
def format_alpaca(ex):
    input_section = f"### Input:\n{ex['input']}\n\n" if ex["input"].strip() else ""
    return {
        "text": (
            "Below is an instruction that describes a task. "
            "Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{ex['instruction']}\n\n"
            f"{input_section}"
            f"### Response:\n{ex['output']}"
        )
    }

dataset = dataset.map(format_alpaca, batched=False)

# 3. Load model + LoRA
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype="auto",
    device_map={"": 0},
)
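# The KV cache is not used during training and conflicts with gradient checkpointing
model.config.use_cache = False
# Lets gradients flow through checkpointed blocks even though the base weights are frozen
model.enable_input_require_grads()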

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    bias="none", task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
)
model = get_peft_model(model, lora_config)

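# 4. Train
# Note: this uses the older SFTTrainer signature (tokenizer=, dataset_text_field=,
# max_seq_length=); newer TRL releases move these options into SFTConfig.
trainer = SFTTrainer(
    model=model, tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    # Completion-only loss: tokens before the response marker are masked out
    data_collator=DataCollatorForCompletionOnlyLM("### Response:\n", tokenizer=tokenizer),
    args=TrainingArguments(
        output_dir="./chatbot-lora",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        lr_scheduler_type="cosine",  # cosine annealing, per Training Details
        warmup_ratio=0.03,           # 3% warmup, per Training Details
        neftune_noise_alpha=5,       # NEFTune noise, per Training Details
        fp16=True,
        gradient_checkpointing=True,
        save_strategy="steps", save_steps=200, save_total_limit=3,
        eval_strategy="no",
    ),
)
trainer.train()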
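
The Training Details list 45,000 training and 2,000 evaluation examples, whereas the script above trains on the full split without evaluation. One way such a split could be drawn, not necessarily the one used for this model, is a seeded shuffle of the example indices. The function name `split_indices`, the seed, and the total of 52,002 examples (the card only says 52 K) are assumptions for illustration.

```python
import random
from typing import List, Tuple


def split_indices(
    n_total: int, n_train: int, n_eval: int, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return disjoint, reproducibly shuffled train and eval index lists."""
    if n_train + n_eval > n_total:
        raise ValueError("train and eval sizes exceed the dataset size")
    indices = list(range(n_total))
    # A dedicated Random instance keeps the split reproducible and avoids
    # touching the global random state.
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:n_train + n_eval]


# Assumed dataset size; in practice use len(dataset).
train_idx, eval_idx = split_indices(n_total=52_002, n_train=45_000, n_eval=2_000)
print(len(train_idx), len(eval_idx))
```

With the `datasets` library, `dataset.select(train_idx)` and `dataset.select(eval_idx)` would produce the two subsets. Passing the second one as `eval_dataset` and setting `eval_strategy` to something other than `"no"` would then enable periodic evaluation.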

⚠️ Limitations

  • English only: the base model and Alpaca dataset are English-focused; other languages may produce incoherent outputs.
  • Hallucination: like all generative models, this one can confidently state incorrect facts. Always verify important claims.
  • Limited reasoning: at 1.1 B parameters, multi-step logical and mathematical reasoning is unreliable (see ARC / MMLU results above).
  • No RLHF safety alignment: this model has not undergone reinforcement learning from human feedback. It inherits only the alignment of the TinyLlama chat base model and may produce inappropriate responses to adversarial prompts.
  • Short context: training used a maximum sequence length of 512 tokens, so longer examples were truncated and quality on long conversations may suffer.
  • Not production-ready: intended as a learning artefact and research baseline, not a deployed consumer product.

📜 License

This model is released under the Apache 2.0 license, matching the TinyLlama base model. The tatsu-lab/alpaca dataset itself is distributed under CC BY-NC 4.0, which restricts commercial use of the data. See LICENSE for full terms.


Fine-tuned on Kaggle Dual T4 GPU · TRL SFTTrainer · LoRA via PEFT
