Tiny Chatbot: LoRA Fine-Tuned on Alpaca
A conversational assistant produced by fine-tuning TinyLlama-1.1B-Chat-v1.0 on the tatsu-lab/alpaca instruction dataset (52K English instruction-response pairs) using LoRA (rank 16) via TRL's SFTTrainer in a Kaggle dual-T4 GPU environment.
Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Havoc999/tiny-chatbot",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Havoc999/tiny-chatbot")

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "Explain the water cycle in simple terms.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.15,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
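The prompt above follows the same Alpaca template that the Reproduce section uses for training, including an optional `### Input:` block. A minimal sketch of a helper that builds such prompts is shown here; the names `PREAMBLE` and `build_alpaca_prompt` are hypothetical and not part of this repository.

```python
# Hypothetical helper; mirrors the template used in the training script.
PREAMBLE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
)


def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Return an Alpaca-style prompt that ends at the response marker."""
    # The Input section is only included when extra context is provided.
    input_section = f"### Input:\n{input_text}\n\n" if input_text.strip() else ""
    return (
        PREAMBLE
        + f"### Instruction:\n{instruction}\n\n"
        + input_section
        + "### Response:\n"
    )


prompt = build_alpaca_prompt("Explain the water cycle in simple terms.")
print(prompt)
```

The returned string can be passed to the tokenizer in place of the hand-written `prompt`.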
Multi-turn (Chat Template)
```python
from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [
    {"role": "user", "content": "What is photosynthesis?"},
]

# TinyLlama-Chat supports the built-in chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(pipe(prompt, max_new_tokens=200)[0]["generated_text"])
```
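The snippet above sends a single user message. For an actual multi-turn exchange, the message list has to grow with each turn. The sketch below shows one way to keep that history in plain Python; `add_turn` and `extract_reply` are hypothetical helpers, and the assistant reply is a placeholder string rather than real model output.

```python
def add_turn(messages, role, content):
    """Return a new history list with one more message appended."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unsupported role: {role!r}")
    return messages + [{"role": role, "content": content}]


def extract_reply(generated_text, prompt):
    """Drop the echoed prompt from the pipeline's generated text, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


history = []
history = add_turn(history, "user", "What is photosynthesis?")
# Placeholder reply for illustration; in practice this comes from the model.
history = add_turn(history, "assistant", "It is how plants turn light into chemical energy.")
history = add_turn(history, "user", "Which part of the plant does it?")

print([m["role"] for m in history])
```

The resulting `history` can be passed to `tokenizer.apply_chat_template` in place of `messages`. By default the text-generation pipeline returns the prompt followed by the completion, so a step like `extract_reply` is needed before storing the model's answer as the next assistant turn.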
Benchmark Results
All benchmarks were evaluated after fine-tuning, using greedy decoding unless otherwise noted.
MMLU: Elementary Mathematics
| Metric | Value |
|---|---|
| Samples evaluated | 50 |
| Correct | 15 |
| Invalid outputs | 4 |
| Accuracy | 30.00% |
| Random baseline (4-way) | 25.00% |
The score is 5 percentage points above the random baseline. With only 50 samples that margin is within sampling noise, so it indicates at most marginal elementary-math ability, consistent with the 1.1B parameter count and an English instruction dataset that contains limited mathematical content.
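To put the margin over chance in context, the following sketch recomputes the accuracy from the table and compares it with the standard error of a random guesser on 50 four-way questions. It uses a simple binomial approximation, which is an assumption of this example and not part of the original evaluation.

```python
import math

n_samples = 50   # from the table above
n_correct = 15   # from the table above
chance = 0.25    # four-way multiple choice

accuracy = n_correct / n_samples

# Standard error of the accuracy a pure random guesser would achieve.
std_err = math.sqrt(chance * (1 - chance) / n_samples)

# How many standard errors the observed accuracy sits above chance.
z_score = (accuracy - chance) / std_err

print(f"accuracy={accuracy:.2%} std_err={std_err:.3f} z={z_score:.2f}")
```

A z-score below 1 means the observed accuracy is less than one standard error above chance, so a larger sample would be needed to separate the model from random guessing on this subset.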
HellaSwag (commonsense NLI)
| Metric | Score | Samples |
|---|---|---|
| Accuracy | 0.4550 | 200 |
| Accuracy (normalised) | 0.5600 | 200 |
Both scores are well above the 0.25 chance level for HellaSwag's four-way multiple choice, indicating better-than-random commonsense sentence completion. HellaSwag is widely used as a proxy for general language understanding.
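The tables report both raw and normalised accuracy without defining them. Assuming the scores follow the common evaluation-harness convention (this card does not name the tool), raw accuracy picks the ending with the highest total log-likelihood, while normalised accuracy first divides each log-likelihood by the length of the ending so that longer endings are not penalised. The sketch below illustrates the difference with made-up numbers; `pick_answer` is a hypothetical name.

```python
def pick_answer(log_likelihoods, endings, normalise=False):
    """Return the index of the highest-scoring ending."""
    scores = [
        ll / len(ending) if normalise else ll
        for ll, ending in zip(log_likelihoods, endings)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Illustrative values only; not taken from the actual evaluation.
endings = ["sits.", "sits down on the bench and rests."]
log_likelihoods = [-6.0, -14.0]

raw_choice = pick_answer(log_likelihoods, endings)
norm_choice = pick_answer(log_likelihoods, endings, normalise=True)

print(raw_choice, norm_choice)
```

In this toy case the short ending wins on raw log-likelihood, while the longer ending wins once the score is divided by its character length, which is why the two metrics can differ on the same set of questions.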
PIQA (physical intuition QA)
| Metric | Score | Samples |
|---|---|---|
| Accuracy | 0.7450 | 200 |
| Accuracy (normalised) | 0.7400 | 200 |
PIQA tests physical intuition and everyday procedural knowledge. A score of 0.74, against a chance level of 0.50 for its two-choice format, is a solid result for a 1.1B model and suggests that the world knowledge acquired in base pre-training survives instruction fine-tuning.
ARC Challenge (grade-school science)
| Metric | Score | Samples |
|---|---|---|
| Accuracy | 0.3050 | 200 |
| Accuracy (normalised) | 0.3500 | 200 |
ARC-Challenge targets questions that require reasoning beyond simple retrieval. The normalised accuracy of 0.35 reflects the model's limited multi-step reasoning at this scale.
Summary
| Benchmark | Metric | Score |
|---|---|---|
| MMLU Elem. Math | Accuracy | 30.00% |
| HellaSwag | Acc (norm) | 56.00% |
| PIQA | Acc (norm) | 74.00% |
| ARC Challenge | Acc (norm) | 35.00% |
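Because the benchmarks have different numbers of answer options, the raw scores in the summary are easier to compare as margins over chance. The sketch below assumes chance levels of 0.25 for the four-way tasks and 0.50 for PIQA; the ARC-Challenge figure is approximate, since a few of its questions have a different number of options.

```python
# (reported score, assumed chance level) per benchmark
results = {
    "MMLU Elem. Math": (0.30, 0.25),
    "HellaSwag": (0.56, 0.25),
    "PIQA": (0.74, 0.50),
    "ARC Challenge": (0.35, 0.25),  # approximate chance level
}

margins = {
    name: round(score - chance, 2)
    for name, (score, chance) in results.items()
}

for name, margin in margins.items():
    print(f"{name}: {margin:+.2f} over chance")
```

Viewed this way, HellaSwag and PIQA show the largest gains over guessing, while the MMLU and ARC margins are small.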
Training Details
| Setting | Value |
|---|---|
| Base model | TinyLlama/TinyLlama-1.1B-Chat-v1.0 |
| Dataset | tatsu-lab/alpaca |
| Train split | 45,000 examples |
| Eval split | 2,000 examples |
| Fine-tuning method | LoRA (PEFT) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable parameters | |
| Precision | float16 (AMP) |
| Epochs | 3 |
| Per-GPU batch size | 4 |
| Gradient accumulation | 4 steps |
| Effective global batch | 32 (4 × 2 GPUs × 4 accum) |
| Peak learning rate | 2e-4 |
| LR scheduler | Cosine annealing |
| Warmup ratio | 3% |
| Gradient checkpointing | Enabled |
| NEFTune noise alpha | 5 |
| Hardware | Kaggle Dual T4 (2 × 16 GiB VRAM) |
| Loss masking | Completion-only (response tokens only) |
| Early stopping patience | 3 evaluations |
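The LoRA settings and batch configuration in the table can be turned into concrete numbers. The sketch below estimates the adapter's trainable-parameter count and recomputes the effective batch size. The layer shapes (hidden size 2048, MLP size 5632, 22 layers, 256-dimensional key/value projections) are assumed from the public TinyLlama-1.1B configuration and are not stated in this card.

```python
# Assumed TinyLlama-1.1B shapes (not stated in this card).
HIDDEN = 2048
INTERMEDIATE = 5632
NUM_LAYERS = 22
KV_DIM = 256  # 4 key/value heads x 64 dims (grouped-query attention)
RANK = 16     # LoRA rank from the table

# (in_features, out_features) for each targeted projection
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total_lora = per_layer * NUM_LAYERS

# Per-GPU batch x number of GPUs x gradient accumulation steps
effective_batch = 4 * 2 * 4

print(f"LoRA parameters: {total_lora:,}")
print(f"Effective batch: {effective_batch}")
```

Under these assumptions the adapter has about 12.6 million trainable parameters, roughly 1% of the base model's 1.1B weights.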
Reproduce
```python
# Install dependencies
# pip install transformers datasets peft trl accelerate bitsandbytes huggingface_hub
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
from datasets import load_dataset

# 1. Load dataset
dataset = load_dataset("tatsu-lab/alpaca", split="train")


# 2. Format examples
def format_alpaca(ex):
    input_section = f"### Input:\n{ex['input']}\n\n" if ex["input"].strip() else ""
    return {
        "text": (
            "Below is an instruction that describes a task. "
            "Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{ex['instruction']}\n\n"
            f"{input_section}"
            f"### Response:\n{ex['output']}"
        )
    }


dataset = dataset.map(format_alpaca, batched=False)

# 3. Load model + LoRA
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype="auto",
    device_map={"": 0},
)
model.config.use_cache = False
model.enable_input_require_grads()
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    bias="none", task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)
model = get_peft_model(model, lora_config)

# 4. Train
trainer = SFTTrainer(
    model=model, tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    data_collator=DataCollatorForCompletionOnlyLM("### Response:\n", tokenizer=tokenizer),
    args=TrainingArguments(
        output_dir="./chatbot-lora",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        fp16=True,
        gradient_checkpointing=True,
        save_strategy="steps", save_steps=200, save_total_limit=3,
        eval_strategy="no",
    ),
)
trainer.train()
```
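The training script relies on completion-only loss masking, meaning that only tokens after the `### Response:` marker contribute to the loss. The sketch below illustrates the idea on plain lists of token IDs. It is a simplified stand-in, not TRL's implementation; the token IDs are made up, and `find_subsequence` and `mask_prompt_labels` are hypothetical names.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def find_subsequence(seq, sub):
    """Return the start index of the first occurrence of sub in seq, or -1."""
    for start in range(len(seq) - len(sub) + 1):
        if seq[start:start + len(sub)] == sub:
            return start
    return -1


def mask_prompt_labels(input_ids, template_ids):
    """Copy input_ids into labels, masking everything up to and including the template."""
    start = find_subsequence(input_ids, template_ids)
    if start == -1:
        # No response marker found: exclude the whole example from the loss.
        return [IGNORE_INDEX] * len(input_ids)
    response_start = start + len(template_ids)
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]


# Made-up IDs: three prompt tokens, a two-token response marker, three response tokens.
input_ids = [11, 12, 13, 90, 91, 21, 22, 2]
template_ids = [90, 91]

labels = mask_prompt_labels(input_ids, template_ids)
print(labels)
```

Only the last three positions keep their token IDs as labels, so gradients come from the response tokens alone, while the instruction and marker tokens still serve as context.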
Limitations
- English only: the base model and Alpaca dataset are English-focused; other languages may produce incoherent outputs.
- Hallucination: like all generative models, this one can confidently state incorrect facts. Always verify important claims.
- Limited reasoning: at 1.1B parameters, multi-step logical and mathematical reasoning is unreliable (see ARC / MMLU results above).
- No RLHF safety alignment: this model has not undergone reinforcement learning from human feedback. It inherits TinyLlama's base alignment only and may produce inappropriate responses to adversarial prompts.
- Short context: training used a maximum sequence length of 512 tokens, so longer examples were truncated and quality may degrade on long prompts or conversations.
- Not production-ready: intended as a learning artefact and research baseline, not a deployed consumer product.
License
This model is released under the Apache 2.0 license, consistent with the TinyLlama base model. The tatsu-lab/alpaca dataset itself is distributed under CC BY-NC 4.0, which restricts commercial use of the data. See LICENSE for full terms.
Fine-tuned on Kaggle Dual T4 GPU · TRL SFTTrainer · LoRA via PEFT