KindleHare — Qwen2.5-7B LoRA

QLoRA fine-tune of Qwen/Qwen2.5-7B-Instruct on agentic reasoning traces.

Model Details

  • Base model: Qwen/Qwen2.5-7B-Instruct
  • Fine-tuning method: QLoRA (4-bit NF4 + LoRA r=16)
  • Developed by: AlihanSDev
  • License: apache-2.0
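
The QLoRA setup named above (4-bit NF4 quantization plus a rank-16 LoRA adapter) could be expressed roughly as follows with `transformers` and `peft`. This is a minimal sketch, not the author's training script: the compute dtype, the `task_type`, and the exact module names are assumptions inferred from the Training Details table, and settings the card does not state (such as LoRA dropout or double quantization) are left at library defaults.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization for the frozen base weights.
# Assumption: fp16 compute dtype, matching the "Precision fp16" table entry.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter with rank 16 and alpha 16 on the attention and MLP projections.
# Assumption: the module names follow the usual Qwen2 layer naming.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```

In a typical workflow, `bnb_config` would be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`, and `lora_config` to `peft.get_peft_model`.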

Training Data

Dataset: lambda/hermes-agent-reasoning-traces (kimi split)

  • 7,646 samples of agentic coding and reasoning traces
  • Categories: terminal tasks, coding, reasoning
  • Filtering: tool and system messages removed; only user↔assistant turns kept
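
The filtering step described above can be illustrated with a short sketch. It assumes each trace is a list of messages shaped like `{"role": ..., "content": ...}`; the dataset's real schema is not given in this card, and the helper name `keep_dialogue_turns` is hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]

# Roles retained after filtering; tool and system messages are dropped.
KEPT_ROLES = {"user", "assistant"}


def keep_dialogue_turns(messages: List[Message]) -> List[Message]:
    """Return only the user and assistant messages, preserving their order."""
    return [m for m in messages if m.get("role") in KEPT_ROLES]


# Hypothetical trace used only for illustration.
example_trace = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "List the files in the current directory."},
    {"role": "assistant", "content": "I will run ls."},
    {"role": "tool", "content": "README.md  main.py"},
    {"role": "assistant", "content": "The directory contains README.md and main.py."},
]

filtered_trace = keep_dialogue_turns(example_trace)
print([m["role"] for m in filtered_trace])
```

Dropping tool messages can leave consecutive assistant turns, as in this example. Whether such turns were merged before training is not stated in this card.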

Training Details

Parameter             Value
Hardware              1x NVIDIA T4 (Kaggle)
Epochs                1
Learning rate         1e-4
Batch size            1 (gradient accumulation 8)
LoRA rank             16
LoRA alpha            16
Target modules        q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Precision             fp16
Optimizer             paged_adamw_8bit
Max sequence length   1024
Training time         ~8 hours
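
A few quantities follow directly from these settings: the effective batch size, the approximate number of optimizer steps, the LoRA scaling factor, and the adapter's parameter count. The sketch below works through the arithmetic. It assumes all 7,646 samples were used for training, that a partial final accumulation window still counts as a step, and that the layer dimensions match the published Qwen2.5-7B configuration; none of these are stated in this card.

```python
import math

# Values taken from the Training Data and Training Details sections.
NUM_SAMPLES = 7646
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 8
NUM_GPUS = 1
LORA_RANK = 16
LORA_ALPHA = 16

# Assumed Qwen2.5-7B dimensions (not stated in this card).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 512  # 4 key/value heads x 128 head dim (grouped-query attention)

# (in_features, out_features) of each targeted linear layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Samples consumed per optimizer update.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS

# Approximate optimizer updates in one epoch (rounding the last window up).
steps_per_epoch = math.ceil(NUM_SAMPLES / effective_batch)

# LoRA scales the low-rank update by alpha / r.
lora_scaling = LORA_ALPHA / LORA_RANK

# Each adapted layer adds an (r x in) matrix and an (out x r) matrix.
params_per_block = sum(LORA_RANK * (i + o) for i, o in TARGET_SHAPES.values())
lora_params = params_per_block * NUM_LAYERS

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: ~{steps_per_epoch}")
print(f"LoRA scaling (alpha / r): {lora_scaling}")
print(f"LoRA parameters: {lora_params:,}")
```

With alpha equal to the rank, the scaling factor is 1.0, so the low-rank update is added to the frozen weights without extra rescaling.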

How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model in fp16; device_map="auto" places layers on the available devices.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
# Attach the LoRA adapter weights to the base model.
model = PeftModel.from_pretrained(base_model, "AlihanSDev/kindlehare-qwen7b-lora")
tokenizer = AutoTokenizer.from_pretrained("AlihanSDev/kindlehare-qwen7b-lora")

# Format the conversation with the model's chat template.
messages = [{"role": "user", "content": "Write a Python function to reverse a string."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

Limitations

  • Trained for only 1 epoch
  • Evaluation during training was disabled because of T4 memory constraints
  • Training loss was noisy because the agentic traces in the dataset vary widely in complexity
  • Not evaluated on standard benchmarks (MMLU, HumanEval)

Framework versions

  • PEFT 0.12.0