Qwen3.5-4B LoRA adapter

A LoRA adapter for Qwen/Qwen3.5-4B, trained for research experiments. This is an experimental artifact and has not been evaluated for production use.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Base model and LoRA adapter repositories on the Hugging Face Hub
base_id = "Qwen/Qwen3.5-4B"
adapter_id = "qinjerem/qwen3.5-4b-lora"

# Load the base model in bf16 and let Accelerate place it on the available devices
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id, dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"
)
# Attach the LoRA adapter weights on top of the frozen base model
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Build the prompt with thinking disabled, matching the training setup
messages = [{"role": "user", "content": "Hello."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a response and decode only the newly generated tokens
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=600, do_sample=True, temperature=1.0)
print(tokenizer.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True))

Training

Base model: Qwen/Qwen3.5-4B
Method: LoRA (PEFT), rank 32, α 64, use_rslora=True, dropout 0
Target modules: attention, MLP, and delta-net linear layers in the LLM backbone (vision tower frozen)
Trainable params: 64.9 M / 4.27 B (1.52 %)
Epochs: 6
Optimizer: adamw_8bit, weight decay 0.01
LR / schedule: 1e-5, linear, 5 warmup steps
Precision: bf16
Effective batch: 16 (batch 4 × 4 gradient-accumulation steps)
Max sequence length: 1024 tokens
Thinking mode during training: disabled via the chat template
Hardware: 1× NVIDIA H100 80 GB
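The numbers in the table can be cross-checked with a little arithmetic. With use_rslora=True, PEFT scales the low-rank update by α/√r (rank-stabilized LoRA) instead of the standard α/r, so rank 32 and α 64 give a scale of roughly 11.31 rather than 2. The sketch below is a hedged illustration, not the training script: the helper name lora_scaling is hypothetical, and it assumes the "4 × 4" in the effective-batch row means a per-device batch of 4 with 4 gradient-accumulation steps on the single GPU listed.

```python
import math

# Values taken from the training table; the batch split is an assumption
RANK = 32
ALPHA = 64
PER_DEVICE_BATCH = 4   # assumed per-device micro-batch size
GRAD_ACCUM_STEPS = 4   # assumed gradient-accumulation steps
NUM_GPUS = 1           # "1× NVIDIA H100 80 GB"


def lora_scaling(alpha: float, rank: int, use_rslora: bool) -> float:
    """Hypothetical helper: scale applied to the low-rank update B @ A.

    Standard LoRA uses alpha / r; rsLoRA uses alpha / sqrt(r).
    """
    return alpha / math.sqrt(rank) if use_rslora else alpha / rank


rslora_scale = lora_scaling(ALPHA, RANK, use_rslora=True)
plain_scale = lora_scaling(ALPHA, RANK, use_rslora=False)
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS
trainable_pct = 100 * 64.9e6 / 4.27e9  # trainable / total parameters

print(f"rsLoRA scale: {rslora_scale:.3f}")
print(f"plain LoRA scale: {plain_scale:.3f}")
print(f"effective batch: {effective_batch}")
print(f"trainable: {trainable_pct:.2f} %")
```

The larger rsLoRA scale is the point of the option: it keeps the adapter's update magnitude from shrinking as the rank grows, which matters more at rank 32 than at very small ranks.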

Framework versions

  • PEFT 0.19.1
  • Transformers 5.5.4
  • PyTorch 2.6.0+cu124