modrill/qwen3-4b-nothink-baseline-full-sft

Full-parameter supervised fine-tuning (SFT) of Qwen/Qwen3-4B-Base on the nothink_mix dataset. This checkpoint is a full SFT model: weights are stored in a single model.safetensors file (~8.2 GB). This is not a LoRA adapter.

Related models

Training summary

Field Value
Method Full SFT (ZeRO-2)
Dataset nothink_mix
Chat template qwen3_nothink
Epochs 2
Seed 42
Cutoff length 8192
Packing false
Per-device batch size 4
Gradient accumulation 4
Effective batch size 64
Learning rate 2e-5
Train loss 0.1898
Train steps 2988
Finished 2026-06-08 07:10 CST

Usage

Load with Hugging Face Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "modrill/qwen3-4b-nothink-baseline-full-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

Training hyperparameters

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU (4 devices)
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: AdamW (fused)
  • lr_scheduler: cosine
  • lr_scheduler_warmup_steps: 0.03
  • num_epochs: 2.0

Framework versions

  • Transformers 5.6.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2
Downloads last month
28
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for modrill/qwen3-4b-nothink-baseline-full-sft

Finetuned
(323)
this model