SmolLM2-1.7B: Scheduled QAT (Linear Schedule)

This model was produced by Scheduled Quantization-Aware Training with a linear precision reduction schedule, targeting INT4 deployment on edge devices (Android, iOS, Raspberry Pi).

Important

This model is stored in bfloat16; it is NOT quantized. QAT trains the weights to be robust to quantization noise, but the actual quantization happens at export time. For the quantized GGUF versions ready for deployment, see:

jpcurada/SmolLM2-1.7B-Scheduled-QAT-Linear-GGUF
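
To make "robust to quantization noise" concrete, the sketch below shows a common way QAT simulates that noise: weights are rounded to a low-precision grid and immediately mapped back to floating point (often called fake quantization). It is a minimal pure-Python illustration that assumes symmetric per-tensor quantization; the function name `fake_quantize` and the scaling choice are assumptions for illustration, not the training code used for this model.

```python
def fake_quantize(weights, bits):
    """Quantize-dequantize a list of floats on a symmetric signed grid.

    Illustrative assumption: one scale per tensor, derived from the largest
    absolute weight. The actual training code may differ.
    """
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit signed integers
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)            # nothing to scale
    scale = max_abs / qmax
    out = []
    for w in weights:
        q = round(w / scale)            # snap to the nearest integer level
        q = max(-qmax, min(qmax, q))    # clamp to the representable range
        out.append(q * scale)           # map back to floating point
    return out


weights = [0.31, -0.74, 0.05, 1.20, -1.20]
for bits in (8, 4):
    approx = fake_quantize(weights, bits)
    worst = max(abs(a - w) for a, w in zip(approx, weights))
    print(f"{bits}-bit: max rounding error = {worst:.4f}")
```

During QAT the forward pass sees the rounded values, so the optimizer is pushed toward weights that lose little when real INT4 quantization is applied at export time. A real training loop also needs a gradient workaround for the rounding step, such as a straight-through estimator, which this sketch omits.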

Training Details

| Parameter | Value |
| --- | --- |
| Base model | HuggingFaceTB/SmolLM2-1.7B |
| Method | Scheduled QAT (linear bit-width reduction) |
| Training data | WikiText-103 (4000 sequences × 512 tokens) |
| Hardware | Kaggle TPU v5e-8 (8 cores) |
| Epochs | 1 |
| Effective batch size | 64 (4 per core × 2 gradient-accumulation steps × 8 cores) |
| Learning rate | 2e-5 (cosine decay) |
| Optimizer | AdamW (weight_decay=0.01) |
| Training precision | bfloat16 |
| Training time | ~1150 seconds |
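
The batch-size and learning-rate entries above can be checked with a little arithmetic. The sketch below recomputes the effective batch size, derives the number of optimizer steps in the single epoch, and evaluates a plain cosine decay from the peak rate. The step count assumes the last partial batch is kept, and the cosine formula assumes no warmup and a floor of zero; neither detail is stated in this card.

```python
import math

PER_CORE_BATCH = 4
GRAD_ACCUM_STEPS = 2
NUM_CORES = 8
NUM_SEQUENCES = 4000
SEQ_LEN = 512
PEAK_LR = 2e-5

effective_batch = PER_CORE_BATCH * GRAD_ACCUM_STEPS * NUM_CORES
total_tokens = NUM_SEQUENCES * SEQ_LEN
# Assumption: the final partial batch is kept rather than dropped.
optimizer_steps = math.ceil(NUM_SEQUENCES / effective_batch)


def cosine_lr(step, total_steps, peak_lr=PEAK_LR):
    """Cosine decay from peak_lr to 0 (assumed: no warmup, zero floor)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
print(f"training tokens:      {total_tokens}")
print(f"optimizer steps:      {optimizer_steps}")
print(f"lr at midpoint:       {cosine_lr(optimizer_steps / 2, optimizer_steps):.2e}")
```

With only a few dozen optimizer steps in the run, each step covers a noticeable slice of the precision schedule, which is worth keeping in mind when reading the step-by-step training log.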

Bit-Width Schedule

| Phase | Epoch range | Bit-width |
| --- | --- | --- |
| Warmup | 0.0 → 0.1 | FP32 (no quantization noise) |
| Linear reduction | 0.1 → 0.9 | 32 → 16 → 8 → 4 (gradual) |
| Stabilization | 0.9 → 1.0 | INT4 (final fine-tuning) |
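
One way to read the schedule above is as a function from training progress (a fraction of the single epoch) to a target bit-width. The sketch below assumes the bit-width is interpolated linearly from 32 to 4 between progress 0.1 and 0.9 and then rounded to a whole number of bits. The card lists only 32, 16, 8 and 4 as waypoints, so the exact interpolation and the name `bit_width_at` are assumptions.

```python
WARMUP_END = 0.1      # end of the full-precision warmup phase
REDUCTION_END = 0.9   # start of the INT4 stabilization phase
START_BITS = 32
FINAL_BITS = 4


def bit_width_at(progress):
    """Return the assumed target bit-width for a progress value in [0, 1]."""
    if progress < WARMUP_END:
        return START_BITS             # warmup: no quantization noise
    if progress >= REDUCTION_END:
        return FINAL_BITS             # stabilization: train at the final width
    # Linear reduction phase: interpolate between the two endpoints.
    frac = (progress - WARMUP_END) / (REDUCTION_END - WARMUP_END)
    return round(START_BITS - frac * (START_BITS - FINAL_BITS))


for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"progress {p:.1f} -> {bit_width_at(p)} bits")
```

A stepped variant that jumps directly between 32, 16, 8 and 4 would fit the same table; the linear form is shown here only because the method is described as a linear bit-width reduction.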

Results (WikiText-103 Test)

| Metric | Value |
| --- | --- |
| Test loss | 3.0392 |
| Test perplexity | 20.89 |
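
Perplexity is the exponential of the mean per-token cross-entropy loss, so the two numbers above can be cross-checked directly. The short check below assumes the reported test loss is a natural-log (nats) average over tokens.

```python
import math

test_loss = 3.0392                  # reported test loss
perplexity = math.exp(test_loss)    # perplexity = e^(mean cross-entropy)
print(f"perplexity = {perplexity:.2f}")
```

The result rounds to 20.89, which agrees with the reported test perplexity.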

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo_id = "jpcurada/SmolLM2-1.7B-Scheduled-QAT-Linear-INT4"

# Load the bfloat16 QAT checkpoint (not yet quantized, despite the repo name).
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place weights on the available device(s)
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Tokenize a prompt and move the tensors to the model's device.
inputs = tokenizer("The future of AI is", return_tensors="pt").to(model.device)
# Generate up to 100 new tokens and decode them back to text.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Files

| File | Description |
| --- | --- |
| model.safetensors | Model weights (bfloat16) |
| config.json | Model architecture config |
| tokenizer.json | Tokenizer |
| results.json | Training results (loss, perplexity) |
| training_log.json | Step-by-step training log |

Citation

This model is part of a thesis on Scheduled Quantization-Aware Training for Small Language Models targeting edge deployment.

License

Apache 2.0 (same as base model)
