SmolStruct-1.7B

A small, fully-open model that turns free text into valid JSON and selects function/tool calls — built to run locally so sensitive data never leaves the device.

SmolStruct is a LoRA fine-tune of SmolLM2-1.7B-Instruct specialised for structured output. In the agent era, the bottleneck is rarely raw model size — it is whether a model returns reliably parseable JSON. SmolStruct targets exactly that, in a footprint small enough for a laptop, an edge device, or a privacy-constrained environment (e.g. data that cannot be sent to a third-party API for regulatory reasons).

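Core idea: a 1.7B model plus grammar-constrained decoding produces output that always parses as JSON, and does so locally, cheaply, and privately. The grammar guarantees syntax; whether the field values are correct still depends on the model.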

What it does

Three capabilities, one unified chat format:

Schema-guided extraction — given a JSON schema and some text, fill the schema.
Function / tool calling — given available tools and a request, emit the correct tool call.
Free JSON extraction — pull structured fields out of unstructured text.
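
The three tasks above can share one message-building helper. The sketch below is only an illustration: the `Schema:` / `Text:` layout and the extraction system prompt are taken from the quick start, while the `Tools:` / `Request:` layout, the other two system prompts, and the `build_messages` helper itself are assumptions rather than the format the model was trained on. The project repository is the place to confirm the exact prompts.

```python
import json

# Only the "schema" prompt comes from the quick start.
# ASSUMPTION: the "tools" and "free" prompts are hypothetical placeholders.
SYSTEM_PROMPTS = {
    "schema": (
        "You are a precise extraction assistant. Read the text and return ONLY "
        "a JSON object that matches the schema. Use null for fields not present."
    ),
    "tools": "You are a tool router. Return ONLY a JSON object describing one tool call.",
    "free": "You are a precise extraction assistant. Return ONLY a JSON object.",
}


def build_messages(task, text, schema=None, tools=None):
    """Build a two-turn chat (system + user) for one of the three tasks."""
    if task == "schema":
        user = f"Schema:\n{json.dumps(schema, ensure_ascii=False)}\n\nText:\n{text}"
    elif task == "tools":
        # ASSUMPTION: hypothetical layout for listing the available tools.
        user = f"Tools:\n{json.dumps(tools, ensure_ascii=False)}\n\nRequest:\n{text}"
    elif task == "free":
        user = f"Text:\n{text}"
    else:
        raise ValueError(f"unknown task: {task!r}")
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[task]},
        {"role": "user", "content": user},
    ]
```

The returned list can be passed to `tok.apply_chat_template(...)` in the same way as `messages` in the quick start.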

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch, json

# This repo is a LoRA adapter — load the base model, then apply the adapter.
base_id = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
adapter_id = "tugrulkaya/smolstruct-1.7b"
tok = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

schema = '{"name": "string", "age": "integer", "city": "string", "email": "string"}'
text = "Tuğrul Kaya is 31 and lives in Konya. Reach him at tugrul.kaya@example.org."

messages = [
    {"role": "system", "content": "You are a precise extraction assistant. Read the text and return ONLY a JSON object that matches the schema. Use null for fields not present."},
    {"role": "user", "content": f"Schema:\n{schema}\n\nText:\n{text}"},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
# {"name": "Tuğrul Kaya", "age": 31, "city": "Konya", "email": "tugrul.kaya@example.org"}
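
Without constrained decoding, the decoded string is still just text, so downstream code usually wants to parse it defensively. A minimal sketch, assuming the caller knows the expected field names; `parse_extraction` is a hypothetical helper and not part of this repository:

```python
import json


def parse_extraction(raw, expected_keys):
    """Parse model output and return a dict with exactly the expected keys.

    Missing fields become None, mirroring the "use null for fields not
    present" instruction in the system prompt. Extra keys are dropped.
    """
    # Tolerate stray text around the object by slicing from the first "{"
    # to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    obj = json.loads(raw[start : end + 1])  # raises ValueError on bad JSON
    return {key: obj.get(key) for key in expected_keys}
```

For example, `parse_extraction(decoded, ["name", "age", "city", "email"])` would return a plain dict that is safe to index by field name.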

Guaranteed-valid JSON (recommended)

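Pair the model with grammar-constrained decoding so the output is always syntactically valid JSON. The generic schema used here accepts any JSON object; pass a more specific JSON Schema if particular keys and types must also be enforced.

import outlines

# Wrap the already-loaded model and tokenizer for constrained generation.
om = outlines.from_transformers(model, tok)
# '{"type": "object"}' accepts any JSON object: it guarantees syntax, not specific keys.
generator = outlines.Generator(om, outlines.types.JsonSchema('{"type": "object"}'))
print(generator(prompt, max_new_tokens=256))  # always parses as a JSON object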
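
The field map used in the quick start (`{"name": "string", ...}`) is a shorthand, not a JSON Schema, so it cannot be handed to a grammar backend directly. The sketch below shows one way to expand it into a standard JSON Schema that also pins down keys and types. `to_json_schema` is a hypothetical helper; the choice to make every field nullable follows the "use null for fields not present" instruction and is an assumption about the desired behaviour.

```python
import json

# JSON Schema primitive types accepted in the shorthand field map.
ALLOWED_TYPES = {"string", "integer", "number", "boolean"}


def to_json_schema(simple_schema):
    """Expand a '{"field": "type"}' shorthand string into a JSON Schema string."""
    fields = json.loads(simple_schema)
    properties = {}
    for name, type_name in fields.items():
        if type_name not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type for {name!r}: {type_name!r}")
        # Each field may hold its declared type or null.
        properties[name] = {"type": [type_name, "null"]}
    return json.dumps(
        {
            "type": "object",
            "properties": properties,
            "required": list(fields),        # every key must be present
            "additionalProperties": False,   # and no other keys are allowed
        }
    )
```

The resulting string could be passed to `outlines.types.JsonSchema(...)` in place of the generic `{"type": "object"}` schema. Whether a given constrained-decoding backend supports every keyword used here, in particular the nullable type list, should be checked against that backend's documentation.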

Why this exists (and why small + local matters)

Most "structured output" tutorials reach for a large hosted model. But:

Cost — extraction is a high-volume, repetitive task; paying per token for a frontier model is wasteful.
Privacy / compliance — in many settings (healthcare, public sector, finance, and under regimes like Türkiye's KVKK or the EU's GDPR) sending raw records to a third-party API is a non-starter. A 1.7B model runs on-prem or on-device.
Reliability — with grammar constraints, a small model gives a hard guarantee (valid JSON), which a large unconstrained model does not.

SmolStruct is a deliberate bet that, for structured output, the right architecture beats raw scale.

Training


| Setting | Value |
| --- | --- |
| Base | `HuggingFaceTB/SmolLM2-1.7B-Instruct` |
| Method | LoRA (r=16, α=32, dropout=0.05) on attention + MLP projections |
| Data | ~6k synthetic ChatML examples (extraction + tool calling), incl. EN & TR |
| Format | ChatML; assistant target is always a single JSON object |
| Objective | Supervised fine-tuning (TRL `SFTTrainer`) |
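
As a rough illustration, the LoRA settings in the table above map onto a PEFT configuration along these lines. The rank, alpha and dropout are the stated values; the exact `target_modules` names and the `task_type` are assumptions based on the Llama-style layer naming that SmolLM2 uses, and the actual training script may differ.

```python
# r, lora_alpha and lora_dropout are the values stated in the table.
# ASSUMPTION: module names for "attention + MLP projections" follow the
# Llama-style naming; the real training script may list them differently.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    "task_type": "CAUSAL_LM",
}

try:
    from peft import LoraConfig

    lora_config = LoraConfig(**lora_kwargs)
except ImportError:
    # PEFT is optional for reading this sketch; the dict above is the point.
    lora_config = None

# With standard LoRA scaling, adapter updates are multiplied by alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
```

With α=32 and r=16 the adapter update is scaled by a factor of 2.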

Training data is synthetically generated by a deterministic, template-based generator (no third-party API, fully reproducible from a seed). The generator, training script, evaluator and demo are all open — see the project repository.
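
To make "deterministic, template-based, reproducible from a seed" concrete, here is a toy sketch of the idea. It is not the project's generator: the name and city pools, the sentence templates and the `make_example` / `make_dataset` helpers are all hypothetical, and only the message layout and system prompt are borrowed from the quick start.

```python
import json
import random

# ASSUMPTION: tiny hypothetical value pools and templates, for illustration only.
NAMES = ["Ada Lovelace", "Alan Turing", "Grace Hopper"]
CITIES = ["London", "Konya", "Berlin"]
TEMPLATES = [
    "{name} is {age} and lives in {city}.",
    "Meet {name}, aged {age}, based in {city}.",
]
SCHEMA = {"name": "string", "age": "integer", "city": "string", "email": "string"}
SYSTEM = (
    "You are a precise extraction assistant. Read the text and return ONLY "
    "a JSON object that matches the schema. Use null for fields not present."
)


def make_example(rng):
    """Draw one record, render it as text, and pair it with its JSON target."""
    record = {
        "name": rng.choice(NAMES),
        "age": rng.randint(18, 90),
        "city": rng.choice(CITIES),
        "email": None,  # never mentioned in these templates, so the target is null
    }
    text = rng.choice(TEMPLATES).format(**record)
    user = f"Schema:\n{json.dumps(SCHEMA)}\n\nText:\n{text}"
    return {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": user},
            {"role": "assistant", "content": json.dumps(record, ensure_ascii=False)},
        ]
    }


def make_dataset(seed, n):
    """All randomness flows through one seeded RNG, so output is reproducible."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]
```

Because the ground-truth record is created before the text, every example has an exact label for free, which is what makes field-level scoring possible without manual annotation.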

Evaluation

Evaluated on a held-out synthetic validation split with four metrics:

JSON validity rate — does the output parse?
Schema compliance — exact key-set match (no missing, no extra keys)
Field-level accuracy — value match against ground truth
Tool-selection accuracy — correct tool chosen (tool-call examples)
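
The four metrics can be sketched in a few lines. This is an illustrative reimplementation, not the project's evaluator: the `score` function, the use of a `tool` key to mark tool-call examples, and the choice to count every reference field toward field-level accuracy are assumptions.

```python
import json


def score(predictions, references, tool_key="tool"):
    """Compute the four metrics over raw output strings and reference dicts.

    ASSUMPTION: a reference containing `tool_key` is a tool-call example.
    """
    if not references:
        raise ValueError("references must be non-empty")
    parsed_ok = compliant = 0
    field_hits = field_total = 0
    tool_hits = tool_total = 0
    for raw, ref in zip(predictions, references):
        try:
            pred = json.loads(raw)
            parsed_ok += 1  # JSON validity: the output parses at all
        except json.JSONDecodeError:
            pred = None
        obj = pred if isinstance(pred, dict) else {}
        # Schema compliance: exact key-set match, no missing and no extra keys.
        if isinstance(pred, dict) and set(pred) == set(ref):
            compliant += 1
        # Field-level accuracy: per-field exact value match.
        for key, expected in ref.items():
            field_total += 1
            field_hits += int(key in obj and obj[key] == expected)
        # Tool selection: only scored on tool-call examples.
        if tool_key in ref:
            tool_total += 1
            tool_hits += int(obj.get(tool_key) == ref[tool_key])
    n = len(references)
    return {
        "json_validity": parsed_ok / n,
        "schema_compliance": compliant / n,
        "field_accuracy": field_hits / field_total,
        "tool_selection": tool_hits / tool_total if tool_total else None,
    }
```

Note that an output can be valid JSON yet fail schema compliance (a missing key), and can be schema-compliant yet lose field-level points (a wrong value), which is why the metrics are reported separately.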

The split has 200 in-distribution examples, scored with greedy, unconstrained ("free") decoding in bf16 on an Apple M4 Pro. With grammar-constrained decoding, JSON validity is 100% by construction; on this in-distribution set free decoding already reaches 100%, so the metrics that actually move are field-level accuracy and tool selection.

| Metric | Base SmolLM2-1.7B-Instruct | SmolStruct (LoRA, 3 epochs) | Constrained decoding |
| --- | --- | --- | --- |
| JSON validity | 100.0% | 100.0% | 100% (by construction) |
| Schema compliance | 100.0% | 100.0% | 100% |
| Field-level accuracy | 88.6% | 99.5% | ≈ free decoding |
| Tool selection | 92.1% | 100.0% (89 examples) | ≈ free decoding |

The base instruct model already emits well-formed, schema-correct JSON on this simple templated format — so the LoRA's lift is exactly where extraction quality lives: field-value accuracy (+10.9 pts) and tool selection (+7.9 pts). Constrained decoding then turns the 100% validity from observed into guaranteed. These numbers are in-distribution (the val split mirrors the generator); out-of-distribution inputs will score lower (see Limitations). Training: 3 epochs, final train_loss 0.079, token accuracy 95.9%.

Limitations

Synthetic-data domain. Schemas/tools mirror the generator (person, invoice, appointment, contact, product; 5 tools). Out-of-distribution schemas may need a few in-context examples or a small additional fine-tune.
Not a reasoning model. It extracts and routes; it does not perform multi-step reasoning or arithmetic beyond what's stated.
Small-model ceiling. On long, noisy, or adversarial inputs a 1.7B model will trail a frontier model on field accuracy; constrained decoding fixes validity, not comprehension.
Language coverage. Trained mostly on English with some Turkish; other languages untested.
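
For out-of-distribution schemas, in-context examples can be supplied as earlier user/assistant turns in the same chat. A minimal sketch, assuming the `Schema:` / `Text:` layout from the quick start; `with_few_shot` is a hypothetical helper, and how many examples help (if any) has not been measured here.

```python
import json


def with_few_shot(system_prompt, schema, examples, text):
    """Build chat messages with worked examples placed before the real query.

    `examples` is a list of (input_text, expected_output_dict) pairs.
    """
    schema_str = json.dumps(schema, ensure_ascii=False)
    messages = [{"role": "system", "content": system_prompt}]
    for example_text, example_output in examples:
        messages.append(
            {"role": "user", "content": f"Schema:\n{schema_str}\n\nText:\n{example_text}"}
        )
        # The assistant turn is a single JSON object, matching the training format.
        messages.append(
            {"role": "assistant", "content": json.dumps(example_output, ensure_ascii=False)}
        )
    messages.append(
        {"role": "user", "content": f"Schema:\n{schema_str}\n\nText:\n{text}"}
    )
    return messages
```

The result can be passed to `tok.apply_chat_template(...)` with `add_generation_prompt=True`, exactly like the two-message list in the quick start.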

Intended use & out-of-scope

Intended: local/edge structured extraction, function-call routing for agents, privacy-sensitive pipelines, cost-sensitive high-volume extraction.

Out of scope: safety-critical decisions without human review; tasks requiring factual recall or reasoning the model was not trained for; languages outside EN/TR without evaluation.

Citation

@misc{smolstruct2026,
  title  = {SmolStruct-1.7B: Local, Private Structured Output with a Small Language Model},
  author = {Kaya, Mehmet Tuğrul},
  year   = {2026},
  howpublished = {Hugging Face},
}

Built on SmolLM2 by Hugging Face. Fine-tuning, data generation, evaluation and demo by Mehmet Tuğrul Kaya (@mtugrull).
