Llama-3.2-3B JSON Extraction

A LoRA finetune of Llama 3.2 3B Instruct that extracts structured data from unstructured documents and returns a single, clean JSON object conforming to a provided JSON schema. The model outputs only the JSON object — no markdown code fences, no preamble, and no trailing commentary — making its output directly parseable with json.loads().

The model was trained with Unsloth, using memory-efficient QLoRA on a single Kaggle T4 GPU.

Intended use

Given (1) a JSON schema describing the target structure and (2) a free-text / markdown document, the model returns a JSON object populated from the document. Useful for document parsing tasks such as invoices, medical records, business documents, and similar structured-extraction problems.

Out of scope: very long documents (examples exceeding the 2,048-token context were filtered out of the training data), inputs from raw OCR (training data was clean text, not OCR output), and schemas substantially different from those seen in the training distribution.

Prompt format

The model was trained with the following system prompt and user-message layout. Use the same format at inference — the model is sensitive to it.

System prompt:

You are a data extraction assistant. Extract information from the document and
return a single JSON object that conforms to the provided JSON schema. Output
ONLY the JSON object — no explanations and no markdown code fences.

User message:

JSON schema:
{schema}

Document:
{document}

Extract the data as JSON.
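
The layout above can be assembled programmatically. The following is a minimal sketch, assuming a helper named `build_messages` (a hypothetical name, not part of this repo) and a schema supplied as a JSON string. The system prompt is written with the `\u2014` escape so that it matches the trained prompt character for character.

```python
SYSTEM_PROMPT = (
    "You are a data extraction assistant. Extract information from the document "
    "and return a single JSON object that conforms to the provided JSON schema. "
    # \u2014 is the dash character that appears in the trained system prompt.
    "Output ONLY the JSON object \u2014 no explanations and no markdown code fences."
)


def build_messages(schema: str, document: str) -> list[dict[str, str]]:
    """Return chat messages in the layout the model was trained on.

    `build_messages` is an illustrative helper, not an API of this repo.
    """
    user_content = (
        f"JSON schema:\n{schema}\n\n"
        f"Document:\n{document}\n\n"
        "Extract the data as JSON."
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]


messages = build_messages(
    schema='{"type":"object","properties":{"invoice_number":{"type":"string"}}}',
    document="Invoice #INV-2025-0042 ...",
)
print(messages[1]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` unchanged.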

How to use

Option A — load the LoRA adapters (this repo)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "Legeng/llama3.2-3b-json-extraction",
    max_seq_length = 2048,
    dtype          = None,
    load_in_4bit   = True,
)
FastLanguageModel.for_inference(model)

system_prompt = (
    "You are a data extraction assistant. Extract information from the document "
    "and return a single JSON object that conforms to the provided JSON schema. "
    "Output ONLY the JSON object — no explanations and no markdown code fences."
)

schema   = '{"type":"object","properties":{"invoice_number":{"type":"string"}}}'
document = "Invoice #INV-2025-0042 ..."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user",   "content": f"JSON schema:\n{schema}\n\nDocument:\n{document}\n\nExtract the data as JSON."},
]
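# Render the chat template; add_generation_prompt appends the assistant header
# so that generation starts at the model's reply.
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

out = model.generate(input_ids=inputs, max_new_tokens=2048, use_cache=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))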

Note: set max_new_tokens high enough (e.g. 2048) for documents with large nested schemas, or long outputs will be truncated and fail to parse.
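
A truncated generation usually surfaces as a parse error, so it can help to wrap parsing in a small guard. This is a minimal sketch; `parse_extraction` is a hypothetical helper name, and the two sample strings are illustrative rather than real model output.

```python
import json


def parse_extraction(raw: str) -> dict:
    """Parse the model's raw text; raise ValueError if it is not one JSON object."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as err:
        # A cut-off generation typically fails here, for example with an
        # unterminated string or a missing closing brace.
        raise ValueError(f"output is not valid JSON (possibly truncated): {err}") from err
    if not isinstance(obj, dict):
        raise ValueError(f"expected a JSON object, got {type(obj).__name__}")
    return obj


complete = '{"invoice_number": "INV-2025-0042"}'
truncated = '{"invoice_number": "INV-2025-00'  # simulates hitting max_new_tokens

print(parse_extraction(complete))
try:
    parse_extraction(truncated)
except ValueError as err:
    print("rejected:", err)
```

If the guard rejects an output, retrying with a larger `max_new_tokens` is one reasonable first step.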

Option B — merged standalone model

A merged 16-bit version is available at Legeng/llama3.2-3b-json-extraction-merged, which loads as a normal model without needing the base model separately.
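
Loading the merged checkpoint should not need Unsloth. The sketch below assumes the repo follows the standard Transformers layout and uses only `AutoTokenizer.from_pretrained` and `AutoModelForCausalLM.from_pretrained`. `load_merged` is a hypothetical helper name, and calling it downloads the weights.

```python
MERGED_REPO = "Legeng/llama3.2-3b-json-extraction-merged"


def load_merged(repo_id: str = MERGED_REPO):
    """Load the merged checkpoint with plain Transformers.

    Assumption: the repo is a standard Transformers checkpoint. The first call
    downloads the weights, so this needs network access and enough memory.
    """
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return model, tokenizer
```

The same system prompt and user-message layout apply to the merged model as to the adapters.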

Training data

Dataset: paraloq/json_data_extraction (Apache-2.0)
Total examples: 484 (single train split)
Split: 90% train / 10% eval, seed=3407, via train_test_split(test_size=0.1)
Filtering: examples whose tokenized length exceeded max_seq_length (2,048) were dropped to avoid truncating target JSON
Each example pairs an unstructured document (text) and a JSON schema (schema) with the ground-truth JSON object (item)
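
The length filter can be sketched as follows. This is an illustration under stated assumptions, not the training script: `whitespace_token_count` is a stand-in for the real Llama tokenizer (which would be applied to the full chat-formatted example), `filter_by_length` is a hypothetical helper, and the two sample records are made up. Only the field names `text`, `schema`, and `item` come from the dataset description.

```python
MAX_SEQ_LENGTH = 2048


def whitespace_token_count(text: str) -> int:
    """Stand-in token counter (assumption); swap in the real tokenizer's length."""
    return len(text.split())


def filter_by_length(examples, count_tokens=whitespace_token_count,
                     max_len=MAX_SEQ_LENGTH):
    """Keep only examples whose combined schema, document, and target fit in max_len."""
    kept = []
    for ex in examples:
        full_text = "\n".join([ex["schema"], ex["text"], ex["item"]])
        if count_tokens(full_text) <= max_len:
            kept.append(ex)
    return kept


examples = [
    {"schema": "{}", "text": "short document", "item": "{}"},
    {"schema": "{}", "text": "word " * 3000, "item": "{}"},  # too long, dropped
]
kept = filter_by_length(examples)
print(len(kept))  # 1
```

Dropping over-length examples, rather than truncating them, avoids training on targets whose closing braces were cut off.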

Training procedure

Trained with Unsloth + TRL SFTTrainer using QLoRA (4-bit base model + LoRA adapters).

Base model & quantization

Parameter	Value
`base_model`	`unsloth/Llama-3.2-3B-Instruct`
`max_seq_length`	2048
`dtype`	None (auto → float16 on T4)
`load_in_4bit`	True (QLoRA)

LoRA configuration

Parameter	Value
`r` (rank)	16
`lora_alpha`	16
`lora_dropout`	0
`bias`	"none"
`target_modules`	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
`use_gradient_checkpointing`	"unsloth"
`random_state`	3407
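
For reference, the table maps onto keyword arguments roughly as shown here. This is a sketch: the dictionary name `LORA_KWARGS` is hypothetical, and the commented-out call assumes the usual Unsloth entry point `FastLanguageModel.get_peft_model`, which needs a GPU and an already loaded base model.

```python
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0,
    "bias": "none",
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "use_gradient_checkpointing": "unsloth",
    "random_state": 3407,
}

# Standard LoRA scales the adapter update by lora_alpha / r.
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]
print(scaling)  # 1.0

# Presumed usage (not run here):
# model = FastLanguageModel.get_peft_model(model, **LORA_KWARGS)
```

With `lora_alpha` equal to `r`, the adapter update is applied with a scaling factor of 1.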

Training hyperparameters

Parameter	Value
`per_device_train_batch_size`	2
`gradient_accumulation_steps`	4
effective batch size	8
`num_train_epochs`	3
`learning_rate`	2e-4
`optim`	adamw_8bit
`weight_decay`	0.01
`warmup_steps`	5
`lr_scheduler_type`	linear
`eval_strategy`	steps
`eval_steps`	20
`logging_steps`	1
`dataset_num_proc`	1
`seed`	3407
total optimizer steps	153
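
The effective batch size and step count in the table are related by simple arithmetic. The sketch below assumes each optimizer step consumes one effective batch and that a final partial batch still counts as a step; the exact rounding depends on the Trainer version, so the derived range is approximate.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_train_epochs = 3
total_optimizer_steps = 153

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = total_optimizer_steps // num_train_epochs

# Assumption: steps_per_epoch == ceil(n_train / effective_batch_size),
# which bounds the number of training examples on both sides.
max_train_examples = steps_per_epoch * effective_batch_size
min_train_examples = (steps_per_epoch - 1) * effective_batch_size + 1

print(effective_batch_size)                    # 8
print(steps_per_epoch)                         # 51
print(min_train_examples, max_train_examples)  # 401 408
```

Under that assumption the run saw roughly 401 to 408 training examples per epoch. That is fewer than 90% of 484 (about 436), which is consistent with the length filter having removed some examples.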

Hardware

Single NVIDIA T4 (16 GB), Kaggle
Training time: ~45 minutes

Training results

Training and validation loss over the run:

Step	Training Loss	Validation Loss
20	0.0778	0.1001
40	0.0514	0.0945
60	0.0236	0.0923
80	0.0380	0.0950
100	0.1236	0.0945
120	0.0048	0.1004
140	0.0492	0.1057
153	0.0086	0.1036

Validation loss was lowest at step 60 (0.0923) and stayed within about 0.003 of that value through step 100. Past roughly two epochs (step 102 of 153) it rose to between 0.100 and 0.106, while training loss, though noisy from step to step, stayed low, which indicates mild overfitting. A 2-epoch run, or load_best_model_at_end=True, would likely generalize marginally better.
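
The best evaluation point can be read off the logged values directly. A small sketch using the rows of the table above; the 51 steps per epoch figure is derived from 153 total steps over 3 epochs.

```python
# (step, training_loss, validation_loss), copied from the results table.
history = [
    (20, 0.0778, 0.1001),
    (40, 0.0514, 0.0945),
    (60, 0.0236, 0.0923),
    (80, 0.0380, 0.0950),
    (100, 0.1236, 0.0945),
    (120, 0.0048, 0.1004),
    (140, 0.0492, 0.1057),
    (153, 0.0086, 0.1036),
]

steps_per_epoch = 153 / 3  # 51 optimizer steps per epoch

# Pick the logged step with the lowest validation loss.
best_step, _, best_val = min(history, key=lambda row: row[2])
print(best_step, best_val)                   # 60 0.0923
print(round(best_step / steps_per_epoch, 2))  # 1.18 (epochs)
```

Because evaluation ran only every 20 steps, the true minimum may lie anywhere near step 60 rather than exactly on it.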

Evaluation

Evaluated on the 20 held-out documents from the eval split, generating with max_new_tokens=2048:

Metric	Result
Valid JSON (parses with `json.loads`)	20 / 20
Exact object match vs. ground truth	8 / 20

The base Llama-3.2-3B-Instruct model, by contrast, wrapped its output in markdown code fences with an explanatory preamble and trailing notes, so its raw output was not directly parseable. The primary improvement from finetuning is therefore reliable, fence-free JSON output.

Exact-object match is a strict all-or-nothing metric (any single differing field, date format, or optional key fails the whole row), so it understates true field-level accuracy.
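
A field-level score gives a softer view than exact match. The sketch below is one possible definition, not the metric used in this evaluation: `flatten` and `field_accuracy` are hypothetical helpers, and the `truth` and `pred` objects are made-up values, not rows from the eval set.

```python
def flatten(obj, prefix=""):
    """Map each leaf value to a path such as 'items[0].price'."""
    leaves = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            leaves.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            leaves.update(flatten(value, f"{prefix}[{i}]"))
    else:
        leaves[prefix] = obj
    return leaves


def field_accuracy(predicted, truth):
    """Fraction of ground-truth leaf fields that the prediction reproduces exactly."""
    truth_leaves = flatten(truth)
    pred_leaves = flatten(predicted)
    if not truth_leaves:
        return 1.0
    hits = sum(
        1 for path, value in truth_leaves.items()
        if path in pred_leaves and pred_leaves[path] == value
    )
    return hits / len(truth_leaves)


# Illustrative values only: one date is formatted differently.
truth = {"invoice_number": "INV-2025-0042", "date": "2025-01-15", "total": 120.5}
pred = {"invoice_number": "INV-2025-0042", "date": "15/01/2025", "total": 120.5}

print(pred == truth)                          # False: exact match fails
print(round(field_accuracy(pred, truth), 3))  # 0.667: two of three fields agree
```

A single differing date format fails the exact-match row, while two of the three fields are still correct.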

Framework

Unsloth
TRL (SFTTrainer / SFTConfig)
PEFT (LoRA)
Transformers

License

This model is derived from Llama 3.2 and is subject to the Llama 3.2 Community License. The training dataset is licensed under Apache-2.0.
