Llama-3.2-3B JSON Extraction

A LoRA finetune of Llama 3.2 3B Instruct that extracts structured data from unstructured documents and returns a single, clean JSON object conforming to a provided JSON schema. The model outputs only the JSON object — no markdown code fences, no preamble, and no trailing commentary — making its output directly parseable with json.loads().

The model was trained with Unsloth, using memory-efficient QLoRA, on a single Kaggle T4 GPU.

Intended use

Given (1) a JSON schema describing the target structure and (2) a free-text / markdown document, the model returns a JSON object populated from the document. Useful for document parsing tasks such as invoices, medical records, business documents, and similar structured-extraction problems.

Out of scope: very long documents (training examples longer than the 2,048-token context were filtered out), raw OCR input (the training data was clean text, not OCR output), and schemas substantially different from those in the training distribution.

Prompt format

The model was trained with the following system prompt and user-message layout. Use the same format at inference — the model is sensitive to it.

System prompt:

You are a data extraction assistant. Extract information from the document and
return a single JSON object that conforms to the provided JSON schema. Output
ONLY the JSON object — no explanations and no markdown code fences.

User message:

JSON schema:
{schema}

Document:
{document}

Extract the data as JSON.
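The layout above can be assembled programmatically. The following is a minimal sketch, not the author's training code: the names USER_TEMPLATE and build_messages are hypothetical, and accepting the schema as a Python dict (serialized with json.dumps) is a convenience assumed here rather than something the training setup is known to have done.

```python
import json

# User-message layout copied from the prompt format described above.
USER_TEMPLATE = "JSON schema:\n{schema}\n\nDocument:\n{document}\n\nExtract the data as JSON."


def build_messages(system_prompt, schema, document):
    """Return chat messages in the layout the model was trained on."""
    # Assumption: the schema may arrive as a JSON string or as a Python dict.
    schema_text = schema if isinstance(schema, str) else json.dumps(schema)
    user = USER_TEMPLATE.format(schema=schema_text, document=document)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user},
    ]
```

Pass the system prompt shown above verbatim as system_prompt; the resulting list can be handed to the tokenizer's chat template.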

How to use

Option A — load the LoRA adapters (this repo)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "Legeng/llama3.2-3b-json-extraction",
    max_seq_length = 2048,
    dtype          = None,
    load_in_4bit   = True,
)
FastLanguageModel.for_inference(model)

system_prompt = (
    "You are a data extraction assistant. Extract information from the document "
    "and return a single JSON object that conforms to the provided JSON schema. "
    "Output ONLY the JSON object — no explanations and no markdown code fences."
)

schema   = '{"type":"object","properties":{"invoice_number":{"type":"string"}}}'
document = "Invoice #INV-2025-0042 ..."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user",   "content": f"JSON schema:\n{schema}\n\nDocument:\n{document}\n\nExtract the data as JSON."},
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

out = model.generate(input_ids=inputs, max_new_tokens=2048, use_cache=True)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))

Note: set max_new_tokens high enough (e.g. 2048) for documents with large nested schemas, or long outputs will be truncated and fail to parse.
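Because a truncated generation typically ends mid-object, it is worth checking the decoded text before using it. A minimal sketch using only the standard library follows; parse_model_output is a hypothetical helper name, and the error strings are illustrative.

```python
import json


def parse_model_output(text):
    """Parse generated text as a JSON object; return (obj, error_message)."""
    try:
        obj = json.loads(text.strip())
    except json.JSONDecodeError as exc:
        # Truncated or otherwise malformed output ends up here.
        return None, f"invalid JSON: {exc.msg} at char {exc.pos}"
    if not isinstance(obj, dict):
        # The model is expected to return a single JSON object, not a list or scalar.
        return None, f"expected a JSON object, got {type(obj).__name__}"
    return obj, None
```

If the error is reported near the end of the text, raising max_new_tokens and regenerating is a reasonable first step.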

Option B — merged standalone model

A merged 16-bit version is available at Legeng/llama3.2-3b-json-extraction-merged, which loads as a normal model without needing the base model separately.
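A sketch of loading the merged checkpoint with plain Transformers is shown below. It is an assumption that the repository loads through AutoModelForCausalLM like any other Llama checkpoint; the helper name load_merged_model is hypothetical, and device_map="auto" assumes the accelerate package is installed.

```python
def load_merged_model(repo_id="Legeng/llama3.2-3b-json-extraction-merged"):
    """Load the merged checkpoint without Unsloth or PEFT."""
    # Imported lazily so that defining this helper does not require the GPU stack.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype="auto",  # assumption: keep the dtype stored in the checkpoint
        device_map="auto",   # assumption: accelerate is available for device placement
    )
    return model, tokenizer
```

The same system prompt and user-message layout apply; only the loading step differs from Option A.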

Training data

  • Dataset: paraloq/json_data_extraction (Apache-2.0)
  • Total examples: 484 (single train split)
  • Split: 90% train / 10% eval, seed=3407, via train_test_split(test_size=0.1)
  • Filtering: examples whose tokenized length exceeded max_seq_length (2,048) were dropped to avoid truncating target JSON
  • Each example pairs an unstructured document (text) and a JSON schema (schema) with the ground-truth JSON object (item)
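The formatting and filtering steps listed above might look roughly like this. It is a sketch under stated assumptions, not the original preprocessing script: to_training_messages and fits_context are hypothetical names, count_tokens stands in for whatever tokenizer-based length function was used, and the columns are assumed to hold either JSON strings or already-parsed objects.

```python
import json


def to_training_messages(example, system_prompt):
    """Turn one dataset row (text, schema, item) into a chat-style training example."""
    def as_text(value):
        # Assumption: a column may be a JSON string or a parsed Python object.
        return value if isinstance(value, str) else json.dumps(value)

    user = (
        f"JSON schema:\n{as_text(example['schema'])}\n\n"
        f"Document:\n{example['text']}\n\nExtract the data as JSON."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user},
        # The ground-truth JSON object is the assistant target.
        {"role": "assistant", "content": as_text(example["item"])},
    ]


def fits_context(messages, count_tokens, max_seq_length=2048):
    """True if the whole example fits; longer rows are dropped rather than truncated."""
    return count_tokens(messages) <= max_seq_length
```

Dropping over-length rows, instead of truncating them, keeps every target JSON object complete in the training set.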

Training procedure

Trained with Unsloth + TRL SFTTrainer using QLoRA (4-bit base model + LoRA adapters).

Base model & quantization

| Parameter | Value |
|---|---|
| base_model | unsloth/Llama-3.2-3B-Instruct |
| max_seq_length | 2048 |
| dtype | None (auto → float16 on T4) |
| load_in_4bit | True (QLoRA) |

LoRA configuration

| Parameter | Value |
|---|---|
| r (rank) | 16 |
| lora_alpha | 16 |
| lora_dropout | 0 |
| bias | "none" |
| target_modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| use_gradient_checkpointing | "unsloth" |
| random_state | 3407 |
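Expressed as code, the LoRA settings above correspond roughly to the following call to Unsloth's FastLanguageModel.get_peft_model. This is a sketch: LORA_KWARGS, attach_lora, and LORA_SCALE are names introduced here for illustration, and the original training script may have passed additional arguments.

```python
# LoRA settings taken from the configuration listed above.
LORA_KWARGS = dict(
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)


def attach_lora(model):
    """Wrap a 4-bit base model loaded by Unsloth with LoRA adapters."""
    # Lazy import: Unsloth needs a CUDA environment to import.
    from unsloth import FastLanguageModel
    return FastLanguageModel.get_peft_model(model, **LORA_KWARGS)


# With r == lora_alpha, the standard LoRA scaling factor alpha / r is 1.0.
LORA_SCALE = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]
```

Targeting all seven projection matrices adapts both the attention and MLP blocks of each layer.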

Training hyperparameters

| Parameter | Value |
|---|---|
| per_device_train_batch_size | 2 |
| gradient_accumulation_steps | 4 |
| effective batch size | 8 |
| num_train_epochs | 3 |
| learning_rate | 2e-4 |
| optim | adamw_8bit |
| weight_decay | 0.01 |
| warmup_steps | 5 |
| lr_scheduler_type | linear |
| eval_strategy | steps |
| eval_steps | 20 |
| logging_steps | 1 |
| dataset_num_proc | 1 |
| seed | 3407 |
| total optimizer steps | 153 |
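The derived figures above follow from simple arithmetic, worked through below. The training-set size is an inference from the step count, not a number stated in this card; the exact value depends on how the trainer rounds the final partial batch, so treat 408 as approximate.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_train_epochs = 3
total_optimizer_steps = 153

# One GPU, so the effective batch is the per-device batch times accumulation steps.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

# Optimizer steps per epoch, and the training-set size that implies (approximate).
steps_per_epoch = total_optimizer_steps // num_train_epochs
approx_train_examples = steps_per_epoch * effective_batch_size

print(effective_batch_size, steps_per_epoch, approx_train_examples)  # 8 51 408
```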

Hardware

  • Single NVIDIA T4 (16 GB), Kaggle
  • Training time: ~45 minutes

Training results

Training and validation loss over the run:

| Step | Training Loss | Validation Loss |
|---|---|---|
| 20 | 0.0778 | 0.1001 |
| 40 | 0.0514 | 0.0945 |
| 60 | 0.0236 | 0.0923 |
| 80 | 0.0380 | 0.0950 |
| 100 | 0.1236 | 0.0945 |
| 120 | 0.0048 | 0.1004 |
| 140 | 0.0492 | 0.1057 |
| 153 | 0.0086 | 0.1036 |

Validation loss was lowest at step 60 (0.0923), a little over one epoch in at 51 optimizer steps per epoch. It stayed roughly flat through step 100 and then rose slightly (0.1057 at step 140), while the per-step training loss, though noisy, trended downward. This indicates mild overfitting beyond ~2 epochs. A 2-epoch run, or load_best_model_at_end=True, would likely generalize marginally better.
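The best evaluation step can be read off the logged values directly. The sketch below copies the validation losses from the results above; EVAL_LOG and STEPS_PER_EPOCH are names introduced here, and the steps-per-epoch figure is derived from 153 total steps over 3 epochs.

```python
# (step, validation loss) pairs from the training results.
EVAL_LOG = [
    (20, 0.1001), (40, 0.0945), (60, 0.0923), (80, 0.0950),
    (100, 0.0945), (120, 0.1004), (140, 0.1057), (153, 0.1036),
]
STEPS_PER_EPOCH = 153 / 3  # 51 optimizer steps per epoch

# Pick the evaluation point with the lowest validation loss.
best_step, best_loss = min(EVAL_LOG, key=lambda row: row[1])
best_epoch = best_step / STEPS_PER_EPOCH

print(best_step, best_loss, round(best_epoch, 2))  # 60 0.0923 1.18
```

Note that load_best_model_at_end=True in Transformers also requires checkpoints to be saved on the same schedule as evaluation, so the save strategy would need to match eval_strategy.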

Evaluation

Evaluated on the 20 held-out documents from the eval split, generating with max_new_tokens=2048:

| Metric | Result |
|---|---|
| Valid JSON (parses with json.loads) | 20 / 20 |
| Exact object match vs. ground truth | 8 / 20 |

By contrast, the base Llama-3.2-3B-Instruct model wrapped its output in markdown code fences, with an explanatory preamble and trailing notes, so its raw output was not directly parseable. The primary improvement from finetuning is reliable, clean, fence-free JSON output.

Exact-object match is a strict all-or-nothing metric (any single differing field, date format, or optional key fails the whole row), so it understates true field-level accuracy.
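To make the two reported metrics concrete, and to illustrate why exact match is strict, here is a minimal sketch. is_valid_json and exact_match mirror the metrics above; field_accuracy is an illustrative field-level metric added here as an assumption and was not part of the reported evaluation.

```python
import json


def is_valid_json(text):
    """True if the raw model output parses with json.loads."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def exact_match(pred, gold):
    """All-or-nothing comparison of two parsed objects (key order is ignored)."""
    return pred == gold


def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into {path: leaf_value}."""
    if isinstance(obj, dict):
        items = ((f"{prefix}.{k}" if prefix else str(k), v) for k, v in obj.items())
    elif isinstance(obj, list):
        items = ((f"{prefix}[{i}]", v) for i, v in enumerate(obj))
    else:
        return {prefix: obj}
    flat = {}
    for path, value in items:
        flat.update(flatten(value, path))
    return flat


def field_accuracy(pred, gold):
    """Fraction of ground-truth leaf fields reproduced exactly (illustrative only)."""
    gold_flat, pred_flat = flatten(gold), flatten(pred)
    if not gold_flat:
        return 1.0
    hits = sum(1 for path, value in gold_flat.items()
               if path in pred_flat and pred_flat[path] == value)
    return hits / len(gold_flat)
```

For example, a prediction that differs from the ground truth only in one date format fails exact_match outright, yet still scores 2 of 3 fields under field_accuracy.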

Framework

  • Unsloth
  • TRL (SFTTrainer / SFTConfig)
  • PEFT (LoRA)
  • Transformers

License

This model is derived from Llama 3.2 and is subject to the Llama 3.2 Community License. The training dataset is licensed Apache-2.0.
