Llama-3.2-3B JSON Extraction
A LoRA finetune of Llama 3.2 3B Instruct that extracts structured data from
unstructured documents and returns a single, clean JSON object conforming to a
provided JSON schema. The model outputs only the JSON object — no markdown
code fences, no preamble, and no trailing commentary — making its output directly
parseable with json.loads().
The model was trained with Unsloth, using memory-efficient QLoRA on a single Kaggle T4 GPU.
Intended use
Given (1) a JSON schema describing the target structure and (2) a free-text / markdown document, the model returns a JSON object populated from the document. Useful for document parsing tasks such as invoices, medical records, business documents, and similar structured-extraction problems.
Out of scope: very long documents (training filtered examples exceeding the 2,048-token context), inputs from raw OCR (training data was clean text, not OCR output), and schemas substantially different from those seen in the training distribution.
Prompt format
The model was trained with the following system prompt and user-message layout. Use the same format at inference — the model is sensitive to it.
System prompt:
You are a data extraction assistant. Extract information from the document and
return a single JSON object that conforms to the provided JSON schema. Output
ONLY the JSON object — no explanations and no markdown code fences.
User message:
JSON schema:
{schema}

Document:
{document}

Extract the data as JSON.
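Since the model is sensitive to this layout, it can help to build the messages in one place. The helper below is a minimal sketch: `build_messages` and `SYSTEM_PROMPT` are illustrative names, the blank lines between the sections follow the usage snippet in this card, and serializing a dict schema with `json.dumps` defaults is an assumption (the card does not say how schemas were serialized during training).

```python
import json

# System prompt copied verbatim from this card.
SYSTEM_PROMPT = (
    "You are a data extraction assistant. Extract information from the document "
    "and return a single JSON object that conforms to the provided JSON schema. "
    "Output ONLY the JSON object — no explanations and no markdown code fences."
)


def build_messages(schema, document):
    """Return chat messages in the system/user layout described above.

    `schema` may be an already-serialized string or a dict. Serializing a
    dict with json.dumps defaults is an assumption, not the card's method.
    """
    schema_text = schema if isinstance(schema, str) else json.dumps(schema)
    user = (
        f"JSON schema:\n{schema_text}\n\n"
        f"Document:\n{document}\n\n"
        "Extract the data as JSON."
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]
```

The returned list can be passed directly to `tokenizer.apply_chat_template(...)`.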
How to use
Option A — load the LoRA adapters (this repo)
from unsloth import FastLanguageModel

# Load the adapter repo in 4-bit; dtype=None lets Unsloth choose the dtype.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Legeng/llama3.2-3b-json-extraction",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

system_prompt = (
    "You are a data extraction assistant. Extract information from the document "
    "and return a single JSON object that conforms to the provided JSON schema. "
    "Output ONLY the JSON object — no explanations and no markdown code fences."
)
schema = '{"type":"object","properties":{"invoice_number":{"type":"string"}}}'
document = "Invoice #INV-2025-0042 ..."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"JSON schema:\n{schema}\n\nDocument:\n{document}\n\nExtract the data as JSON."},
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
out = model.generate(input_ids=inputs, max_new_tokens=2048, use_cache=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
Note: set max_new_tokens high enough (e.g. 2048) for documents with large nested schemas, or long outputs will be truncated and fail to parse.
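A truncated generation usually shows up as a JSON parse error, so it is worth checking the decoded text before using it. The following is a minimal sketch; `parse_extraction` is an illustrative helper name, not part of this repo.

```python
import json


def parse_extraction(text):
    """Parse decoded model output.

    Returns (obj, None) on success, or (None, reason) when the text is not
    a single JSON object.
    """
    text = text.strip()
    try:
        obj = json.loads(text)
    except json.JSONDecodeError as exc:
        # Truncated outputs typically fail here, e.g. with an unterminated
        # string or a missing closing brace.
        return None, f"invalid JSON: {exc.msg} at char {exc.pos}"
    if not isinstance(obj, dict):
        return None, f"expected a JSON object, got {type(obj).__name__}"
    return obj, None
```

If parsing fails and the number of generated tokens equals `max_new_tokens`, truncation is the likely cause, and raising the limit is a reasonable first step.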
Option B — merged standalone model
A merged 16-bit version is available at
Legeng/llama3.2-3b-json-extraction-merged,
which loads as a normal model without needing the base model separately.
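One way to load the merged checkpoint is with plain Transformers. This is a sketch under assumptions rather than a tested recipe from the card: it assumes the merged repo ships its tokenizer and chat template, that float16 matches the stored 16-bit weights, and that the `accelerate` package is installed for `device_map="auto"`. `load_merged` is an illustrative helper name.

```python
MERGED_REPO = "Legeng/llama3.2-3b-json-extraction-merged"


def load_merged(repo_id=MERGED_REPO):
    """Load the merged checkpoint with Transformers (downloads the weights)."""
    # Imported lazily so defining this helper does not require torch.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.float16,  # assumption: matches the merged 16-bit weights
        device_map="auto",          # requires the `accelerate` package
    )
    return model, tokenizer
```

After `model, tokenizer = load_merged()`, the prompt construction and `generate` call from Option A should apply unchanged, with inputs moved to `model.device`.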
Training data
- Dataset: paraloq/json_data_extraction (Apache-2.0)
- Total examples: 484 (single train split)
- Split: 90% train / 10% eval, seed=3407, via train_test_split(test_size=0.1)
- Filtering: examples whose tokenized length exceeded max_seq_length (2,048) were dropped to avoid truncating target JSON
- Each example pairs an unstructured document (text) and a JSON schema (schema) with the ground-truth JSON object (item)
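The preparation steps above can be sketched roughly as follows. This is not the author's training script: `to_chat_example` and `drop_overlong` are illustrative names, the three dataset fields are assumed to be strings, and `count_tokens` stands in for whatever tokenizer-based length check was actually used.

```python
def to_chat_example(record, system_prompt):
    """Map one dataset record (text, schema, item) to chat-style messages."""
    user = (
        f"JSON schema:\n{record['schema']}\n\n"
        f"Document:\n{record['text']}\n\n"
        "Extract the data as JSON."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user},
        # The ground-truth JSON object is the assistant's target output.
        {"role": "assistant", "content": record["item"]},
    ]


def drop_overlong(examples, count_tokens, max_seq_length=2048):
    """Keep only examples whose token count fits within max_seq_length.

    `count_tokens` is a hypothetical callable: example -> int.
    """
    return [ex for ex in examples if count_tokens(ex) <= max_seq_length]
```

Dropping overlong examples, rather than truncating them, keeps every target JSON object complete, which matters for a model that must emit parseable output.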
Training procedure
Trained with Unsloth + TRL SFTTrainer using QLoRA (4-bit base model + LoRA adapters).
Base model & quantization
| Parameter | Value |
|---|---|
| base_model | unsloth/Llama-3.2-3B-Instruct |
| max_seq_length | 2048 |
| dtype | None (auto → float16 on T4) |
| load_in_4bit | True (QLoRA) |
LoRA configuration
| Parameter | Value |
|---|---|
| r (rank) | 16 |
| lora_alpha | 16 |
| lora_dropout | 0 |
| bias | "none" |
| target_modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| use_gradient_checkpointing | "unsloth" |
| random_state | 3407 |
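For reference, the LoRA settings above can be written as keyword arguments. The sketch assumes they were passed to Unsloth's `FastLanguageModel.get_peft_model`, which the card does not state explicitly; `lora_kwargs` is an illustrative name.

```python
# LoRA settings from the table above, expressed as keyword arguments.
lora_kwargs = dict(
    r=16,
    lora_alpha=16,  # alpha / r = 1, so the adapter update is not rescaled
    lora_dropout=0,
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

# Assumed usage, with Unsloth installed and a base model loaded as `model`:
# model = FastLanguageModel.get_peft_model(model, **lora_kwargs)
```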
Training hyperparameters
| Parameter | Value |
|---|---|
| per_device_train_batch_size | 2 |
| gradient_accumulation_steps | 4 |
| effective batch size | 8 |
| num_train_epochs | 3 |
| learning_rate | 2e-4 |
| optim | adamw_8bit |
| weight_decay | 0.01 |
| warmup_steps | 5 |
| lr_scheduler_type | linear |
| eval_strategy | steps |
| eval_steps | 20 |
| logging_steps | 1 |
| dataset_num_proc | 1 |
| seed | 3407 |
| total optimizer steps | 153 |
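The derived rows in the table follow from simple arithmetic, shown here as a quick check. The training-set size is only an estimate: the exact count depends on how the trainer rounds the last partial batch of each epoch.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_train_epochs = 3
total_optimizer_steps = 153

# One optimizer step consumes batch_size * accumulation_steps examples
# on a single GPU.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

# 153 steps spread evenly over 3 epochs.
steps_per_epoch = total_optimizer_steps // num_train_epochs

# Rough size of the training split after length filtering (an estimate).
approx_train_examples = steps_per_epoch * effective_batch_size

print(effective_batch_size, steps_per_epoch, approx_train_examples)
```

The estimate of roughly 400 training examples is consistent with a 90% split of 484 examples (about 435) minus those dropped by the length filter.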
Hardware
- Single NVIDIA T4 (16 GB), Kaggle
- Training time: ~45 minutes
Training results
Training and validation loss over the run:
| Step | Training Loss | Validation Loss |
|---|---|---|
| 20 | 0.0778 | 0.1001 |
| 40 | 0.0514 | 0.0945 |
| 60 | 0.0236 | 0.0923 |
| 80 | 0.0380 | 0.0950 |
| 100 | 0.1236 | 0.0945 |
| 120 | 0.0048 | 0.1004 |
| 140 | 0.0492 | 0.1057 |
| 153 | 0.0086 | 0.1036 |
Validation loss was lowest at step 60 (0.0923), about 1.2 epochs into the run at
51 optimizer steps per epoch. Past roughly 2 epochs the model overfit mildly:
per-step training loss, which is noisy at logging_steps=1, trended downward while
validation loss rose from about 0.095 to 0.100-0.106. A 2-epoch run, or
load_best_model_at_end=True, would likely generalize marginally better.
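The best evaluation point can be read off the logged values directly. The snippet below only re-analyzes the numbers in the table; it is not part of the training run.

```python
# (step, training_loss, validation_loss) copied from the table above.
history = [
    (20, 0.0778, 0.1001),
    (40, 0.0514, 0.0945),
    (60, 0.0236, 0.0923),
    (80, 0.0380, 0.0950),
    (100, 0.1236, 0.0945),
    (120, 0.0048, 0.1004),
    (140, 0.0492, 0.1057),
    (153, 0.0086, 0.1036),
]

STEPS_PER_EPOCH = 153 // 3  # 51 optimizer steps per epoch

# Pick the evaluation point with the lowest validation loss.
best_step, _, best_val = min(history, key=lambda row: row[2])
best_epoch = best_step / STEPS_PER_EPOCH

print(f"best step={best_step}, val_loss={best_val}, epoch={best_epoch:.2f}")
```

In Transformers, `load_best_model_at_end=True` also needs checkpoints saved on the same schedule as evaluation, so a rerun would have to set a matching save strategy.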
Evaluation
Evaluated on the 20 held-out documents from the eval split, generating with
max_new_tokens=2048:
| Metric | Result |
|---|---|
| Valid JSON (parses with json.loads) | 20 / 20 |
| Exact object match vs. ground truth | 8 / 20 |
The base Llama-3.2-3B-Instruct model, by contrast, wrapped its output in
markdown code fences with explanatory preamble and trailing notes, so its raw
output was not directly parseable — the primary improvement from finetuning
is reliable, clean, fence-free JSON output.
Exact-object match is a strict all-or-nothing metric (any single differing field, date format, or optional key fails the whole row), so it understates true field-level accuracy.
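A field-level score gives partial credit where exact match gives none. The sketch below is one possible definition, not a metric reported in this card: it flattens both objects to leaf paths and counts how many ground-truth leaves the prediction reproduces. `flatten` and `field_accuracy` are illustrative names.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {path: leaf_value}.

    Empty dicts and lists contribute no leaves.
    """
    if isinstance(obj, dict):
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in obj.items()]
    elif isinstance(obj, list):
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(obj)]
    else:
        return {prefix: obj}
    flat = {}
    for path, value in items:
        flat.update(flatten(value, path))
    return flat


def field_accuracy(predicted, expected):
    """Fraction of ground-truth leaf fields reproduced exactly.

    Extra keys in `predicted` are ignored, so this is a recall-style score.
    """
    pred, gold = flatten(predicted), flatten(expected)
    if not gold:
        return 1.0 if not pred else 0.0
    correct = sum(1 for path, value in gold.items()
                  if path in pred and pred[path] == value)
    return correct / len(gold)
```

For example, a prediction that gets three of four leaf fields right scores 0.75 here but counts as a miss under exact object match.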
Framework
- Unsloth
- TRL (SFTTrainer/SFTConfig)
- PEFT (LoRA)
- Transformers
License
This model is derived from Llama 3.2 and is subject to the Llama 3.2 Community License. The training dataset is licensed Apache-2.0.
Model tree for Legeng/llama3.2-3b-json-extraction-merged
Base model
meta-llama/Llama-3.2-3B-Instruct