ara-extract-v1 — LoRA adapter for Qwen2.5-0.5B-Instruct

LoRA adapter trained on trilingual (Uzbek · Russian · English) government-document extraction and classification tasks. Designed to teach a small base model to emit structured JSON for contract fields, document type, language ID, named entities, and one-sentence summaries.

This adapter is the small-model half of the ARA fine-tuning pair. The QLoRA-on-7B counterpart lives at bilalsaidumarov/ara-extract-7b-qlora.

TL;DR

| Metric (vs. base Qwen2.5-0.5B-Instruct, no adapter) | Base | + LoRA | Δ |
|---|---|---|---|
| JSON validity (output parses as JSON when JSON was the target) | 0.0% | 83.3% | +83.3 pp |
| Char similarity (difflib.SequenceMatcher mean) | 0.062 | 0.374 | ×6.0 |
| Exact match | 0.0% | 10.5% | +10.5 pp |

The headline number is JSON validity, 0% → 83.3% (5 of the 6 held-out rows with a JSON target): the adapter taught the base model the structured output format, which is the practical reason to fine-tune a 0.5B model for this workload at all. Exact match stays low because the held-out tail is heavy on language-ID examples; see Limitations below.

Intended use

Direct use — inside the ARA document-intelligence platform as the LLM backend for structured extraction over Uzbek / Russian / English government documents (contracts, invoices, memos, reports, letters). The 0.5B variant is the fast / cheap path for low-VRAM environments and CPU-only demos.

Prompt format (matches training):

### Instruction:
{instruction}

### Input:
{input}

### Response:
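
A minimal sketch of a helper that fills the template above, so inference prompts match the training layout exactly. The names `PROMPT_TEMPLATE` and `build_prompt` are illustrative and not part of the ARA codebase.

```python
# Template mirrors the training prompt format shown above.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_prompt(instruction: str, input_text: str) -> str:
    """Fill the template; the model is expected to continue after '### Response:'."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction.strip(),
        input=input_text.strip(),
    )


if __name__ == "__main__":
    print(build_prompt("Identify the language of the text.", "Salom dunyo"))
```

Keeping the trailing newline after `### Response:` matters: the adapter saw that exact boundary during training, so a prompt that ends differently may degrade output quality.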

How to use

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

BASE = "Qwen/Qwen2.5-0.5B-Instruct"
ADAPTER = "bilalsaidumarov/ara-extract-v1"

tokenizer = AutoTokenizer.from_pretrained(BASE)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
base = AutoModelForCausalLM.from_pretrained(BASE)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

prompt = (
    "### Instruction:\nExtract amount, signing date, and counterparty from the "
    "contract excerpt. Return a JSON object with keys amount, date, counterparty.\n\n"
    "### Input:\nAGREEMENT №ARA-2026-014 dated 14.03.2026 between Ministry of "
    "Economy and Finance and Acme Logistics LLC. Total contract value: 1,200,000 UZS.\n\n"
    "### Response:\n"
)
enc = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**enc, max_new_tokens=128, do_sample=False,
                         pad_token_id=tokenizer.pad_token_id)
print(tokenizer.decode(out[0, enc["input_ids"].shape[1]:], skip_special_tokens=True))

Serving with vLLM

vllm serve Qwen/Qwen2.5-0.5B-Instruct \
    --enable-lora \
    --lora-modules ara-extract-v1=bilalsaidumarov/ara-extract-v1 \
    --max-loras 4
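
Once the server is running, the adapter can be queried through vLLM's OpenAI-compatible completions endpoint by passing the adapter name as the model. The sketch below uses only the standard library. It assumes the server listens on `http://localhost:8000` (vLLM's default port, as far as I know) and that the prompt already follows the training template. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: default host/port of the `vllm serve` command above.
VLLM_URL = "http://localhost:8000/v1/completions"


def build_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Request body for the completions endpoint, targeting the LoRA adapter."""
    return {
        "model": "ara-extract-v1",  # name registered via --lora-modules
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding, mirroring do_sample=False
    }


def complete(prompt: str, url: str = VLLM_URL, timeout: float = 60.0) -> str:
    """POST the prompt to the server and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["text"]
```

Calling `complete(prompt)` needs a live server. Requests whose `model` is `Qwen/Qwen2.5-0.5B-Instruct` instead should be answered by the base model without the adapter.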

Training

| Setting | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B-Instruct (Apache-2.0) |
| Adapter type | LoRA (PEFT) |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Rank r / alpha / dropout | 16 / 32 / 0.05 |
| Optimizer | AdamW, lr 2e-4 |
| Precision | fp16 |
| Epochs | 3 |
| Train / eval split | 76 / 19 (deterministic, holdout-frac 0.2) |
| Hardware | NVIDIA RTX 4060 8 GB |
| Wall time | ~20 s |
| Loss curve | 2.30 → 1.91 → 1.45 |
| Adapter size on disk | ~10 MB |
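
As a sanity check on the adapter size, the trainable parameter count can be estimated from the rank and the shapes of the four target projections: each adapted linear layer adds `r * (in_features + out_features)` weights. The dimensions below are my assumption about the Qwen2.5-0.5B-Instruct architecture (hidden size 896, 24 layers, 2 key/value heads of dimension 64). Check them against the base model's `config.json` before relying on the result.

```python
# Assumed base-model dimensions (verify against the base model's config.json).
HIDDEN = 896      # hidden_size
KV_DIM = 128      # num_key_value_heads (2) * head_dim (64)
NUM_LAYERS = 24   # num_hidden_layers
RANK = 16         # LoRA rank r from the table above

# (in_features, out_features) of each adapted projection.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_param_count(rank: int, targets: dict, num_layers: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) matrices per adapted layer."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in targets.values())
    return per_layer * num_layers


params = lora_param_count(RANK, TARGETS, NUM_LAYERS)
print(params)                      # 2162688
print(round(params * 4 / 1e6, 2))  # 8.65 (MB if stored as fp32)
```

Under those assumptions the adapter holds about 2.16 M weights, roughly 8.7 MB if saved in fp32, which is in the same range as the ~10 MB reported above.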

Dataset

95 supervised examples across 8 task families, trilingual (uz · ru · en):

| Family | Count |
|---|---|
| Contract field extraction (amount, date, counterparty → JSON) | 25 |
| Document classification (contract / invoice / memo / report / letter) | 13 |
| Deadline date extraction | 12 |
| Summarization (one-sentence) | 12 |
| Language identification | 10 |
| Named-entity extraction | 10 |
| Contract clause translation | 7 |
| Monetary amount listing | 6 |

Format — one JSON object per line: {"instruction": ..., "input": ..., "output": ...}.
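
The sketch below shows one way to parse that JSONL format and reproduce a 76 / 19 split. The split rule is an assumption: the card only says the split is deterministic with a holdout fraction of 0.2 and refers to a "held-out tail", so this version takes the last 20% of rows in file order. The actual training script may differ.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")


def parse_jsonl(lines):
    """Parse one JSON object per line and check that the three expected keys exist."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"line {lineno}: missing keys {missing}")
        records.append(record)
    return records


def split_tail(records, holdout_frac=0.2):
    """Assumed split rule: the last holdout_frac of rows becomes the eval set."""
    n_eval = round(len(records) * holdout_frac)
    cut = len(records) - n_eval
    return records[:cut], records[cut:]
```

With 95 records and `holdout_frac=0.2`, `split_tail` returns 76 training rows and 19 held-out rows. A tail split is reproducible without a seed, but the eval set then inherits whatever ordering the file has. That fits the skew toward language-ID rows noted under Limitations.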

Evaluation

Held-out 20% (19 examples), greedy decoding, max_new_tokens=128.

| Metric | Base 0.5B | + LoRA |
|---|---|---|
| exact_match | 0.0% (0/19) | 10.5% (2/19) |
| char_similarity | 0.062 | 0.374 |
| json_valid (on 6 JSON-target rows) | 0.0% (0/6) | 83.3% (5/6) |
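
For readers who want to reproduce the three metrics, here is a hedged sketch using only the standard library. It is not the ARA eval harness. In particular, stripping whitespace before comparison and treating a row as a "JSON target" when its reference parses to an object or array are my assumptions.

```python
import json
from difflib import SequenceMatcher


def exact_match(pred: str, ref: str) -> bool:
    return pred.strip() == ref.strip()


def char_similarity(pred: str, ref: str) -> float:
    """Character-level similarity ratio in [0, 1] from difflib.SequenceMatcher."""
    return SequenceMatcher(None, pred.strip(), ref.strip()).ratio()


def parses_as_json(text: str) -> bool:
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def is_json_target(ref: str) -> bool:
    """Assumption: a row targets JSON if its reference is a JSON object or array."""
    try:
        return isinstance(json.loads(ref), (dict, list))
    except json.JSONDecodeError:
        return False


def score(pairs):
    """pairs: list of (prediction, reference) strings. Returns mean metrics."""
    json_preds = [pred for pred, ref in pairs if is_json_target(ref)]
    return {
        "exact_match": sum(exact_match(p, r) for p, r in pairs) / len(pairs),
        "char_similarity": sum(char_similarity(p, r) for p, r in pairs) / len(pairs),
        "json_valid": (
            sum(parses_as_json(p) for p in json_preds) / len(json_preds)
            if json_preds else None
        ),
    }
```

Note that `json_valid` only checks that the prediction parses. A syntactically valid object with wrong field values still counts, which is one reason it sits far above `exact_match` in the table.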

Same eval harness on the 7B QLoRA sibling:

| Metric | LoRA on 0.5B | QLoRA on 7B |
|---|---|---|
| exact_match | 10.5% | 26.3% |
| char_similarity | 0.374 | 0.610 |
| json_valid | 83.3% | 100.0% |
| Adapter size | ~10 MB | ~40 MB |
| Train wall time (RTX 4060) | 20 s | 5 min |

Limitations

  • Small dataset (95 examples). Enough to demonstrate the technique and shift the JSON-validity rate decisively, not enough for production quality across all 8 task families. Language ID and translation in particular need more examples.
  • Exact-match is the weakest metric because the held-out tail is heavy on language-ID rows, which the 0.5B base struggles with after only 3 epochs.
  • Multilingual coverage is uneven. Uzbek has the least training data of the three; outputs in Uzbek are noisier than in Russian or English.
  • No hyperparameter sweep. r=16, alpha=32, lr=2e-4, and 3 epochs are commonly used LoRA starting values, not tuned for this task.
  • Inherits base-model risks (bias, hallucination) — use temperature ≤ 0.2 and validate JSON before downstream use.
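
Because roughly one in six JSON-target outputs failed to parse in the evaluation, downstream code should treat the model's response as untrusted text. A minimal validation sketch follows. The required keys match the contract-extraction instruction in the usage example, and the function name `parse_extraction` is illustrative.

```python
import json


def parse_extraction(raw: str, required_keys=("amount", "date", "counterparty")):
    """Return the parsed object, or None if the output is not usable as-is."""
    try:
        obj = json.loads(raw.strip())
    except json.JSONDecodeError:
        return None  # not valid JSON at all
    if not isinstance(obj, dict):
        return None  # valid JSON, but not an object
    if any(key not in obj for key in required_keys):
        return None  # object is missing an expected field
    return obj
```

A caller might retry, fall back to the 7B sibling, or route the document to manual review when this returns `None`. Which fallback fits is a deployment decision outside the scope of this card.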

License

Apache-2.0 (matches base model Qwen/Qwen2.5-0.5B-Instruct).

Citation

If this adapter is useful in your work, cite the ARA project:

@misc{ara2026,
  title  = {ARA — AI Resource Assistant (document-intelligence platform)},
  author = {Bilol Saidumarov},
  year   = {2026},
  url    = {https://github.com/sb-bilal-dev-2/ara}
}

Framework versions

  • PEFT 0.19.1
  • Transformers ≥ 4.46
  • PyTorch ≥ 2.5