Domain-Specific IE Adapter — Gemma 3 12B (long instruction)

LoRA adapter for `google/gemma-3-12b-it`, fine-tuned to extract compensation-consultant mentions from SEC proxy statements (DEF 14A) and classify each firm as:

RET — consultant retained/engaged as a compensation advisor
SURV — survey-only data provider (not retained as an advisor)

Companion artifact for the anonymous submission "From Lengthy Narrative to Structured Data: Instruction Fine-Tuning Open-Weight LLMs for Information Extraction from Corporate Disclosures."

This adapter


| Setting | Value |
|---|---|
| Base model | `google/gemma-3-12b-it` |
| Method | LoRA (r=8, α=16), 4-bit QLoRA |
| Instruction format | detailed (long) |
| Instance-level F1 | 95.7% |
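The following is a reminder of what r and α control in the standard LoRA formulation. LoRA freezes each adapted weight matrix $W_0 \in \mathbb{R}^{d \times k}$, which QLoRA stores in 4-bit, and trains only a low-rank update. PEFT's default (non-rsLoRA) scaling multiplies that update by α/r. This card does not list which modules were targeted, so $W_0$ stands for any adapted projection.

```latex
h = W_0 x + \frac{\alpha}{r} B A x, \qquad
B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad
\frac{\alpha}{r} = \frac{16}{8} = 2
```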

Each adapter is trained with a single instruction variant, so pair this adapter with the detailed (long) prompt at inference.

Adapter family (same task, 2,001-sample training set)

| Adapter | Base | Instruction | F1 |
|---|---|---|---|
| `domain-specific-adapter` | Gemma 3 27B | detailed (long) | 95.9% |
| `domain-specific-adapter-short` | Gemma 3 27B | minimal (short) | 96.1% |
| `domain-specific-12b-adapter` | Gemma 3 12B | detailed (long) | 95.7% |
| `domain-specific-12b-adapter-short` | Gemma 3 12B | minimal (short) | 93.0% |

Evaluated on 316 consultants across 143 company-years from 84 SEC filings.
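This card does not spell out how instance-level F1 is computed. The sketch below shows one plausible definition, not necessarily the paper's. It assumes each instance is a hypothetical `(company_year, firm, label)` triple, and a prediction counts only if both the firm and its RET/SURV label match. Firm names are compared after a simple case and whitespace normalization, which is also an assumption. The paper may use fuzzier name matching.

```python
from typing import Iterable, Set, Tuple

# Hypothetical instance key: (company_year, firm_name, label), e.g. ("ACME-2021", "Mercer", "SURV").
Instance = Tuple[str, str, str]


def normalize(inst: Instance) -> Instance:
    """Case- and whitespace-insensitive firm matching (an assumption, not the paper's rule)."""
    company_year, firm, label = inst
    return company_year, " ".join(firm.split()).casefold(), label


def instance_f1(gold: Iterable[Instance], pred: Iterable[Instance]) -> float:
    """Micro F1 over exact-match instances: a prediction counts only if firm AND label match."""
    g: Set[Instance] = {normalize(i) for i in gold}
    p: Set[Instance] = {normalize(i) for i in pred}
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision = tp / len(p)
    recall = tp / len(g)
    return 2 * precision * recall / (precision + recall)


# "ACME-2021" is a made-up company-year key for illustration.
gold = [("ACME-2021", "Pearl Meyer & Partners, LLC", "RET"),
        ("ACME-2021", "Mercer", "SURV"),
        ("ACME-2021", "Radford", "SURV")]
pred = [("ACME-2021", "Pearl Meyer & Partners, LLC", "RET"),
        ("ACME-2021", "Mercer", "RET")]  # right firm, wrong label -> counts as a miss
print(round(instance_f1(gold, pred), 3))  # → 0.4
```

In this example, one of two predictions is correct (precision 0.5) and one of three gold instances is found (recall 1/3), which gives F1 = 0.4.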

Usage

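```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base = "google/gemma-3-12b-it"
tok = AutoTokenizer.from_pretrained(base)
# 4-bit loading via BitsAndBytesConfig (the bare load_in_4bit kwarg is deprecated);
# bf16 compute is a suggested default, not a setting stated in this card.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto", quantization_config=bnb)
# Attach the LoRA weights; pair with the detailed (long) prompt at inference.
model = PeftModel.from_pretrained(model, "cs-file-uploads/domain-specific-12b-adapter")
```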

See the code repository for the full inference pipeline (retrieval → chunking → extraction → grounding validation → cross-chunk aggregation) and the exact prompt templates.
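The last two pipeline stages are sketched below under stated assumptions; the repository's actual rules may differ. "Grounding validation" is assumed to mean that a firm is kept only if its name occurs verbatim in the chunk it was extracted from. "Cross-chunk aggregation" is assumed to merge per-chunk results for one filing and let RET take precedence, because SURV is defined as survey-only, meaning not retained. The functions `grounded` and `aggregate` are hypothetical names. The check is deliberately simple: it uses substring matching and exact-string deduplication, so variants such as "Mercer" and "Mercer LLC" are not merged.

```python
from typing import Dict, List

RET, SURV = "RET", "SURV"


def _norm(s: str) -> str:
    """Collapse whitespace and casefold so line breaks in the filing don't block a match."""
    return " ".join(s.split()).casefold()


def grounded(name: str, chunk_text: str) -> bool:
    """Hypothetical grounding check: the firm name must occur verbatim in its source chunk."""
    return _norm(name) in _norm(chunk_text)


def aggregate(chunks: List[str], extractions: List[Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Merge grounded per-chunk extractions for one filing.

    A firm labelled RET in any chunk is reported only as RET (SURV means survey-only,
    not retained); names are deduplicated by exact string, in order of first appearance.
    """
    if len(chunks) != len(extractions):
        raise ValueError("need one extraction per chunk")
    ret: Dict[str, None] = {}
    surv: Dict[str, None] = {}
    for text, ext in zip(chunks, extractions):
        for label, seen in ((RET, ret), (SURV, surv)):
            for name in ext.get(label, []):
                if grounded(name, text):
                    seen.setdefault(name, None)
    return {RET: list(ret), SURV: [n for n in surv if n not in ret]}


chunks = [
    "The Committee retained Pearl Meyer & Partners, LLC as its independent consultant.",
    "Market data were drawn from surveys published by Mercer and Radford.",
]
extractions = [
    {RET: ["Pearl Meyer & Partners, LLC"]},
    {SURV: ["Mercer", "Radford", "Willis Towers Watson"]},  # last name is not in the text
]
print(aggregate(chunks, extractions))
# → {'RET': ['Pearl Meyer & Partners, LLC'], 'SURV': ['Mercer', 'Radford']}
```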

Output format

```
{RET: 'Pearl Meyer & Partners, LLC'}, {SURV: 'Mercer', 'Radford'}
```
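This output is neither JSON nor a Python literal, so downstream code needs a small parser. The sketch below is regex-based and rests on three assumptions:

- The only labels are `RET` and `SURV`.
- Firm names are single-quoted and contain no single quotes or braces.
- If the model's output has no recognizable group, the result is empty lists. How the model signals "no consultant found" is not documented here.

```python
import re
from typing import Dict, List

# One "{LABEL: 'name', 'name', ...}" group; labels as listed in this card.
GROUP = re.compile(r"\{\s*(RET|SURV)\s*:\s*([^}]*)\}")
# Single-quoted firm names (assumes names contain no single quotes or braces).
NAME = re.compile(r"'([^']*)'")


def parse_output(text: str) -> Dict[str, List[str]]:
    """Turn the adapter's raw output string into {'RET': [...], 'SURV': [...]}."""
    result: Dict[str, List[str]] = {"RET": [], "SURV": []}
    for label, body in GROUP.findall(text):
        result[label].extend(n.strip() for n in NAME.findall(body) if n.strip())
    return result


print(parse_output("{RET: 'Pearl Meyer & Partners, LLC'}, {SURV: 'Mercer', 'Radford'}"))
# → {'RET': ['Pearl Meyer & Partners, LLC'], 'SURV': ['Mercer', 'Radford']}
```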

Training

2,001 human-labeled and augmented proxy-statement excerpts; learning rate 2e-4 (cosine schedule, 3% warmup); max sequence length 5,120 tokens; 3 epochs; 20% validation split.
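The learning-rate schedule can be sketched as follows. This assumes it has the shape of Hugging Face's `get_cosine_schedule_with_warmup`: linear warmup, then a half-cosine decay to zero. It also assumes warmup steps are rounded up from the ratio, as `TrainingArguments` does. The card does not report batch size or gradient accumulation, so `total_steps` is a free parameter, and the value in the usage line is hypothetical.

```python
import math


def lr_at(step: int, total_steps: int, peak_lr: float = 2e-4, warmup_ratio: float = 0.03) -> float:
    """Learning rate at `step`: linear warmup to peak_lr, then half-cosine decay to 0."""
    warmup = math.ceil(total_steps * warmup_ratio)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical run of 1,000 optimizer steps: warmup ends at step 30.
for s in (0, 15, 30, 515, 1000):
    print(s, f"{lr_at(s, 1000):.2e}")
```

The peak rate is reached when warmup ends. The rate is back to half the peak midway through the decay phase and reaches zero at the final step.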

License

Derived from Google Gemma 3; use is subject to the Gemma Terms of Use. Adapter weights are released for research use.

Citation

```bibtex
@misc{anonymous2026fromlengthy,
  title={From Lengthy Narrative to Structured Data: Instruction Fine-Tuning Open-Weight LLMs for Information Extraction from Corporate Disclosures},
  author={Anonymous},
  year={2026},
  note={Under review}
}
```
