Model Card for KKieXX/llama-3.2-1b-finder-lora
Model Details
Model Description
A QLoRA fine-tuned LoRA adapter for Llama-3.2-1B, trained on the FinDER financial question-answering dataset. The adapter teaches the base model to read SEC filing evidence passages and produce grounded, concise answers to financial questions — a domain where the base model struggles with financial abbreviations, disambiguation of related figures, and synthesis across financial statements.
- Developed by: Linda Lin (qat3207)
- Model type: Causal language model (LoRA adapter for meta-llama/Llama-3.2-1B)
- Language(s) (NLP): English
- License: Llama 3.2 Community License (inherited from base model)
- Finetuned from model: meta-llama/Llama-3.2-1B
Model Sources
- Repository: KKieXX/llama-3.2-1b-finder-lora
- Training notebook: notebooks/fine_tuning_for_FinDER_dataset.ipynb (QLoRA pipeline, runs on Colab A100)
Uses
Direct Use
Load the adapter on top of the base meta-llama/Llama-3.2-1B model to answer financial questions grounded in SEC filing evidence. The model expects a prompt that contains an evidence passage and a question, and returns a short factual answer.
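import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization with double quantization, matching the training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    quantization_config=bnb_config,
    device_map="auto",
)
# Attach the LoRA adapter weights to the quantized base model
model = PeftModel.from_pretrained(base_model, "KKieXX/llama-3.2-1b-finder-lora")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

evidence = "The company reported net revenues of $3.2 billion for fiscal year 2023..."
question = "What were the net revenues for fiscal year 2023?"
prompt = f"Evidence: {evidence}\n\nQuestion: {question}\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature and top_p to take effect
output = model.generate(
    **inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(output[0], skip_special_tokens=True))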
Note: The base model is gated on the Hugging Face Hub. You must accept the Llama 3.2 license and authenticate with huggingface-cli login before downloading.
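Because `generate` returns the prompt tokens followed by the continuation, the decoded string in the usage example contains the evidence and question as well as the answer. The following is a minimal sketch of building the prompt and trimming the output. `build_prompt` and `extract_answer` are hypothetical helper names, not functions shipped with this repository, and the rule of keeping only the first paragraph is an assumption about how a base (non-chat) model tends to continue.

```python
def build_prompt(evidence: str, question: str) -> str:
    """Format evidence and question the same way as the usage example."""
    return f"Evidence: {evidence}\n\nQuestion: {question}\n\nAnswer:"


def extract_answer(decoded: str, prompt: str) -> str:
    """Return only the generated continuation, without the echoed prompt."""
    if decoded.startswith(prompt):
        answer = decoded[len(prompt):]
    else:
        # Fallback: decoding may alter whitespace, so split on the last marker
        answer = decoded.rsplit("Answer:", 1)[-1]
    # Assumption: the model may keep writing further "Question:" turns,
    # so keep only the first paragraph of the continuation.
    return answer.strip().split("\n\n", 1)[0].strip()


if __name__ == "__main__":
    p = build_prompt("Net revenues were $3.2 billion.", "What were net revenues?")
    print(extract_answer(p + " $3.2 billion\n\nQuestion: next?", p))
```

In the usage example, this would be called as `extract_answer(tokenizer.decode(output[0], skip_special_tokens=True), prompt)`.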
Downstream Use
This adapter is the generator and evaluator backbone of the FinDER Multi-Agent Financial QA System, a LangGraph pipeline that pairs it with Pinecone RAG over 50k+ indexed SEC filing chunks. It performs two roles in that pipeline:
- Generator: Given retrieved evidence + a user question, produce a factual answer.
- Evaluator: Assess whether a generated answer is SUFFICIENT or INSUFFICIENT, triggering a retrieval retry if needed.
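The two roles can be illustrated with a plain-Python control loop. This is only a sketch of the generate, evaluate, and retry idea, not the LangGraph implementation: `answer_with_retry`, the callable signatures, `max_attempts`, `top_k`, and the policy of widening retrieval on each retry are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def answer_with_retry(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str, str], str],
    max_attempts: int = 2,
    top_k: int = 5,
) -> Dict[str, object]:
    """Generate an answer and retry retrieval while it is judged INSUFFICIENT."""
    answer, verdict, attempt = "", "INSUFFICIENT", 0
    for attempt in range(1, max_attempts + 1):
        # Hypothetical retry policy: fetch more chunks on every attempt
        chunks = retrieve(question, top_k * attempt)
        evidence = "\n".join(chunks)
        answer = generate(evidence, question)            # generator role
        verdict = evaluate(evidence, question, answer)   # evaluator role
        if verdict == "SUFFICIENT":
            break
    return {"answer": answer, "verdict": verdict, "attempts": attempt}
```

In the real pipeline, `retrieve` would wrap the Pinecone query and both `generate` and `evaluate` would call this adapter with different prompts.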
Out-of-Scope Use
- General-purpose conversational chat (model is not instruction-tuned for dialogue)
- Financial advice or investment decisions — outputs are not verified by domain experts
- Questions without grounding evidence (model is not designed for zero-shot financial QA)
- Languages other than English
Bias, Risks, and Limitations
- Domain coverage: Training data is limited to SEC filings (10-K, 10-Q, etc.). Performance on other financial document types (earnings calls, analyst reports) is untested.
- Small model capacity: At 1B parameters, the model may struggle with complex multi-step reasoning or synthesis across long documents.
- Hallucination risk: As with all LLMs, the model can produce plausible-sounding but incorrect numerical figures. Always verify outputs against source documents.
- Training data cutoff: The FinDER dataset derives from SEC filings with a specific temporal range; the model may not generalize to very recent filings with new financial instruments or accounting standards.
- Quantization artifacts: 4-bit quantization (NF4) introduces minor precision loss versus full-precision inference.
Recommendations
Users (both direct and downstream) should treat model outputs as a first-pass extraction aid, not as authoritative financial analysis. Cross-check all numerical figures against the original SEC source documents.
Training Details
Training Data
FinDER (Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation):
- 5,703 query–evidence–answer triplets derived from SEC filings
- Used a 30% stratified sample: 1,710 training examples (split with random_state=123, frac=0.3)
- Remaining 70% held out as test split (never seen during fine-tuning)
- Categories span revenue figures, expense line items, ratio calculations, and segment reporting
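The stratified 30/70 split can be sketched with the standard library alone. The parameter names `random_state` and `frac` in the card suggest a pandas-style `sample` call, but the notebook's exact code is not reproduced here. `stratified_split` and the `category` field name are assumptions, and this sketch will not select the same rows as the original split even with the same seed.

```python
import random
from collections import defaultdict


def stratified_split(examples, key, frac=0.3, seed=123):
    """Sample `frac` of every category for training; the rest is the test split."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for ex in examples:
        by_category[ex[key]].append(ex)

    train, test = [], []
    for category in sorted(by_category):      # sorted for determinism
        group = by_category[category][:]
        rng.shuffle(group)
        n_train = round(len(group) * frac)    # per-category sample size
        train.extend(group[:n_train])
        test.extend(group[n_train:])
    return train, test


if __name__ == "__main__":
    data = [{"id": i, "category": "a" if i < 70 else "b"} for i in range(100)]
    train, test = stratified_split(data, "category")
    print(len(train), len(test))
```

Sampling within each category keeps the category proportions of the training sample close to those of the full dataset.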
Citation:
@misc{choi2025finder,
title={FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation},
author={Chanyeol Choi and Jihoon Kwon and Jaeseon Ha and Hojun Choi and Chaewoon Kim and Yongjae Lee and Jy-yong Sohn and Alejandro Lopez-Lira},
year={2025},
eprint={2504.15800},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2504.15800}
}
Training Procedure
Preprocessing
Raw SEC table artifacts (tab characters, excessive whitespace, newlines) were cleaned from evidence passages before tokenization. Each example was formatted as:
Evidence: {evidence_text}
Question: {question}
Answer: {answer}
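A minimal sketch of this cleaning and formatting step follows. `clean_evidence` and `format_example` are hypothetical names, and the blank-line separator between fields is an assumption taken from the inference prompt in Direct Use; the notebook's exact regular expressions and separators may differ.

```python
import re

SEP = "\n\n"  # assumption: same separator as the inference prompt


def clean_evidence(text: str) -> str:
    """Collapse tabs, newlines, and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def format_example(evidence: str, question: str, answer=None) -> str:
    """Build a training string, or an inference prompt when answer is None."""
    prompt = (
        f"Evidence: {clean_evidence(evidence)}{SEP}"
        f"Question: {question.strip()}{SEP}"
        "Answer:"
    )
    return prompt if answer is None else f"{prompt} {answer.strip()}"


if __name__ == "__main__":
    raw = "Net revenues\t$3.2\n\n  billion   in 2023 "
    print(format_example(raw, "What were net revenues?", "$3.2 billion"))
```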
Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Training regime | 4-bit QLoRA (NF4 quantization + bfloat16 compute) |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | 7 (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) |
| Trainable parameters | 11.3M / 1.247B (0.9%) |
| Epochs | 3 |
| Learning rate | 2e-4 |
| LR schedule | Cosine with warmup |
| Batch size | 4 |
| Gradient accumulation steps | 4 (effective batch = 16) |
| Optimizer | paged_adamw_32bit |
| Max sequence length | 512 tokens |
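The table can be written as keyword arguments and cross-checked against the reported trainable-parameter count. The dictionary keys below are parameter names of `peft.LoraConfig` and `transformers.TrainingArguments`, kept as plain dicts so the sketch runs without those libraries. The `task_type` value and the layer shapes (hidden size 2048, MLP size 8192, key/value projection width 512, 16 decoder layers) are assumptions taken from the published Llama-3.2-1B configuration rather than from this card, and the warmup length is omitted because the card does not state it.

```python
# Could be passed as peft.LoraConfig(**LORA_KWARGS)
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
    "task_type": "CAUSAL_LM",  # assumption: not listed in the table
}
# Could be passed as transformers.TrainingArguments(output_dir=..., **TRAINING_KWARGS)
TRAINING_KWARGS = {
    "num_train_epochs": 3,
    "learning_rate": 2e-4,
    "lr_scheduler_type": "cosine",
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "optim": "paged_adamw_32bit",
}

# Assumed Llama-3.2-1B shapes (not stated in this card)
HIDDEN, INTERMEDIATE, KV_DIM, LAYERS = 2048, 8192, 512, 16
PROJ_SHAPES = {  # module name -> (in_features, out_features)
    "q_proj": (HIDDEN, HIDDEN), "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM), "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE), "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r, modules):
    """Each adapted linear layer adds A (r x in) and B (out x r)."""
    per_layer = sum(r * (PROJ_SHAPES[m][0] + PROJ_SHAPES[m][1]) for m in modules)
    return per_layer * LAYERS


effective_batch = (TRAINING_KWARGS["per_device_train_batch_size"]
                   * TRAINING_KWARGS["gradient_accumulation_steps"])
trainable = lora_param_count(LORA_KWARGS["r"], LORA_KWARGS["target_modules"])
print(effective_batch, trainable)
```

Under these assumed shapes the count comes to 11,272,192 parameters, consistent with the 11.3M (0.9%) figure in the table.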
Speeds, Sizes, Times
- Hardware: Google Colab A100 (40 GB)
- Training time: 619 seconds (about 10 minutes) for 3 epochs over 1,710 examples
- Adapter size: ~26 MB (LoRA weights only; base model downloaded separately)
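As a rough sanity check, the reported figures imply the following throughput. This is back-of-the-envelope arithmetic only: it assumes the final partial batch of each epoch counts as an optimizer step and that the 619 seconds contain nothing but training.

```python
import math

examples, epochs, seconds = 1710, 3, 619
effective_batch = 16  # batch size 4 x gradient accumulation 4

steps_per_epoch = math.ceil(examples / effective_batch)
total_steps = steps_per_epoch * epochs
examples_per_second = examples * epochs / seconds
seconds_per_step = seconds / total_steps

print(steps_per_epoch, total_steps)
print(round(examples_per_second, 2), round(seconds_per_step, 2))
```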
Evaluation
Testing Data
Held-out FinDER test split: the 70% of the 5,703 examples not sampled for training (stratified by category, same random_state=123 split). Evaluation runs on a 200-example subset of this split, also stratified by category.
Factors
Three ablation conditions:
| Condition | Model | Evidence Source |
|---|---|---|
| A | Base Llama-3.2-1B | Gold evidence (oracle) |
| B | Fine-tuned Llama-3.2-1B (this model) | Gold evidence (oracle) |
| C | Fine-tuned Llama-3.2-1B (this model) | Pinecone RAG retrieval |
The A→B gap isolates the fine-tuning effect. The B→C gap isolates retrieval quality.
Metrics
| Metric | Description |
|---|---|
| Exact Match (EM) | Binary: predicted answer equals gold answer (normalized) |
| Token-level F1 | Overlap between predicted and gold token sets |
| BERTScore F1 | Semantic similarity via distilbert-base-uncased |
| Retrieval Recall@K | Jaccard similarity ≥ 0.25 between retrieved and gold evidence |
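The lexical metrics can be sketched as follows. The evaluation notebook's exact code is not reproduced here, so several details are assumptions: the SQuAD-style normalization (lowercasing, dropping punctuation and articles), the bag-of-tokens F1, and the reading of Recall@K as "at least one of the top K retrieved chunks reaches the Jaccard threshold against the gold evidence". BERTScore is omitted because it needs an external model.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> int:
    return int(normalize(pred) == normalize(gold))


def token_f1(pred: str, gold: str) -> float:
    p, g = normalize(pred).split(), normalize(gold).split()
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def jaccard(a: str, b: str) -> float:
    sa, sb = set(normalize(a).split()), set(normalize(b).split())
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 0.0


def recall_at_k(retrieved, gold_evidence, k=5, threshold=0.25) -> int:
    """1 if any of the top-k chunks overlaps the gold evidence enough."""
    return int(any(jaccard(c, gold_evidence) >= threshold for c in retrieved[:k]))
```

Note that removing punctuation turns "3.2" into "32", which is a known weakness of this normalization for numeric answers.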
Results
See notebooks/evaluation.ipynb and eval_results.csv (generated at evaluation time) for per-example scores and aggregate results across all three conditions.
Environmental Impact
Carbon emissions estimated using the Machine Learning Impact calculator.
- Hardware Type: NVIDIA A100 40GB
- Hours used: 0.17 hours (619 seconds)
- Cloud Provider: Google Colab (Google Cloud)
- Compute Region: US (estimated)
- Carbon Emitted: < 0.05 kg CO₂eq (estimated)
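Estimates of this kind multiply runtime by power draw and by the carbon intensity of the grid. The sketch below shows that arithmetic with illustrative inputs: the 0.4 kW power draw and 0.4 kg CO₂eq/kWh grid intensity are assumptions chosen for the example, not values reported by the author or by the calculator, and `estimate_co2_kg` is a hypothetical helper.

```python
def estimate_co2_kg(hours, power_kw, grid_kg_per_kwh, pue=1.0):
    """Energy (kWh) = hours x kW x PUE; emissions = energy x grid intensity."""
    return hours * power_kw * pue * grid_kg_per_kwh


hours = 619 / 3600          # reported training time in hours
estimate = estimate_co2_kg(
    hours,
    power_kw=0.4,           # assumption: upper-bound GPU board power
    grid_kg_per_kwh=0.4,    # assumption: illustrative grid intensity
)
print(round(hours, 2), round(estimate, 3))
```

With these inputs the result stays below the 0.05 kg CO₂eq bound quoted above.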
Technical Specifications
Model Architecture and Objective
- Architecture: Causal decoder-only transformer (Llama-3.2-1B) with LoRA low-rank weight updates injected into 7 attention and MLP projection modules (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) in every decoder block
- Objective: Next-token prediction (causal LM) on answer tokens only (loss masked on prompt tokens)
- Parameters: 1.247B total; 11.3M trainable (LoRA adapter only)
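Masking the loss on prompt tokens is commonly done by setting their labels to -100, the default ignore index of PyTorch's cross-entropy loss, which Hugging Face causal LM heads also honor. The following is a sketch of that convention on plain token-id lists. `build_labels` is a hypothetical helper, and the training notebook may instead rely on a TRL data collator to the same effect.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss


def build_labels(prompt_ids, answer_ids, max_length=512):
    """Concatenate prompt and answer ids; supervise only the answer tokens."""
    input_ids = (prompt_ids + answer_ids)[:max_length]
    # Prompt positions get IGNORE_INDEX so they contribute no loss
    labels = ([IGNORE_INDEX] * len(prompt_ids) + answer_ids)[:max_length]
    return {"input_ids": input_ids, "labels": labels}


if __name__ == "__main__":
    print(build_labels([101, 102, 103], [7, 8]))
```

The model still attends to the prompt tokens; they are only excluded from the loss, so gradient updates come from the answer span alone.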
Compute Infrastructure
| Detail | |
|---|---|
| Hardware | NVIDIA A100 40GB (Google Colab) |
| Software | Python 3.10, PyTorch 2.x, Transformers 4.x, PEFT 0.x, bitsandbytes, TRL |
| Quantization | bitsandbytes NF4 (4-bit) with double quantization |
Citation
If you use this adapter, please also cite the FinDER dataset:
BibTeX:
@misc{choi2025finder,
title={FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation},
author={Chanyeol Choi and Jihoon Kwon and Jaeseon Ha and Hojun Choi and Chaewoon Kim and Yongjae Lee and Jy-yong Sohn and Alejandro Lopez-Lira},
year={2025},
eprint={2504.15800},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2504.15800}
}
Model Card Authors
Linda Lin (qat3207)
Model Card Contact