Model Card for KKieXX/llama-3.2-1b-finder-lora

Model Details

Model Description

A QLoRA fine-tuned LoRA adapter for Llama-3.2-1B, trained on the FinDER financial question-answering dataset. The adapter teaches the base model to read SEC filing evidence passages and produce grounded, concise answers to financial questions — a domain where the base model struggles with financial abbreviations, disambiguation of related figures, and synthesis across financial statements.

  • Developed by: Linda Lin (qat3207)
  • Model type: Causal language model — LoRA adapter for meta-llama/Llama-3.2-1B
  • Language(s) (NLP): English
  • License: Llama 3.2 Community License (inherited from base model)
  • Finetuned from model: meta-llama/Llama-3.2-1B

Uses

Direct Use

Load the adapter on top of the base meta-llama/Llama-3.2-1B model to answer financial questions grounded in SEC filing evidence. The model expects a prompt that contains an evidence passage and a question, and returns a short factual answer.

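import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute, matching the training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    quantization_config=bnb_config,
    device_map="auto",
)
# Attach the LoRA adapter to the quantized base model
model = PeftModel.from_pretrained(base_model, "KKieXX/llama-3.2-1b-finder-lora")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

evidence = "The company reported net revenues of $3.2 billion for fiscal year 2023..."
question = "What were the net revenues for fiscal year 2023?"

prompt = f"Evidence: {evidence}\n\nQuestion: {question}\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature and top_p to take effect
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))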

Note: The base model is gated on the Hugging Face Hub. You must accept the Llama 3.2 license and authenticate with huggingface-cli login before downloading.

Downstream Use

This adapter is the generator and evaluator backbone of the FinDER Multi-Agent Financial QA System, a LangGraph pipeline that pairs it with Pinecone RAG over 50k+ indexed SEC filing chunks. It performs two roles in that pipeline:

  1. Generator: Given retrieved evidence + a user question, produce a factual answer.
  2. Evaluator: Assess whether a generated answer is SUFFICIENT or INSUFFICIENT, triggering a retrieval retry if needed.
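
The generate, evaluate, retry control flow described above can be sketched in plain Python. This is only an illustration of the loop, not the LangGraph pipeline itself: the function names, the `max_retries` cap, and the stub components are all hypothetical and stand in for the adapter and the Pinecone retriever.

```python
from typing import Callable, List, Tuple


def answer_with_retry(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[List[str], str], str],
    evaluate: Callable[[List[str], str, str], str],
    max_retries: int = 2,  # hypothetical cap; the card does not state one
) -> Tuple[str, str, int]:
    """Return (answer, verdict, attempts) after at most max_retries + 1 rounds."""
    answer, verdict, attempts = "", "INSUFFICIENT", 0
    for attempt in range(max_retries + 1):
        attempts = attempt + 1
        evidence = retrieve(question, attempt)          # retrieval step
        answer = generate(evidence, question)           # generator role
        verdict = evaluate(evidence, question, answer)  # evaluator role
        if verdict == "SUFFICIENT":
            break                                       # stop retrying once the answer passes
    return answer, verdict, attempts


# Stub components so the sketch runs without a model or a vector store.
def fake_retrieve(question: str, attempt: int) -> List[str]:
    passages = [["Unrelated passage."], ["Net revenues were $3.2 billion in fiscal 2023."]]
    return passages[min(attempt, len(passages) - 1)]


def fake_generate(evidence: List[str], question: str) -> str:
    return "$3.2 billion" if "3.2" in evidence[0] else "unknown"


def fake_evaluate(evidence: List[str], question: str, answer: str) -> str:
    return "SUFFICIENT" if answer != "unknown" else "INSUFFICIENT"


result = answer_with_retry(
    "What were net revenues in fiscal 2023?", fake_retrieve, fake_generate, fake_evaluate
)
print(result)
```

With these stubs the first retrieval misses, the evaluator returns INSUFFICIENT, and the second attempt succeeds. In a real deployment both `generate` and `evaluate` would call the adapter with different prompts.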

Out-of-Scope Use

  • General-purpose conversational chat (model is not instruction-tuned for dialogue)
  • Financial advice or investment decisions — outputs are not verified by domain experts
  • Questions without grounding evidence (model is not designed for zero-shot financial QA)
  • Languages other than English

Bias, Risks, and Limitations

  • Domain coverage: Training data is limited to SEC filings (10-K, 10-Q, etc.). Performance on other financial document types (earnings calls, analyst reports) is untested.
  • Small model capacity: At 1B parameters, the model may struggle with complex multi-step reasoning or synthesis across long documents.
  • Hallucination risk: As with all LLMs, the model can produce plausible-sounding but incorrect numerical figures. Always verify outputs against source documents.
  • Training data cutoff: The FinDER dataset derives from SEC filings with a specific temporal range; the model may not generalize to very recent filings with new financial instruments or accounting standards.
  • Quantization artifacts: 4-bit quantization (NF4) introduces minor precision loss versus full-precision inference.

Recommendations

Users (both direct and downstream) should treat model outputs as a first-pass extraction aid, not as authoritative financial analysis. Cross-check all numerical figures against the original SEC source documents.


Training Details

Training Data

FinDER (Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation):

  • 5,703 query–evidence–answer triplets derived from SEC filings
  • Training used a 30% sample stratified by category: 1,710 examples (drawn with frac=0.3, random_state=123)
  • Remaining 70% held out as test split (never seen during fine-tuning)
  • Categories span revenue figures, expense line items, ratio calculations, and segment reporting
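
A stratified split of this kind can be sketched with the standard library alone. The `frac` and `random_state` parameters quoted above suggest the original split was made with pandas, which is an assumption. The sketch below will not reproduce the same rows, and the `category` field name and toy data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, frac, seed):
    """Sample `frac` of the rows within each category; the rest is held out."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for index, row in enumerate(rows):
        by_category[row[key]].append(index)

    train_idx = set()
    for category in sorted(by_category):       # sorted for a reproducible draw order
        indices = by_category[category]
        k = round(len(indices) * frac)          # per-category sample size
        train_idx.update(rng.sample(indices, k))

    train = [row for i, row in enumerate(rows) if i in train_idx]
    held_out = [row for i, row in enumerate(rows) if i not in train_idx]
    return train, held_out


# Toy data with a hypothetical "category" field
rows = [{"id": i, "category": "revenue" if i % 2 == 0 else "expenses"} for i in range(100)]
train, held_out = stratified_split(rows, key="category", frac=0.3, seed=123)
print(len(train), len(held_out))
```

Sampling inside each category keeps the category proportions of the training sample and the held-out split close to those of the full dataset.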

Citation:

@misc{choi2025finder,
  title={FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation},
  author={Chanyeol Choi and Jihoon Kwon and Jaeseon Ha and Hojun Choi and Chaewoon Kim and Yongjae Lee and Jy-yong Sohn and Alejandro Lopez-Lira},
  year={2025},
  eprint={2504.15800},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2504.15800}
}

Training Procedure

Preprocessing

Raw SEC table artifacts (tab characters, excessive whitespace, newlines) were cleaned from evidence passages before tokenization. Each example was formatted as:

Evidence: {evidence_text}

Question: {question}

Answer: {answer}
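
A minimal sketch of this cleaning and formatting step follows. The exact cleaning rules used for training are not given in this card, so the single whitespace-collapsing regex and the helper names are assumptions.

```python
import re

PROMPT_TEMPLATE = "Evidence: {evidence}\n\nQuestion: {question}\n\nAnswer:"


def clean_evidence(text: str) -> str:
    """Collapse tabs, newlines, and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def format_example(evidence: str, question: str, answer=None) -> str:
    """Build a training example, or an inference prompt when answer is None."""
    prompt = PROMPT_TEMPLATE.format(
        evidence=clean_evidence(evidence), question=question.strip()
    )
    return prompt if answer is None else f"{prompt} {answer.strip()}"


raw = "Net revenues\t$3,200\n\n   Cost of sales\t$1,900"
example = format_example(raw, "What were net revenues?", "$3,200")
print(example)
```

Calling `format_example` without an answer yields the same prompt shape as in the Direct Use snippet, which keeps training and inference formats aligned.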

Training Hyperparameters

| Hyperparameter | Value |
| --- | --- |
| Training regime | 4-bit QLoRA (NF4 quantization + bfloat16 compute) |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | 7 (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) |
| Trainable parameters | 11.3M / 1.247B (0.9%) |
| Epochs | 3 |
| Learning rate | 2e-4 |
| LR schedule | Cosine with warmup |
| Batch size | 4 |
| Gradient accumulation steps | 4 (effective batch = 16) |
| Optimizer | paged_adamw_32bit |
| Max sequence length | 512 tokens |
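
The trainable-parameter count and the effective batch size can be checked with a little arithmetic. The layer dimensions below come from the published Llama-3.2-1B configuration rather than from this card, so treat them as assumptions. The step count also assumes the last partial batch of each epoch is kept.

```python
import math

# Llama-3.2-1B dimensions (assumed from the base model's published config)
HIDDEN, INTERMEDIATE, LAYERS = 2048, 8192, 16
KV_DIM = 512  # 8 key/value heads x 64 dims per head (grouped-query attention)
RANK = 16     # LoRA rank from the table above

# (in_features, out_features) of each targeted projection
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds two matrices per adapted weight: A (rank x in) and B (out x rank)
per_layer = sum(RANK * (n_in + n_out) for n_in, n_out in PROJECTIONS.values())
trainable = per_layer * LAYERS

effective_batch = 4 * 4                               # batch size x accumulation steps
steps_per_epoch = math.ceil(1710 / effective_batch)   # assumes the partial batch is kept
total_steps = 3 * steps_per_epoch

print(f"{trainable:,} trainable parameters")
print(f"{effective_batch} examples per optimizer step, about {total_steps} steps in total")
```

The result rounds to the 11.3M figure reported in the table, which is consistent with the adapter being applied to all seven projections in every decoder layer.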

Speeds, Sizes, Times

  • Hardware: Google Colab A100 (40 GB)
  • Training time: 619 seconds (about 10 minutes) for 3 epochs over 1,710 examples
  • Adapter size: ~26 MB (LoRA weights only; base model downloaded separately)

Evaluation

Testing Data

Held-out FinDER test split: the 70% of the 5,703 examples not used for training (stratified by category, same random_state=123 split). Evaluation runs on 200 examples sampled from this split, stratified by category.

Factors

Three ablation conditions:

| Condition | Model | Evidence Source |
| --- | --- | --- |
| A | Base Llama-3.2-1B | Gold evidence (oracle) |
| B | Fine-tuned Llama-3.2-1B (this model) | Gold evidence (oracle) |
| C | Fine-tuned Llama-3.2-1B (this model) | Pinecone RAG retrieval |

The A→B gap isolates the fine-tuning effect. The B→C gap isolates retrieval quality.

Metrics

| Metric | Description |
| --- | --- |
| Exact Match (EM) | Binary: 1 if the normalized predicted answer equals the normalized gold answer, else 0 |
| Token-level F1 | Harmonic mean of token precision and recall between the predicted and gold answers |
| BERTScore F1 | Semantic similarity computed with distilbert-base-uncased |
| Retrieval Recall@K | Hit if a top-K retrieved chunk has Jaccard similarity ≥ 0.25 with the gold evidence |
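
The string-based metrics can be sketched as follows. The normalization here follows the common SQuAD convention (lowercase, strip punctuation and articles), which is an assumption: the evaluation notebook may normalize differently, and the helper names are hypothetical. BERTScore is omitted because it requires a model download.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> int:
    return int(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def jaccard(a: str, b: str) -> float:
    set_a, set_b = set(normalize(a).split()), set(normalize(b).split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def retrieval_hit(retrieved_chunks, gold_evidence: str, threshold: float = 0.25) -> int:
    """1 if any of the top-K retrieved chunks overlaps the gold evidence enough."""
    return int(any(jaccard(chunk, gold_evidence) >= threshold for chunk in retrieved_chunks))


print(exact_match("The $3.2 billion.", "$3.2 billion"))
print(round(token_f1("Net revenues were 3.2 billion", "3.2 billion"), 3))
print(retrieval_hit(["unrelated text", "net revenues rose"], "net revenues fell"))
```

Averaging `retrieval_hit` over all evaluation queries gives a Recall@K figure. Note that stripping punctuation also removes decimal points, so "3.2" and "32" normalize to the same token. For financial answers a stricter normalizer may be preferable.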

Results

See notebooks/evaluation.ipynb and eval_results.csv (generated at evaluation time) for per-example scores and aggregate results across all three conditions.


Environmental Impact

Carbon emissions were estimated using the Machine Learning Impact calculator.

  • Hardware Type: NVIDIA A100 40GB
  • Hours used: 0.17 hours (619 seconds)
  • Cloud Provider: Google Colab (Google Cloud)
  • Compute Region: US (estimated)
  • Carbon Emitted: < 0.05 kg CO₂eq (estimated)
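
As a rough sanity check on the estimate, emissions can be approximated as GPU power × hours × grid carbon intensity. The 400 W board power and the 0.4 kg CO₂eq/kWh intensity below are assumptions chosen for illustration, not values taken from the calculator. Only the 619-second training time comes from this card.

```python
gpu_power_kw = 0.400     # assumed: 400 W board power for an A100 40GB (SXM variant)
hours = 619 / 3600       # training time reported in this card
carbon_intensity = 0.4   # assumed kg CO2eq per kWh; varies widely by region and grid

energy_kwh = gpu_power_kw * hours
emissions_kg = energy_kwh * carbon_intensity

print(f"{hours:.2f} h, {energy_kwh:.3f} kWh, {emissions_kg:.3f} kg CO2eq")
```

Under these assumptions the result stays below the 0.05 kg CO₂eq bound quoted above. A different region or GPU variant would shift the number proportionally.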

Technical Specifications

Model Architecture and Objective

  • Architecture: Causal decoder-only transformer (Llama-3.2-1B) with LoRA low-rank weight updates injected into 7 projection modules (4 attention, 3 MLP) in each decoder layer
  • Objective: Next-token prediction (causal LM) on answer tokens only (loss masked on prompt tokens)
  • Parameters: 1.247B total; 11.3M trainable (LoRA adapter only)

Compute Infrastructure

| Component | Detail |
| --- | --- |
| Hardware | NVIDIA A100 40GB (Google Colab) |
| Software | Python 3.10, PyTorch 2.x, Transformers 4.x, PEFT 0.x, bitsandbytes, TRL |
| Quantization | bitsandbytes NF4 (4-bit) with double quantization |

Citation

If you use this adapter, please also cite the FinDER dataset:

BibTeX:

@misc{choi2025finder,
  title={FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation},
  author={Chanyeol Choi and Jihoon Kwon and Jaeseon Ha and Hojun Choi and Chaewoon Kim and Yongjae Lee and Jy-yong Sohn and Alejandro Lopez-Lira},
  year={2025},
  eprint={2504.15800},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2504.15800}
}

Model Card Authors

Linda Lin (qat3207)

Model Card Contact

linzhixianlindax@gmail.com
