Model Card for UnifiedQ-Finance-RAFT

Model Details

Model Description

UnifiedQ-Finance-RAFT is a specialized LoRA adapter for Qwen 2.5 32B Instruct, fine-tuned using the RAFT (Retrieval-Augmented Fine-Tuning) technique.

This model was trained to act as the reasoning engine for a quantitative finance RAG pipeline. It addresses the "distractor problem" in RAG systems by being explicitly trained to distinguish between relevant "oracle" documents and irrelevant "distractor" documents when answering complex options trading and risk management queries.
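The distractor training described above can be sketched as follows. This is an illustrative reconstruction of RAFT-style data assembly, not the project's actual preprocessing code; the field names, delimiters, and oracle-inclusion probability are assumptions.

```python
import random

def build_raft_example(question, oracle_doc, distractor_docs, answer,
                       p_oracle=0.8, seed=None):
    """Assemble one RAFT training example. The oracle document is included
    only with probability p_oracle and is shuffled among the distractors,
    so the model learns both to locate relevant context and to cope when
    it is absent."""
    rng = random.Random(seed)
    docs = list(distractor_docs)
    if rng.random() < p_oracle:
        docs.append(oracle_doc)
    rng.shuffle(docs)
    context = "\n\n".join(f"[Doc {i+1}] {d}" for i, d in enumerate(docs))
    return {
        "prompt": f"Context: {context}\n\nQuestion: {question}",
        "completion": answer,
    }

example = build_raft_example(
    "How do I hedge delta risk?",
    oracle_doc="Delta hedging neutralizes directional exposure by trading the underlying.",
    distractor_docs=["The VIX measures implied volatility.",
                     "T-bills are short-dated government debt."],
    answer="Offset the position delta by buying or selling the underlying asset.",
    seed=0,
)
```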

  • Developed by: Rednote (UnifiedQ Project)
  • Model type: LoRA Adapter (QLoRA 4-bit) for Causal LM
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: Qwen/Qwen2.5-32B-Instruct

Uses

Direct Use

This model is intended to be used within a Retrieval-Augmented Generation (RAG) pipeline. It expects a prompt that includes retrieved context documents (some relevant, some irrelevant) followed by a user question. It excels at:

  • Evaluating options trading strategies.
  • Analyzing quantitative risk management questions.
  • Filtering noise from retrieved financial documents.
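At inference time, retrieved documents and the question can be packed into the Context/Question layout used in the inference example later in this card. The exact delimiters are an assumption; adjust them to match your training template.

```python
def format_rag_prompt(docs, question):
    """Pack retrieved documents (relevant and irrelevant alike) plus the
    user question into a single Context/Question prompt string."""
    context = "\n\n".join(f"[Doc {i+1}] {d}" for i, d in enumerate(docs))
    return f"Context: {context}\n\nQuestion: {question}"

retrieved = [
    "Gamma measures the rate of change of delta.",  # possibly relevant
    "The Federal Reserve was founded in 1913.",     # distractor
]
prompt = format_rag_prompt(retrieved, "How do I hedge delta risk?")
```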

Out-of-Scope Use

  • General chat without context (it is specialized for document-based reasoning).
  • Financial advice (this is a research/development tool, not a financial advisor).
  • Usage without 4-bit quantization on consumer hardware (due to the 32B parameter size).

Bias, Risks, and Limitations

The model is fine-tuned on specific financial domain data. It may hallucinate if provided with context documents that contain factually incorrect information (garbage in, garbage out). As a 32B model, it requires significant VRAM to run even with adapters.
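As a rough back-of-envelope check of the VRAM claim (the parameter count and per-parameter size below are approximations; real usage adds KV cache, activations, and quantization overhead on top):

```python
# Approximate weight memory for a ~32B model under 4-bit NF4 quantization.
params = 32.5e9          # approximate parameter count of Qwen2.5-32B
bytes_per_param = 0.5    # 4 bits per weight; double quantization adds slight overhead
weight_gib = params * bytes_per_param / 2**30
print(f"~{weight_gib:.1f} GiB for the quantized weights alone")
```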

Recommendations

Users should verify all financial outputs against established pricing models or other verifiable sources. This model should assist a human trader, not act as an autonomous agent.

How to Get Started with the Model

Use the code below to get started with the model. Note that you must load the base model in 4-bit to fit on standard GPUs.

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 1. Load Base Model (Qwen 2.5 32B)
base_model_id = "Qwen/Qwen2.5-32B-Instruct"
adapter_model_id = "Rednote/Qwen-2.5-32B-RAFT-UnifiedQ" # Replace with your actual HF path

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

# 2. Load the RAFT Adapter
model = PeftModel.from_pretrained(base_model, adapter_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)

# 3. Inference Example
prompt = "Context: [Doc 1]... [Doc 2]... \n\n Question: How do I hedge delta risk?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
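Since the base model is instruction-tuned, wrapping the prompt in Qwen's chat format usually improves results. A minimal sketch (the system-message wording is an assumption, not part of the released adapter):

```python
def to_chat_messages(context_prompt,
                     system="You are a quantitative finance assistant. "
                            "Answer only from the provided context."):
    """Wrap a RAFT-style prompt as chat messages. Pass the result to
    tokenizer.apply_chat_template(messages, add_generation_prompt=True,
    return_tensors="pt") instead of calling tokenizer() directly."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": context_prompt},
    ]

messages = to_chat_messages(
    "Context: [Doc 1]...\n\nQuestion: How do I hedge delta risk?"
)
```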