# Model Card for UnifiedQ-Finance-RAFT

## Model Details

### Model Description
UnifiedQ-Finance-RAFT is a specialized LoRA adapter for Qwen 2.5 32B Instruct, fine-tuned using the RAFT (Retrieval-Augmented Fine-Tuning) technique.
This model was trained to act as the reasoning engine for a quantitative finance RAG pipeline. It addresses the "distractor problem" in RAG systems by being explicitly trained to distinguish between relevant "oracle" documents and irrelevant "distractor" documents when answering complex options trading and risk management queries.
- Developed by: Rednote (UnifiedQ Project)
- Model type: LoRA Adapter (QLoRA 4-bit) for Causal LM
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: Qwen/Qwen2.5-32B-Instruct
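The training corpus itself is not reproduced in this card, but a minimal sketch of what a RAFT-style example looks like under the paper's recipe may clarify the "distractor" setup described above. All field names, document text, and quote markers below are illustrative assumptions, not the actual dataset schema:

```python
# Hypothetical RAFT-style training example (illustrative only; not the actual
# UnifiedQ dataset schema). One "oracle" document supports the answer; the
# distractors are retrieved noise the model must learn to ignore.
raft_example = {
    "question": "A desk is long 100 ATM SPX calls. How is the position delta-hedged?",
    "documents": [
        {"label": "oracle",
         "text": "An at-the-money call has a delta of roughly 0.5, so each contract "
                 "adds ~0.5 x multiplier of underlying exposure..."},
        {"label": "distractor",
         "text": "SPX dividend futures settled slightly higher on Tuesday..."},
        {"label": "distractor",
         "text": "Vega measures an option's sensitivity to implied volatility..."},
    ],
    # RAFT targets are chain-of-thought answers that quote the oracle document,
    # which teaches the model to discriminate oracle from distractor at inference.
    "answer": "Per the context, ##begin_quote##an at-the-money call has a delta of "
              "roughly 0.5##end_quote##, so shorting ~50 delta-equivalent units of "
              "the underlying neutralizes the position.",
}
```

In the RAFT paper's recipe, a fraction of training examples also omit the oracle document entirely, which pushes the model to rely on memorized domain knowledge rather than always trusting the provided context.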
### Model Sources
- Repository: [Link to your Hugging Face Repo]
- Paper (Technique): [RAFT: Adapting Language Model to Domain Specific RAG](https://arxiv.org/abs/2403.10131)
## Uses

### Direct Use

This model is intended to be used inside a Retrieval-Augmented Generation (RAG) pipeline. It expects a prompt that includes retrieved context documents (some relevant, some irrelevant) followed by a user question; a sketch of this prompt format follows the list below. It excels at:
- Evaluating options trading strategies.
- Analyzing quantitative risk management questions.
- Filtering noise from retrieved financial documents.
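As a sketch of the expected input, here is one plausible way to assemble such a prompt. The `[Doc N]` markers mirror the inference example later in this card, but the exact separator format the adapter was trained on is an assumption:

```python
# Sketch: building a RAG prompt from retrieved documents. The format is assumed
# (it mirrors the inference example below), not a documented training schema.
def build_prompt(documents: list[str], question: str) -> str:
    context = "\n\n".join(f"[Doc {i + 1}] {doc}" for i, doc in enumerate(documents))
    return f"Context: {context}\n\nQuestion: {question}"

# In practice `documents` comes from your retriever and will mix relevant
# passages with distractors, exactly the condition RAFT trains for.
prompt = build_prompt(
    ["An at-the-money call has a delta of roughly 0.5...",
     "Unrelated macro commentary about dividend futures..."],
    "How do I hedge delta risk?",
)
```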
### Out-of-Scope Use
- General chat without context (it is specialized for document-based reasoning).
- Financial advice (this is a research/development tool, not a financial advisor).
- Usage without 4-bit quantization on consumer hardware (due to the 32B parameter size).
## Bias, Risks, and Limitations
The model is fine-tuned on a narrow financial domain. It is only as reliable as its retrieved context: if the provided documents contain factually incorrect information, it may confidently reproduce it (garbage in, garbage out). As a 32B-parameter model it also demands significant VRAM even with the adapter: at 4-bit precision the base weights alone occupy roughly 32B × 0.5 bytes ≈ 16 GB, before KV cache and activation overhead.
### Recommendations
Users should verify all financial outputs against standard pricing models or other verifiable sources. This model should be used as an assistant to a human trader, not as an autonomous agent.
## How to Get Started with the Model

Use the code below to get started with the model. Note that the base model must be loaded in 4-bit to fit on a single high-memory GPU.
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model_id = "Qwen/Qwen2.5-32B-Instruct"
adapter_model_id = "Rednote/Qwen-2.5-32B-RAFT-UnifiedQ"  # Replace with your actual HF path

# 1. Load the base model (Qwen 2.5 32B) in 4-bit NF4
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# 2. Attach the RAFT LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# 3. Inference example: retrieved context plus question, via the chat template
prompt = "Context: [Doc 1]... [Doc 2]... \n\nQuestion: How do I hedge delta risk?"
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```