Model Card for rahul14/span-arithmetic-classification
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- Developed by: rahul.14
- Model type: BERT-based classification model for span and arithmetic query detection in financial documents
- Language(s) (NLP): English (focused on financial terminology)
- License: MIT
- Finetuned from model: bert-base-uncased
Uses
This model is designed for classifying user queries on financial documents into two main categories:
- Span-based queries: Questions that can be answered by directly extracting a span of text (e.g., "What is the revenue in Q4 2023?").
- Arithmetic-based queries: Questions that require numeric reasoning, aggregation, or calculation based on multiple values in the document (e.g., "What is the profit margin in 2023?" or "Total revenue minus cost of goods sold?").
Intended Use
- Integrated into financial document analysis pipelines for automated question answering systems.
- Used by AI agents in multi-agent financial platforms (e.g., PiuFi) to route user queries to the appropriate downstream module: span extractor or calculator (see the routing sketch after this list).
- Helps in improving latency by avoiding unnecessary LLM usage for simple extractive queries.
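A minimal sketch of this routing pattern is shown below. It assumes the label mapping used in the example code further down (0 = arithmetic, 1 = span); `run_calculation` and `extract_span` are hypothetical placeholders for downstream modules and are not part of this repository.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "rahul14/span-arithmetic-classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def extract_span(query, document):
    # Placeholder: call an extractive QA module here.
    raise NotImplementedError

def run_calculation(query, document):
    # Placeholder: call a numeric-reasoning / calculator module here.
    raise NotImplementedError

def route_query(query, document):
    inputs = tokenizer(query, truncation=True, return_tensors="pt")
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    # Assumed mapping (matches the example code below): 0 = arithmetic, 1 = span.
    if pred == 0:
        return run_calculation(query, document)
    return extract_span(query, document)
```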
Not Intended For
- General-purpose QA outside financial domain
- Non-English financial documents (unless fine-tuned accordingly)
- Semantic retrieval or summarization (use retrieval models for that)
Direct Use
This model can be used to classify a user query related to financial documents into one of the following labels:
- "span": the question can be answered by directly extracting a portion of text.
- "arithmetic": the question requires mathematical reasoning over multiple values.
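A quick way to try the classifier is the Transformers text-classification pipeline. Note that the label names it returns (e.g., LABEL_0 / LABEL_1) depend on the id2label mapping stored in the checkpoint's config; the 0 = arithmetic, 1 = span reading used in the comment below follows the example code in the "How to Get Started" section and should be verified against the config.

```python
from transformers import pipeline

# Text-classification pipeline over the fine-tuned checkpoint.
classifier = pipeline(
    "text-classification",
    model="rahul14/span-arithmetic-classification",
)

print(classifier("What is the revenue in Q4 2023?"))
# Example output shape (label names depend on the model's id2label config):
# [{'label': 'LABEL_1', 'score': 0.98}]   # assumed: LABEL_1 = span
```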
Out-of-Scope Use
This model is not intended for the following use cases:
- General-purpose question classification unrelated to financial data (e.g., questions about history, science, or sports).
- Non-English queries, unless the model has been explicitly fine-tuned on those languages.
- Document retrieval or passage ranking: this model only classifies the type of query, not where to find the answer.
- Answer generation or span extraction: the model does not return actual answers, only the query type.
- Multi-label or multi-intent classification: it is trained for a single-label output (either "span" or "arithmetic").
- Mathematical computation: the model detects arithmetic intent but does not perform any calculations itself.
Bias, Risks, and Limitations
Bias
- Domain-specific bias: The model has been fine-tuned primarily on financial queries. It may misclassify general-purpose or domain-agnostic questions due to limited exposure outside the financial context.
- Data bias: If the fine-tuning dataset overrepresents certain financial terms, company names, or phrasing styles, the model may be biased toward those and underperform on others.
Risks
- Misclassification risk: A span-type question may be wrongly classified as arithmetic (or vice versa), which could trigger the wrong downstream agent (e.g., calculator instead of extractor), leading to incorrect or failed responses.
- Overreliance on surface patterns: The model may rely on superficial cues (e.g., the presence of numbers or certain verbs like "calculate") and fail in edge cases requiring deeper reasoning.
- Model confidence not exposed: Without confidence scoring or thresholding, integrating this model blindly could result in silent failures if no fallback logic is implemented.
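One mitigation for the last point is a simple confidence gate: apply a softmax over the logits and fall back to other logic when the top probability is low. A minimal sketch follows; the 0.8 threshold is an arbitrary illustration rather than a calibrated value, and `model` / `tokenizer` are loaded as in the example code below.

```python
import torch
import torch.nn.functional as F

CONFIDENCE_THRESHOLD = 0.8  # illustrative only; tune on a validation set

def predict_with_confidence(query, model, tokenizer):
    inputs = tokenizer(query, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = F.softmax(logits, dim=-1).squeeze(0)
    confidence, pred = probs.max(dim=-1)
    if confidence.item() < CONFIDENCE_THRESHOLD:
        # Low confidence: return None so the caller can apply fallback logic
        # (e.g., escalate to a general LLM) instead of trusting the label.
        return None, confidence.item()
    return pred.item(), confidence.item()
```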
Limitations
- Only supports binary classification: "span" or "arithmetic". It does not handle other query types like definition, summary, boolean, or trend-based questions.
- Limited to English-language queries (based on pretraining and fine-tuning assumptions).
- Not interpretable: Like most transformer models, it operates as a black box with no transparent reasoning trace.
How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "rahul14/span-arithmetic-classification"
model = None
tokenizer = None

def load_model():
    global model, tokenizer
    if model is None or tokenizer is None:
        model = AutoModelForSequenceClassification.from_pretrained(model_name)
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model.eval()
    return model, tokenizer

def predict_query(query):
    # Get the already-loaded model and tokenizer
    model, tokenizer = load_model()
    # Tokenize the input text and convert to tensors
    inputs = tokenizer(
        query,
        truncation=True,
        padding=True,
        return_tensors="pt",
    )
    # Get prediction
    with torch.no_grad():
        outputs = model(**inputs)
        prediction = outputs.logits.argmax(dim=-1)
    return prediction.item()

def pred_label(pred):
    if pred == 0:
        return "Arithmetic"
    return "Span"

# Example usage
query = "What is the 2019 average defined benefit schemes?"
prediction = predict_query(query)
label = pred_label(prediction)
print(f"Prediction: {prediction}, Label: {label}")
```
Summary
This BERT-based classification model is fine-tuned to distinguish between span-based and arithmetic-based queries in the financial domain. It plays a crucial role in intelligent financial document analysis systems by routing user questions to the appropriate processing module: either a text span extractor or a calculation engine.
With high accuracy and fast inference, it is best suited for financial chatbots, QA agents, and multi-agent systems dealing with structured and unstructured financial documents. The model supports English-language queries and is optimized for enterprise-grade FinTech applications where understanding query intent is critical.