Cheque Field Regressor

A ResNet-50-based regression model that simultaneously localises six standard fields in Indian bank cheques in a single forward pass.

| Field | Description |
|---|---|
| date | Cheque date |
| amount | Amount in figures |
| ifsc | IFSC / branch code |
| acno | Account number |
| sign | Signature region |
| name | Payee name |

Read this first: the model is a research baseline, not a strong result. This checkpoint accompanies a paper whose finding is a negative result. On the small IDRBT benchmark, the cut-paste synthetic data used to train this model gives no measurable improvement over real data alone once seed variance and training compute are controlled. No learned variant beats a trivial no-learning layout prior on mean IoU: the prior, which simply predicts each field's mean training box, already reaches 0.691 mIoU / 80% accuracy. Use the model to reproduce and build on that analysis, not as a production cheque reader.

Model Performance

Evaluated on a held-out test set of 10 real images from the IDRBT Cheque Image Dataset. The released weights are the best of three random seeds (seed 42). Per-field numbers below are for that single checkpoint; the honest aggregate is the seed-averaged row.

| Field | IoU | Acc@0.5 |
|---|---|---|
| date | 0.682 | 100% |
| amount | 0.724 | 100% |
| ifsc | 0.678 | 90% |
| acno | 0.715 | 100% |
| sign | 0.614 | 100% |
| name | 0.808 | 100% |
| Mean (seed 42, this checkpoint) | 0.704 | 98% |
| Mean (3 seeds, reported finding) | 0.648 ± 0.041 | 89.4 ± 7.5% |

MAE on normalised coordinates: 0.0155 (seed 42); 0.0185 ± 0.002 (3 seeds).
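
The three metrics reported above can be made concrete with a small sketch. It assumes boxes are [xmin, ymin, xmax, ymax] in normalised coordinates and that Acc@0.5 counts a box as correct when its IoU is at least 0.5; whether the threshold is inclusive is an assumption, and the paper's evaluation code remains the reference. The helper names `box_iou` and `evaluate` are hypothetical, and the boxes in the toy example are made up.

```python
from typing import Dict, List, Sequence

Box = Sequence[float]  # [xmin, ymin, xmax, ymax] in normalised coordinates


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate(preds: List[Box], targets: List[Box],
             threshold: float = 0.5) -> Dict[str, float]:
    """Mean IoU, accuracy at an IoU threshold, and mean absolute coordinate error."""
    ious = [box_iou(p, t) for p, t in zip(preds, targets)]
    abs_err = [abs(pc - tc)
               for p, t in zip(preds, targets)
               for pc, tc in zip(p, t)]
    return {
        "miou": sum(ious) / len(ious),
        "acc": sum(iou >= threshold for iou in ious) / len(ious),
        "mae": sum(abs_err) / len(abs_err),
    }


# Toy example with made-up boxes (not taken from the dataset):
# the first prediction overlaps its target, the second misses entirely.
preds = [[0.0, 0.0, 0.5, 0.5], [0.2, 0.2, 0.4, 0.4]]
targets = [[0.0, 0.0, 0.5, 0.75], [0.6, 0.6, 0.8, 0.8]]
metrics = evaluate(preds, targets)
print(metrics)
```

With only 10 test images per field, `acc` for a single field can only take values in steps of 0.1, which is why the per-field accuracies above are all multiples of 10%.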

Context for these numbers (see the paper for the full controlled comparison):

| Model | Training data | mIoU | mAcc@0.5 |
|---|---|---|---|
| Static prior (no learning) | none | 0.691 | 80.0% |
| Real only, 150 ep | real | 0.547 | 66.7% |
| Real only, 495 ep (compute-matched) | real | 0.660 | 93.3% |
| This model (real + synthetic, 3 seeds) | real+synth | 0.648 ± 0.041 | 89.4 ± 7.5% |

The compute-matched real-only model matches or beats this synthetic-augmented model on every metric, which is the basis for the paper's negative result.
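
The static prior in the first row is simple enough to sketch in a few lines. The version below assumes "mean training box" means a coordinate-wise mean of each field's boxes over the training images; the paper may define it slightly differently. The function names and the toy boxes are hypothetical.

```python
from typing import List

Box = List[float]  # [xmin, ymin, xmax, ymax] in normalised coordinates


def fit_static_prior(train_boxes: List[List[Box]]) -> List[Box]:
    """Average each field's box over the training images.

    train_boxes[i][f] is the box of field f in training image i.
    Returns one mean box per field.
    """
    n_images = len(train_boxes)
    n_fields = len(train_boxes[0])
    return [
        [sum(image[f][c] for image in train_boxes) / n_images for c in range(4)]
        for f in range(n_fields)
    ]


def predict_static_prior(prior: List[Box], n_images: int) -> List[List[Box]]:
    """Predict the same per-field boxes for every image, ignoring the pixels."""
    return [[list(box) for box in prior] for _ in range(n_images)]


# Toy example: two training images, two fields (made-up coordinates).
train = [
    [[0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 1.0, 1.0]],
    [[0.5, 0.5, 1.0, 1.0], [0.5, 0.5, 1.0, 1.0]],
]
prior = fit_static_prior(train)
predictions = predict_static_prior(prior, n_images=3)
print(prior)
```

Because the prior never looks at the input image, any learned model that fails to exceed it has not demonstrated that it uses image content for localisation on this benchmark.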

Architecture

Input [B, 3, 512, 1024]  (ImageNet-normalised)
  │
  ResNet-50 Backbone (ImageNet pretrained)
  ├── layer3 → 1024-ch → Conv1×1(128) → AdaptiveAvgPool(4×2)
  └── layer4 → 2048-ch → Conv1×1(128) → AdaptiveAvgPool(4×2)
  │
  Concatenate + Flatten  →  2048-d spatial feature vector
  │
  FC(2048 → 512) → ReLU → Dropout(0.3)
  FC(512 → 24)   → Sigmoid
  │
Output [B, 6, 4]  normalised [xmin, ymin, xmax, ymax] ∈ (0, 1)

The key architectural choice is multi-scale spatial pooling (4×2 grid) instead of Global Average Pooling. This preserves coarse positional signals (left/right, top/bottom), which are essential for distinguishing cheque fields that differ primarily in their spatial layout (e.g., ifsc top-left vs sign bottom-right).
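
A toy illustration of why grid pooling keeps positional information that global pooling discards. This is a single-channel, pure-Python sketch rather than the model's own code; the real model pools 128-channel feature maps. The card does not say which axis of the 4×2 grid has four cells, so the sketch takes rows and columns as parameters, and it assumes the map size divides evenly by the grid.

```python
from typing import List

FeatureMap = List[List[float]]  # single-channel map indexed as fmap[y][x]


def global_avg_pool(fmap: FeatureMap) -> float:
    """Collapse the whole map to one number; position is lost."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)


def grid_avg_pool(fmap: FeatureMap, rows: int, cols: int) -> FeatureMap:
    """Average within each cell of a rows x cols grid (sizes must divide evenly)."""
    height, width = len(fmap), len(fmap[0])
    assert height % rows == 0 and width % cols == 0
    cell_h, cell_w = height // rows, width // cols
    pooled = []
    for r in range(rows):
        pooled_row = []
        for c in range(cols):
            cell = [fmap[y][x]
                    for y in range(r * cell_h, (r + 1) * cell_h)
                    for x in range(c * cell_w, (c + 1) * cell_w)]
            pooled_row.append(sum(cell) / len(cell))
        pooled.append(pooled_row)
    return pooled


def single_activation(height: int, width: int, y: int, x: int) -> FeatureMap:
    """A map that is zero everywhere except one active location."""
    fmap = [[0.0] * width for _ in range(height)]
    fmap[y][x] = 1.0
    return fmap


top_left = single_activation(4, 8, 0, 0)
bottom_right = single_activation(4, 8, 3, 7)

# Global pooling cannot tell the two maps apart ...
same_under_gap = global_avg_pool(top_left) == global_avg_pool(bottom_right)
# ... while grid pooling places the activation in different cells.
grid_tl = grid_avg_pool(top_left, rows=2, cols=4)
grid_br = grid_avg_pool(bottom_right, rows=2, cols=4)
print(same_under_gap, grid_tl != grid_br)
```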

Total parameters: 24,963,160
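
The total can be sanity-checked by hand. The arithmetic below is one decomposition that reproduces the stated figure exactly; it assumes the torchvision ResNet-50 backbone with its classification layer removed (23,508,032 parameters), and that each 1×1 projection is a bias-free convolution followed by BatchNorm. The BatchNorm layers are not shown in the diagram, so treat them as an inference from the count rather than a confirmed detail of the released code.

```python
def conv_params(c_in: int, c_out: int, kernel: int = 1, bias: bool = False) -> int:
    """Weights (and optional bias) of a 2D convolution."""
    return c_in * c_out * kernel * kernel + (c_out if bias else 0)


def batchnorm_params(channels: int) -> int:
    """Learnable scale and shift of a BatchNorm layer."""
    return 2 * channels


def linear_params(n_in: int, n_out: int) -> int:
    """Weights and bias of a fully connected layer."""
    return n_in * n_out + n_out


# torchvision ResNet-50 has 25,557,032 parameters; its 1000-way fc layer
# accounts for 2048 * 1000 + 1000 = 2,049,000 of them.
backbone = 25_557_032 - linear_params(2048, 1000)

# Assumed projection design: bias-free Conv1x1 followed by BatchNorm(128).
proj_layer3 = conv_params(1024, 128) + batchnorm_params(128)
proj_layer4 = conv_params(2048, 128) + batchnorm_params(128)

# Two branches, each 128 channels pooled to a 4x2 grid, then concatenated.
flat_dim = 2 * 128 * 4 * 2
head = linear_params(flat_dim, 512) + linear_params(512, 24)

total = backbone + proj_layer3 + proj_layer4 + head
print(f"{total:,}")
```

Under the alternative reading (biased convolutions, no BatchNorm) the same arithmetic gives 24,962,904, which is 256 short of the stated total.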

Quick Start

import torch
from PIL import Image
import torchvision.transforms.v2 as T
from transformers import AutoConfig, AutoModel

# Load model
config = AutoConfig.from_pretrained(
    "jaganadhg/cheque-field-regressor", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "jaganadhg/cheque-field-regressor", trust_remote_code=True
)
model.eval()

# Preprocess image
preprocess = T.Compose([
    T.Resize((config.img_height, config.img_width)),
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.Normalize(mean=config.image_mean, std=config.image_std),
])

img = Image.open("cheque.tif").convert("RGB")
pixel_values = preprocess(img).unsqueeze(0)   # [1, 3, 512, 1024]

# Predict
with torch.no_grad():
    output = model(pixel_values=pixel_values)

boxes = output.boxes[0]   # [6, 4] normalised coordinates
for name, box in zip(config.field_names, boxes):
    print(f"{name:8s}: {[round(v, 3) for v in box.tolist()]}")

Using the Image Processor

The bundled ChequeImageProcessor handles all preprocessing and can also rescale predicted boxes back to the original image pixel coordinates:

import torch
from PIL import Image
from transformers import AutoConfig, AutoModel
from image_processing_cheque import ChequeImageProcessor

config    = AutoConfig.from_pretrained("jaganadhg/cheque-field-regressor", trust_remote_code=True)
model     = AutoModel.from_pretrained("jaganadhg/cheque-field-regressor", trust_remote_code=True)
processor = ChequeImageProcessor.from_pretrained("jaganadhg/cheque-field-regressor")

img = Image.open("cheque.tif").convert("RGB")
orig_size = img.size                              # (width, height)

inputs = processor(images=img)                    # returns pixel_values [1,3,512,1024]

model.eval()
with torch.no_grad():
    output = model(**inputs)

# Rescale boxes to original pixel coordinates
results = processor.postprocess_boxes(
    output.boxes, target_sizes=[orig_size]
)
print(results[0])
# {
#   'date':   [1680.0, 82.0, 2350.0, 226.0],
#   'amount': [1675.0, 385.0, 2295.0, 550.0],
#   ...
# }
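
If the bundled processor is not available, the rescaling step can be approximated by hand. This sketch assumes the predicted boxes are normalised relative to the full image, so multiplying by the original width and height recovers pixel coordinates. It does not clamp or round, and it is not the source of `postprocess_boxes`; `to_pixels` is a hypothetical helper and the input values are made up.

```python
from typing import Dict, List, Sequence, Tuple


def to_pixels(boxes: Sequence[Sequence[float]],
              field_names: Sequence[str],
              image_size: Tuple[int, int]) -> Dict[str, List[float]]:
    """Map normalised [xmin, ymin, xmax, ymax] boxes to pixel coordinates.

    image_size is (width, height), matching PIL's Image.size.
    """
    width, height = image_size
    return {
        name: [xmin * width, ymin * height, xmax * width, ymax * height]
        for name, (xmin, ymin, xmax, ymax) in zip(field_names, boxes)
    }


# Made-up normalised boxes for a 2000 x 1000 pixel image.
example = to_pixels(
    boxes=[[0.5, 0.25, 1.0, 0.5], [0.0, 0.5, 0.25, 1.0]],
    field_names=["date", "sign"],
    image_size=(2000, 1000),
)
print(example)
```

In the Quick Start example, `output.boxes[0].tolist()` and `config.field_names` would be the natural inputs for `boxes` and `field_names`.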

Training

This checkpoint was trained on 85 real training images plus 194 synthetic images from the cheque-synthetic-images dataset (41 synthetic images whose source cheque fell in the real validation/test split were excluded to prevent leakage). Validation and test sets are real images only. Training used a two-phase strategy:

  1. Warm-up (15 epochs): Backbone frozen, only projection + head trained. Prevents the random head from corrupting ImageNet features with large early-training gradients. LR = 1×10⁻⁴.

  2. End-to-end fine-tuning (135 epochs): All layers unfrozen. LR reduced to 1×10⁻⁵ with cosine annealing.
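
The two phases can be written as a per-epoch schedule. This is a sketch under stated assumptions, not the training script: the learning rate is taken as constant during warm-up, the cosine curve is assumed to span exactly the 135 fine-tuning epochs, and the minimum learning rate is assumed to be 0. None of these three details is given above.

```python
import math

WARMUP_EPOCHS = 15
FINETUNE_EPOCHS = 135
WARMUP_LR = 1e-4
FINETUNE_LR = 1e-5
MIN_LR = 0.0  # assumption: the floor of the cosine schedule is not stated


def backbone_trainable(epoch: int) -> bool:
    """The backbone stays frozen during warm-up and is unfrozen afterwards."""
    return epoch >= WARMUP_EPOCHS


def lr_at(epoch: int) -> float:
    """Learning rate for a 0-indexed epoch in [0, 150)."""
    if epoch < WARMUP_EPOCHS:
        return WARMUP_LR
    step = epoch - WARMUP_EPOCHS
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / FINETUNE_EPOCHS))
    return MIN_LR + (FINETUNE_LR - MIN_LR) * cosine


for epoch in (0, 14, 15, 82, 149):
    print(epoch, backbone_trainable(epoch), f"{lr_at(epoch):.2e}")
```

The closed form used in `lr_at` is the same one PyTorch's `CosineAnnealingLR` documents for a schedule without restarts.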

Loss: Field-weighted SmoothL1

weights = {date: 1.0, amount: 1.0, ifsc: 2.0, acno: 1.0, sign: 1.5, name: 1.0}
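
A plain-Python sketch of what a field-weighted SmoothL1 loss could look like for one image. The field weights are those listed above; the transition point `beta = 1.0` (PyTorch's default for `SmoothL1Loss`) and the normalisation by the number of fields are assumptions, since neither is specified here. The function names are hypothetical.

```python
from typing import Dict, List

FIELD_WEIGHTS = {"date": 1.0, "amount": 1.0, "ifsc": 2.0,
                 "acno": 1.0, "sign": 1.5, "name": 1.0}


def smooth_l1(error: float, beta: float = 1.0) -> float:
    """Quadratic for small errors, linear for large ones."""
    abs_error = abs(error)
    if abs_error < beta:
        return 0.5 * abs_error * abs_error / beta
    return abs_error - 0.5 * beta


def field_weighted_smooth_l1(pred: Dict[str, List[float]],
                             target: Dict[str, List[float]],
                             weights: Dict[str, float] = FIELD_WEIGHTS,
                             beta: float = 1.0) -> float:
    """Weight each field's mean per-coordinate SmoothL1, then average over fields."""
    total = 0.0
    for name, weight in weights.items():
        per_coord = [smooth_l1(p - t, beta)
                     for p, t in zip(pred[name], target[name])]
        total += weight * sum(per_coord) / len(per_coord)
    return total / len(weights)


def shifted(target: Dict[str, List[float]], field: str,
            delta: float) -> Dict[str, List[float]]:
    """Copy of target with one field's box moved by delta on every coordinate."""
    return {name: [c + delta for c in box] if name == field else list(box)
            for name, box in target.items()}


target = {name: [0.1, 0.1, 0.4, 0.4] for name in FIELD_WEIGHTS}
loss_date = field_weighted_smooth_l1(shifted(target, "date", 0.5), target)
loss_ifsc = field_weighted_smooth_l1(shifted(target, "ifsc", 0.5), target)
print(loss_date, loss_ifsc)
```

The same coordinate error costs twice as much on ifsc as on date. Note that with `beta = 1.0` and coordinates confined to (0, 1), every error falls in the quadratic branch, so under that assumption the loss behaves like a weighted half squared error.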

Augmentation: Random horizontal flip, affine (±5°, scale 0.9–1.1), colour jitter, Gaussian blur.

Optimiser: AdamW, weight decay 1×10⁻⁴, batch size 4, 150 epochs total.
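
The leakage rule described at the start of this section (drop any synthetic image whose source cheque is in the real validation or test split) amounts to a simple filter. The sketch below assumes a mapping from each synthetic image to its source cheque is available; how the synthetic dataset actually records that link is not stated here, so the mapping, the file names, and the function name are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def filter_synthetic(synthetic_sources: Dict[str, str],
                     heldout_ids: Set[str]) -> Tuple[List[str], List[str]]:
    """Split synthetic images into those safe for training and those to drop.

    synthetic_sources maps a synthetic image name to the id of the real
    cheque it was generated from. heldout_ids holds the ids of real cheques
    in the validation and test splits.
    """
    kept, dropped = [], []
    for name, source_id in sorted(synthetic_sources.items()):
        if source_id in heldout_ids:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Made-up example: cheque_007 is in the held-out split.
sources = {
    "synth_0001.png": "cheque_001",
    "synth_0002.png": "cheque_007",
    "synth_0003.png": "cheque_003",
}
kept, dropped = filter_synthetic(sources, heldout_ids={"cheque_007"})
print(kept, dropped)
```

Applied to the full synthetic set, a filter of this kind would be expected to leave the 194 images used for training and set aside the 41 excluded ones.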

Limitations

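  • It does not beat a trivial baseline on mIoU. A no-learning predictor of the per-field mean training box reaches 0.691 mIoU; this model's seed-averaged result (0.648 ± 0.041) does not exceed it, even though the released best-of-three checkpoint (seed 42) scores 0.704. Cheque layout is highly regular, so absolute IoU is a weak signal.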
  • Synthetic training gave no measurable benefit. The accompanying paper's controlled comparison (3 seeds + a compute-matched real-only control) found the synthetic data this model uses does not improve localisation. Treat the model as a characterisation tool, not evidence that synthetic data helps.
  • Tiny evaluation set: 10 test images / 60 boxes, single split. Per-field accuracies move in 10-point steps; seed-to-seed variance is large (e.g. ifsc accuracy swings 90/40/0% across seeds).
  • Dataset size and domain: Trained on 85 real images (plus 194 synthetic images) of Indian cheque formats; generalisation to other formats is untested.
  • Fixed-field assumption: Always predicts exactly 6 boxes; cannot handle cheques where fields are absent or duplicated.

Citation

If you use this model in your research, please cite:

@misc{cheque-field-regressor-2026,
  title        = {Cheque Field Localisation Using Regression-Based ResNet-50},
  author       = {Gopinadhan, Jaganadh},
  year         = {2026},
  howpublished = {HuggingFace Model Hub},
  url          = {https://huggingface.co/jaganadhg/cheque-field-regressor}
}

Dataset

The model was trained on the publicly available IDRBT Cheque Image Dataset, released by the Institute for Development and Research in Banking Technology, Hyderabad, India.

License

Model weights and code: Apache 2.0

The IDRBT Cheque Image Dataset has its own terms of use; please refer to the IDRBT website before using this model commercially.
