Cheque Field Regressor

A ResNet-50-based regression model that localises six standard fields in Indian bank cheques in a single forward pass.

Field	Description
`date`	Cheque date
`amount`	Amount in figures
`ifsc`	IFSC / branch code
`acno`	Account number
`sign`	Signature region
`name`	Payee name

Read this first: the model is a research baseline, not a strong result. This checkpoint accompanies a paper whose main finding is negative. On the small IDRBT benchmark, the cut-paste synthetic data used to train this model gives no measurable improvement over real data alone once seed variance and training compute are controlled. No learned variant beats a trivial no-learning layout prior on mean IoU either: the prior, which simply predicts each field's mean training box, already reaches 0.691 mIoU and 80% accuracy. Use the model to reproduce and build on that analysis, not as a production cheque reader.
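
The layout prior mentioned above is simple enough to sketch in a few lines. The snippet below is a minimal illustration, assuming annotations are stored as normalised `[xmin, ymin, xmax, ymax]` lists keyed by field name. The function name `fit_static_prior` and the toy annotations are hypothetical and are not taken from the paper's code.

```python
from statistics import fmean


def fit_static_prior(train_annotations):
    """Return the per-field mean box over the training annotations.

    train_annotations: list of dicts mapping field name to a normalised
    [xmin, ymin, xmax, ymax] box. No image pixels are used at all.
    """
    fields = list(train_annotations[0].keys())
    prior = {}
    for field in fields:
        boxes = [ann[field] for ann in train_annotations]
        prior[field] = [fmean(box[i] for box in boxes) for i in range(4)]
    return prior


# Toy annotations (made up for illustration, two fields only).
toy_train = [
    {"date": [0.70, 0.05, 0.95, 0.15], "sign": [0.60, 0.70, 0.90, 0.90]},
    {"date": [0.72, 0.07, 0.97, 0.17], "sign": [0.64, 0.74, 0.94, 0.94]},
]
prior = fit_static_prior(toy_train)


def predict(_image):
    """Every cheque gets the same boxes, whatever the image contains."""
    return prior
```

Because cheque layouts are so regular, a predictor like this is the natural floor against which any learned model should be compared.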

Model Performance

Evaluated on a held-out test set of 10 real images from the IDRBT Cheque Image Dataset. The released weights are the best of three random seeds (seed 42). Per-field numbers below are for that single checkpoint; the honest aggregate is the seed-averaged row.

Field	IoU	Acc@0.5
date	0.682	100%
amount	0.724	100%
ifsc	0.678	90%
acno	0.715	100%
sign	0.614	100%
name	0.808	100%
Mean (seed 42, this checkpoint)	0.704	98%
Mean (3 seeds, reported finding)	0.648 ± 0.041	89.4 ± 7.5%

MAE on normalised coordinates: 0.0155 (seed 42); 0.0185 ± 0.002 (3 seeds).
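
For readers who want to recompute these metrics, here is a minimal sketch of IoU, Acc@0.5 and MAE on normalised boxes. It assumes Acc@0.5 counts a box as correct when its IoU is at least 0.5 (the source does not say whether the threshold is inclusive) and that MAE is averaged over all four coordinates of every box. The function names are illustrative.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as [xmin, ymin, xmax, ymax]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def summarise(preds, targets, threshold=0.5):
    """Return (mean IoU, Acc@threshold, MAE) over paired boxes."""
    ious = [box_iou(p, t) for p, t in zip(preds, targets)]
    mean_iou = sum(ious) / len(ious)
    acc = sum(iou >= threshold for iou in ious) / len(ious)
    abs_errs = [abs(pc - tc) for p, t in zip(preds, targets) for pc, tc in zip(p, t)]
    mae = sum(abs_errs) / len(abs_errs)
    return mean_iou, acc, mae


# Toy check: one perfect box and one box covering half of its target.
preds = [[0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5]]
targets = [[0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 1.0]]
mean_iou, acc, mae = summarise(preds, targets)

# The six per-field IoUs in the table average to about 0.7035 (reported as 0.704).
table_mean = sum([0.682, 0.724, 0.678, 0.715, 0.614, 0.808]) / 6
```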

Context for these numbers (see the paper for the full controlled comparison):

Model	Training data	mIoU	mAcc@0.5
Static prior (no learning)	none	0.691	80.0%
Real only, 150 ep	real	0.547	66.7%
Real only, 495 ep (compute-matched)	real	0.660	93.3%
This model (real + synthetic, 3 seeds)	real+synth	0.648 ± 0.041	89.4 ± 7.5%

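The compute-matched real-only model scores at or above this synthetic-augmented model's seed average on both metrics (0.660 vs 0.648 mIoU, 93.3% vs 89.4% mAcc@0.5), with both gaps inside one standard deviation across seeds. This is the basis for the paper's negative result.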
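
One plausible reading of "compute-matched" is that the real-only run sees the same number of training samples as the mixed run. That reading is an assumption; the paper defines the exact rule. Using the dataset sizes given in the Training section (85 real and 194 synthetic images), the arithmetic looks like this:

```python
real_train = 85          # real training images
synthetic_train = 194    # synthetic training images
epochs_mixed = 150       # epochs used for the real + synthetic run

# Samples processed by the mixed run over its whole schedule.
samples_seen = (real_train + synthetic_train) * epochs_mixed

# Epochs a real-only run needs to process the same number of samples.
equivalent_real_epochs = samples_seen / real_train

print(samples_seen, round(equivalent_real_epochs, 1))
```

Under this assumption the equivalent budget comes to a little over 492 epochs, in the same range as the 495 epochs reported in the table. The small gap suggests the paper may match on optimiser steps or round differently.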

Architecture

Input [B, 3, 512, 1024]  (ImageNet-normalised)
  │
  ResNet-50 Backbone (ImageNet pretrained)
  ├── layer3 → 1024-ch → Conv1×1(128) → AdaptiveAvgPool(4×2)
  └── layer4 → 2048-ch → Conv1×1(128) → AdaptiveAvgPool(4×2)
  │
  Concatenate + Flatten  →  2048-d spatial feature vector
  │
  FC(2048 → 512) → ReLU → Dropout(0.3)
  FC(512 → 24)   → Sigmoid
  │
Output [B, 6, 4]  normalised [xmin, ymin, xmax, ymax] ∈ (0, 1)

The key architectural choice is multi-scale spatial pooling (4×2 grid) instead of Global Average Pooling. This preserves coarse positional signals (left/right, top/bottom), which are essential for distinguishing cheque fields that differ primarily in their spatial layout (e.g., ifsc top-left vs sign bottom-right).
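
The effect of grid pooling can be shown without any deep-learning library. The sketch below is plain Python and pools a toy single-channel feature map two ways. The 2-row by 4-column orientation is an assumption, since the grid is written as 4×2 above without saying which axis is which, and the helper names are illustrative.

```python
def grid_avg_pool(fmap, out_rows, out_cols):
    """Average-pool a 2-D list into an out_rows x out_cols grid.

    Assumes height and width divide evenly by the grid size, in which case
    this matches adaptive average pooling.
    """
    block_h = len(fmap) // out_rows
    block_w = len(fmap[0]) // out_cols
    pooled = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            cell = [fmap[y][x]
                    for y in range(r * block_h, (r + 1) * block_h)
                    for x in range(c * block_w, (c + 1) * block_w)]
            row.append(sum(cell) / len(cell))
        pooled.append(row)
    return pooled


def global_avg_pool(fmap):
    """Collapse the whole map to a single average value."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)


def single_activation(rows, cols, y, x):
    """A map that is zero everywhere except for a 1.0 at (y, x)."""
    fmap = [[0.0] * cols for _ in range(rows)]
    fmap[y][x] = 1.0
    return fmap


top_left = single_activation(4, 8, 0, 0)
bottom_right = single_activation(4, 8, 3, 7)

# Global pooling cannot tell the two maps apart.
same_under_gap = global_avg_pool(top_left) == global_avg_pool(bottom_right)

# Grid pooling keeps the activation in the matching corner cell.
grid_tl = grid_avg_pool(top_left, 2, 4)
grid_br = grid_avg_pool(bottom_right, 2, 4)
```

The two maps are identical after global pooling but land in opposite corner cells of the grid. That coarse positional signal is what the regression head relies on.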

Total parameters: 24,963,160
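
The stated total can be cross-checked with simple arithmetic. The diagram does not say whether the 1×1 projections carry a bias or a normalisation layer. The breakdown below assumes bias-free convolutions followed by BatchNorm, because that combination reproduces the stated figure exactly. Treat it as an inference rather than a description of the released code.

```python
# torchvision's ResNet-50 has 25,557,032 parameters including its 1000-way fc layer.
backbone = 25_557_032 - (2048 * 1000 + 1000)

# Assumed projection: 1x1 conv without bias, plus BatchNorm scale and shift.
proj_layer3 = 1024 * 128 + 2 * 128
proj_layer4 = 2048 * 128 + 2 * 128

# Two branches, 128 channels each, pooled to a grid of 4 * 2 = 8 cells.
feature_dim = 2 * 128 * (4 * 2)

fc1 = feature_dim * 512 + 512   # FC(2048 -> 512) weights + bias
fc2 = 512 * 24 + 24             # FC(512 -> 24) weights + bias

total = backbone + proj_layer3 + proj_layer4 + fc1 + fc2
print(feature_dim, total)
```

The feature dimension works out to 2048, matching the diagram, and the sum equals 24,963,160 under the stated assumption.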

Quick Start

import torch
from PIL import Image
import torchvision.transforms.v2 as T
from transformers import AutoConfig, AutoModel

# Load model
config = AutoConfig.from_pretrained(
    "jaganadhg/cheque-field-regressor", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "jaganadhg/cheque-field-regressor", trust_remote_code=True
)
model.eval()

# Preprocess image
preprocess = T.Compose([
    T.Resize((config.img_height, config.img_width)),
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.Normalize(mean=config.image_mean, std=config.image_std),
])

img = Image.open("cheque.tif").convert("RGB")
pixel_values = preprocess(img).unsqueeze(0)   # [1, 3, 512, 1024]

# Predict
with torch.no_grad():
    output = model(pixel_values=pixel_values)

boxes = output.boxes[0]   # [6, 4] normalised coordinates
for name, box in zip(config.field_names, boxes):
    print(f"{name:8s}: {[round(v, 3) for v in box.tolist()]}")

Using the Image Processor

The bundled ChequeImageProcessor handles all preprocessing and can also rescale predicted boxes back to the original image pixel coordinates:

import torch
from PIL import Image
from transformers import AutoConfig, AutoModel
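# The bundled module must be importable, e.g. by placing
# image_processing_cheque.py from the model repository next to this script.
from image_processing_cheque import ChequeImageProcessor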

config    = AutoConfig.from_pretrained("jaganadhg/cheque-field-regressor", trust_remote_code=True)
model     = AutoModel.from_pretrained("jaganadhg/cheque-field-regressor", trust_remote_code=True)
processor = ChequeImageProcessor.from_pretrained("jaganadhg/cheque-field-regressor")

img = Image.open("cheque.tif").convert("RGB")
orig_size = img.size                              # (width, height)

inputs = processor(images=img)                    # returns pixel_values [1,3,512,1024]

model.eval()
with torch.no_grad():
    output = model(**inputs)

# Rescale boxes to original pixel coordinates
results = processor.postprocess_boxes(
    output.boxes, target_sizes=[orig_size]
)
print(results[0])
# {
#   'date':   [1680.0, 82.0, 2350.0, 226.0],
#   'amount': [1675.0, 385.0, 2295.0, 550.0],
#   ...
# }
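
The source does not show what `postprocess_boxes` does internally. A plausible reading, sketched here as an assumption, is that x coordinates are multiplied by the original width and y coordinates by the original height. The helper `rescale_boxes` is hypothetical and is not part of the bundled processor.

```python
def rescale_boxes(norm_boxes, field_names, image_size):
    """Map normalised [xmin, ymin, xmax, ymax] boxes to pixel coordinates.

    image_size follows PIL's Image.size convention: (width, height).
    """
    width, height = image_size
    result = {}
    for name, (xmin, ymin, xmax, ymax) in zip(field_names, norm_boxes):
        result[name] = [
            round(xmin * width, 1),
            round(ymin * height, 1),
            round(xmax * width, 1),
            round(ymax * height, 1),
        ]
    return result


# Toy example with made-up boxes on a 2000 x 1000 pixel image.
pixels = rescale_boxes(
    [[0.5, 0.25, 0.75, 0.5], [0.0, 0.5, 0.25, 1.0]],
    ["date", "sign"],
    (2000, 1000),
)
```

This is handy when working from the raw `output.boxes` tensor (after `.tolist()`) without the processor.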

Training

This checkpoint was trained on 85 real training images plus 194 synthetic images from the cheque-synthetic-images dataset (41 synthetic images whose source cheque fell in the real validation/test split were excluded to prevent leakage). Validation and test sets are real images only. Training used a two-phase strategy:

Warm-up (15 epochs): Backbone frozen; only the projection layers and head are trained. This prevents the randomly initialised head from corrupting the ImageNet features with large early-training gradients. LR = 1×10⁻⁴.
End-to-end fine-tuning (135 epochs): All layers unfrozen. LR reduced to 1×10⁻⁵ with cosine annealing.
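
The two phases above can be written as a per-epoch schedule. This is a sketch under stated assumptions: the warm-up learning rate is held constant, the cosine curve spans exactly the 135 fine-tuning epochs, and it decays to a minimum of 0. None of these details is given in the source, and the function names are illustrative.

```python
import math

WARMUP_EPOCHS = 15
FINETUNE_EPOCHS = 135
WARMUP_LR = 1e-4
FINETUNE_LR = 1e-5


def learning_rate(epoch, eta_min=0.0):
    """Learning rate for a 0-indexed epoch under the two-phase schedule."""
    if epoch < WARMUP_EPOCHS:
        return WARMUP_LR  # assumed constant while the backbone is frozen
    t = epoch - WARMUP_EPOCHS
    cosine = 0.5 * (1 + math.cos(math.pi * t / FINETUNE_EPOCHS))
    return eta_min + (FINETUNE_LR - eta_min) * cosine


def backbone_trainable(epoch):
    """The backbone is frozen during warm-up and unfrozen afterwards."""
    return epoch >= WARMUP_EPOCHS


schedule = [(e, learning_rate(e), backbone_trainable(e)) for e in range(150)]
```

In a PyTorch training loop, `backbone_trainable` would typically drive `requires_grad` on the backbone parameters at the start of each epoch.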

Loss: Field-weighted SmoothL1

weights = {date: 1.0, amount: 1.0, ifsc: 2.0, acno: 1.0, sign: 1.5, name: 1.0}
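
A field-weighted SmoothL1 loss can be sketched in plain Python as follows. The transition point `beta=1.0` (PyTorch's default for `SmoothL1Loss`) and the normalisation (mean over coordinates, then mean over fields) are assumptions, since the source gives only the weights.

```python
FIELD_WEIGHTS = {"date": 1.0, "amount": 1.0, "ifsc": 2.0,
                 "acno": 1.0, "sign": 1.5, "name": 1.0}


def smooth_l1(diff, beta=1.0):
    """Quadratic for small errors, linear for large ones."""
    d = abs(diff)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta


def field_weighted_smooth_l1(pred, target, weights=FIELD_WEIGHTS, beta=1.0):
    """pred and target map field name to [xmin, ymin, xmax, ymax]."""
    total = 0.0
    for field, weight in weights.items():
        per_coord = [smooth_l1(p - t, beta)
                     for p, t in zip(pred[field], target[field])]
        total += weight * sum(per_coord) / len(per_coord)
    return total / len(weights)


# Toy check: the same 0.1 shift costs twice as much on ifsc as on date.
target = {f: [0.2, 0.2, 0.4, 0.4] for f in FIELD_WEIGHTS}
shifted = [0.3, 0.3, 0.5, 0.5]
loss_date = field_weighted_smooth_l1({**target, "date": shifted}, target)
loss_ifsc = field_weighted_smooth_l1({**target, "ifsc": shifted}, target)
```

The higher weights on `ifsc` and `sign` make errors on those fields count for more in the gradient.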

Augmentation: Random horizontal flip, affine (±5°, scale 0.9–1.1), colour jitter, Gaussian blur.
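
Geometric augmentations have to be applied to the target boxes as well as to the image. The source does not show how this is done. For the horizontal flip, a minimal sketch on normalised coordinates looks like this (the helper name is illustrative):

```python
def hflip_box(box):
    """Mirror a normalised [xmin, ymin, xmax, ymax] box left to right.

    The old right edge becomes the new left edge, so xmin and xmax swap
    roles; y coordinates are unchanged.
    """
    xmin, ymin, xmax, ymax = box
    return [1.0 - xmax, ymin, 1.0 - xmin, ymax]


original = [0.70, 0.05, 0.95, 0.15]   # made-up box near the top right
flipped = hflip_box(original)          # lands near the top left
restored = hflip_box(flipped)          # flipping twice undoes the flip
```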

Optimiser: AdamW, weight decay 1×10⁻⁴, batch size 4, 150 epochs total.
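
The leakage exclusion described at the start of this section amounts to filtering synthetic images by the cheque they were generated from. The sketch below assumes each synthetic record carries a `source_id` naming its source cheque. That field name and the toy records are hypothetical.

```python
def drop_leaky_synthetic(synthetic_records, heldout_source_ids):
    """Keep only synthetic images whose source cheque is not held out."""
    heldout = set(heldout_source_ids)
    return [rec for rec in synthetic_records if rec["source_id"] not in heldout]


# Toy records (made up): two derive from a training cheque, one from a test cheque.
synthetic = [
    {"file": "syn_001.png", "source_id": "cheque_017"},
    {"file": "syn_002.png", "source_id": "cheque_017"},
    {"file": "syn_003.png", "source_id": "cheque_090"},
]
heldout_ids = ["cheque_090", "cheque_101"]   # validation + test cheques

usable = drop_leaky_synthetic(synthetic, heldout_ids)
```

Without this filter, fragments of a test cheque could reach the model through its synthetic derivatives.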

Limitations

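It does not beat a trivial baseline on mIoU. A no-learning predictor of the per-field mean training box reaches 0.691 mIoU; this model's seed-averaged 0.648 ± 0.041 does not exceed it, and the released seed-42 checkpoint's 0.704 is the best of three seeds rather than a typical run. Cheque layout is highly regular, so absolute IoU is a weak signal.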
Synthetic training gave no measurable benefit. The accompanying paper's controlled comparison (3 seeds + a compute-matched real-only control) found the synthetic data this model uses does not improve localisation. Treat the model as a characterisation tool, not evidence that synthetic data helps.
Tiny evaluation set: 10 test images / 60 boxes, single split. Per-field accuracies move in 10-point steps; seed-to-seed variance is large (e.g. ifsc accuracy swings 90/40/0% across seeds).
Dataset size and domain: Trained on only 85 real images (plus 194 synthetic ones) of Indian cheque formats; generalisation to other formats is untested.
Fixed-field assumption: Always predicts exactly 6 boxes; cannot handle cheques where fields are absent or duplicated.
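
The coarseness of the evaluation noted above can be made concrete. The sketch below lists the accuracy values a single field can take with 10 test images and computes a binomial standard error. Treating the images as independent trials is a simplifying assumption.

```python
import math

n_test = 10

# With 10 images, per-field Acc@0.5 can only take these 11 values.
possible_acc = [k / n_test for k in range(n_test + 1)]


def binomial_se(p, n):
    """Standard error of a proportion p estimated from n independent trials."""
    return math.sqrt(p * (1 - p) / n)


# Uncertainty on a measured 90% accuracy: roughly 9.5 percentage points.
se_at_90 = binomial_se(0.9, n_test)
print(possible_acc, round(se_at_90, 3))
```

A single image flipping from correct to incorrect moves a per-field accuracy by 10 points, so differences of that size between models should not be over-interpreted.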

Citation

If you use this model in your research, please cite:

@misc{cheque-field-regressor-2026,
  title        = {Cheque Field Localisation Using Regression-Based ResNet-50},
  author       = {Gopinadhan, Jaganadh},
  year         = {2026},
  howpublished = {HuggingFace Model Hub},
  url          = {https://huggingface.co/jaganadhg/cheque-field-regressor}
}

Dataset

The model was trained on the publicly available IDRBT Cheque Image Dataset, released by the Institute for Development and Research in Banking Technology, Hyderabad, India.

License

Model weights and code: Apache 2.0

The IDRBT Cheque Image Dataset has its own terms of use — please refer to the IDRBT website before using this model commercially.

