Cheque Field Regressor
A ResNet-50-based regression model that simultaneously localises six standard fields in Indian bank cheques in a single forward pass.
| Field | Description |
|---|---|
| `date` | Cheque date |
| `amount` | Amount in figures |
| `ifsc` | IFSC / branch code |
| `acno` | Account number |
| `sign` | Signature region |
| `name` | Payee name |
Read this first: the model is a research baseline, not a strong result. This checkpoint accompanies a paper whose finding is a negative result: on the small IDRBT benchmark, the cut-paste synthetic data this model was trained with gives no measurable improvement over real data alone once seed variance and training compute are controlled, and no learned variant beats a trivial no-learning layout prior on mean IoU (the prior, which just predicts each field's mean training box, already reaches 0.691 mIoU / 80% accuracy). Use the model to reproduce and build on that analysis, not as a production cheque reader.
Model Performance
Evaluated on a held-out test set of 10 real images from the IDRBT Cheque Image Dataset. The released weights are the best of three random seeds (seed 42). Per-field numbers below are for that single checkpoint; the honest aggregate is the seed-averaged row.
| Field | IoU | Acc@0.5 |
|---|---|---|
| date | 0.682 | 100% |
| amount | 0.724 | 100% |
| ifsc | 0.678 | 90% |
| acno | 0.715 | 100% |
| sign | 0.614 | 100% |
| name | 0.808 | 100% |
| Mean (seed 42, this checkpoint) | 0.704 | 98% |
| Mean (3 seeds, reported finding) | 0.648 ± 0.041 | 89.4 ± 7.5% |
MAE on normalised coordinates: 0.0155 (seed 42); 0.0185 ± 0.002 (3 seeds).
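For reference, the three reported metrics can be computed roughly as sketched below. This is a minimal illustration, not the paper's evaluation script: the function names are hypothetical, boxes are assumed to be normalised [xmin, ymin, xmax, ymax], and Acc@0.5 is assumed to count a box as correct when its IoU is at least 0.5.

```python
def box_iou(pred, target):
    """IoU of two [xmin, ymin, xmax, ymax] boxes."""
    ix = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    iy = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = ix * iy
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    union = area_p + area_t - inter
    return inter / union if union > 0 else 0.0


def evaluate(preds, targets, threshold=0.5):
    """Mean IoU, accuracy at `threshold`, and MAE over paired boxes."""
    ious = [box_iou(p, t) for p, t in zip(preds, targets)]
    abs_errors = [abs(a - b) for p, t in zip(preds, targets) for a, b in zip(p, t)]
    return {
        "miou": sum(ious) / len(ious),
        "acc": sum(iou >= threshold for iou in ious) / len(ious),
        "mae": sum(abs_errors) / len(abs_errors),
    }
```

With 10 test images and 6 fields, `preds` and `targets` would each hold 60 boxes; restricting them to one field gives the per-field rows.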
Context for these numbers (see the paper for the full controlled comparison):
| Model | Training data | mIoU | mAcc@0.5 |
|---|---|---|---|
| Static prior (no learning) | none | 0.691 | 80.0% |
| Real only, 150 ep | real | 0.547 | 66.7% |
| Real only, 495 ep (compute-matched) | real | 0.660 | 93.3% |
| This model (real + synthetic, 3 seeds) | real+synth | 0.648 ± 0.041 | 89.4 ± 7.5% |
The compute-matched real-only model matches or beats this synthetic-augmented model on every metric, which is the basis for the paper's negative result.
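The static prior in the first row is simple to reproduce in principle: average each field's training box and predict that same box for every test image. A minimal sketch, assuming ground truth is available as one dict per image mapping field name to a normalised [xmin, ymin, xmax, ymax] box. The data layout and function name are hypothetical, and the toy boxes are made up for illustration.

```python
from statistics import fmean

FIELDS = ["date", "amount", "ifsc", "acno", "sign", "name"]


def fit_static_prior(train_annotations):
    """Return each field's coordinate-wise mean box over the training images."""
    prior = {}
    for field in FIELDS:
        boxes = [ann[field] for ann in train_annotations]
        # zip(*boxes) groups all xmin values, then all ymin values, and so on.
        prior[field] = [fmean(coords) for coords in zip(*boxes)]
    return prior


# Toy annotations for two images (illustrative values, not from IDRBT).
train = [
    {f: [0.10, 0.20, 0.30, 0.40] for f in FIELDS},
    {f: [0.20, 0.30, 0.40, 0.50] for f in FIELDS},
]
prior = fit_static_prior(train)
# The prior ignores the input image: every test cheque gets the same six boxes.
print(prior["date"])
```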
Architecture
Input [B, 3, 512, 1024] (ImageNet-normalised)
│
ResNet-50 Backbone (ImageNet pretrained)
├── layer3 → 1024-ch → Conv1×1(128) → AdaptiveAvgPool(4×2)
└── layer4 → 2048-ch → Conv1×1(128) → AdaptiveAvgPool(4×2)
│
Concatenate + Flatten → 2048-d spatial feature vector
│
FC(2048 → 512) → ReLU → Dropout(0.3)
FC(512 → 24) → Sigmoid
│
Output [B, 6, 4] normalised [xmin, ymin, xmax, ymax] ∈ (0, 1)
The key architectural choice is multi-scale spatial pooling (4×2 grid)
instead of Global Average Pooling. This preserves coarse positional signals
(left/right, top/bottom), which are essential for distinguishing cheque fields
that differ primarily in their spatial layout (e.g., ifsc top-left vs
sign bottom-right).
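A toy illustration of why the pooling grid matters, in plain Python rather than the model's own code: two single-channel feature maps with one active cell each, in opposite corners, become indistinguishable after global average pooling but stay distinct after grid pooling. The sketch assumes the 4×2 grid means 4 columns by 2 rows and that the map divides evenly into cells; both are assumptions made for illustration.

```python
def grid_avg_pool(fmap, rows, cols):
    """Average-pool a 2-D list into a rows x cols grid (sizes must divide evenly)."""
    h, w = len(fmap), len(fmap[0])
    ch, cw = h // rows, w // cols  # cell height and width
    pooled = []
    for r in range(rows):
        for c in range(cols):
            cell = [fmap[y][x]
                    for y in range(r * ch, (r + 1) * ch)
                    for x in range(c * cw, (c + 1) * cw)]
            pooled.append(sum(cell) / len(cell))
    return pooled


def one_hot_map(h, w, y, x):
    """Feature map of zeros with a single activation at (y, x)."""
    return [[1.0 if (i, j) == (y, x) else 0.0 for j in range(w)] for i in range(h)]


top_left = one_hot_map(4, 8, 0, 0)
bottom_right = one_hot_map(4, 8, 3, 7)

# Global average pooling is the 1 x 1 grid: both maps collapse to the same value.
gap_tl = grid_avg_pool(top_left, 1, 1)
gap_br = grid_avg_pool(bottom_right, 1, 1)

# A 2-row x 4-column grid keeps track of which cell fired.
grid_tl = grid_avg_pool(top_left, 2, 4)
grid_br = grid_avg_pool(bottom_right, 2, 4)
```

With 128 channels per branch and 8 grid cells per channel, each branch contributes 1024 values, which accounts for the 2048-d vector in the diagram.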
Total parameters: 24,963,160
Quick Start
import torch
from PIL import Image
import torchvision.transforms.v2 as T
from transformers import AutoConfig, AutoModel

# Load model
config = AutoConfig.from_pretrained(
    "jaganadhg/cheque-field-regressor", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "jaganadhg/cheque-field-regressor", trust_remote_code=True
)
model.eval()

# Preprocess image
preprocess = T.Compose([
    T.Resize((config.img_height, config.img_width)),
    T.ToImage(),
    T.ToDtype(torch.float32, scale=True),
    T.Normalize(mean=config.image_mean, std=config.image_std),
])
img = Image.open("cheque.tif").convert("RGB")
pixel_values = preprocess(img).unsqueeze(0)  # [1, 3, 512, 1024]

# Predict
with torch.no_grad():
    output = model(pixel_values=pixel_values)
boxes = output.boxes[0]  # [6, 4] normalised coordinates
for name, box in zip(config.field_names, boxes):
    print(f"{name:8s}: {[round(v, 3) for v in box.tolist()]}")
Using the Image Processor
The bundled ChequeImageProcessor handles all preprocessing and can also
rescale predicted boxes back to the original image pixel coordinates:
import torch
from PIL import Image
from transformers import AutoConfig, AutoModel
# Assumes image_processing_cheque.py from the model repo is on the Python path
from image_processing_cheque import ChequeImageProcessor

config = AutoConfig.from_pretrained("jaganadhg/cheque-field-regressor", trust_remote_code=True)
model = AutoModel.from_pretrained("jaganadhg/cheque-field-regressor", trust_remote_code=True)
processor = ChequeImageProcessor.from_pretrained("jaganadhg/cheque-field-regressor")

img = Image.open("cheque.tif").convert("RGB")
orig_size = img.size  # (width, height)
inputs = processor(images=img)  # returns pixel_values [1, 3, 512, 1024]

model.eval()
with torch.no_grad():
    output = model(**inputs)

# Rescale boxes to original pixel coordinates
results = processor.postprocess_boxes(
    output.boxes, target_sizes=[orig_size]
)
print(results[0])
# {
#   'date': [1680.0, 82.0, 2350.0, 226.0],
#   'amount': [1675.0, 385.0, 2295.0, 550.0],
#   ...
# }
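The rescaling step amounts to multiplying normalised x coordinates by the image width and y coordinates by the image height. The sketch below shows that arithmetic only; it is not the ChequeImageProcessor implementation, and the function name and example values are assumptions.

```python
def rescale_box(box, size):
    """Map a normalised [xmin, ymin, xmax, ymax] box to pixels for size = (width, height)."""
    width, height = size
    xmin, ymin, xmax, ymax = box
    return [xmin * width, ymin * height, xmax * width, ymax * height]


# Hypothetical prediction on a 2000 x 1000 pixel scan.
pixel_box = rescale_box([0.25, 0.5, 0.75, 1.0], (2000, 1000))
print(pixel_box)  # [500.0, 500.0, 1500.0, 1000.0]
```

Note that `size` follows the PIL convention of (width, height), matching `img.size` in the example above.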
Training
This checkpoint was trained on 85 real training images plus 194 synthetic images from the cheque-synthetic-images dataset (41 synthetic images whose source cheque fell in the real validation/test split were excluded to prevent leakage). Validation and test sets are real images only. Training used a two-phase strategy:
1. Warm-up (15 epochs): Backbone frozen, only projection + head trained. Prevents the random head from corrupting ImageNet features with large early-training gradients. LR = 1×10⁻⁴.
2. End-to-end fine-tuning (135 epochs): All layers unfrozen. LR reduced to 1×10⁻⁵ with cosine annealing.
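The resulting learning-rate schedule can be sketched as a function of the epoch index. This is an illustration under assumptions the card does not state: the warm-up rate is held constant, the cosine schedule is stepped once per epoch over the 135 fine-tuning epochs, and it decays towards a minimum of 0 (`eta_min` is a hypothetical parameter).

```python
import math

WARMUP_EPOCHS = 15
FINETUNE_EPOCHS = 135


def learning_rate(epoch, warmup_lr=1e-4, finetune_lr=1e-5, eta_min=0.0):
    """Learning rate for a 0-indexed epoch of the two-phase schedule."""
    if epoch < WARMUP_EPOCHS:
        return warmup_lr  # phase 1: frozen backbone, constant rate
    t = epoch - WARMUP_EPOCHS  # epochs elapsed in phase 2
    cosine = 0.5 * (1.0 + math.cos(math.pi * t / FINETUNE_EPOCHS))
    return eta_min + (finetune_lr - eta_min) * cosine


for epoch in (0, 14, 15, 82, 149):
    print(epoch, learning_rate(epoch))
```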
Loss: Field-weighted SmoothL1
weights = {date: 1.0, amount: 1.0, ifsc: 2.0, acno: 1.0, sign: 1.5, name: 1.0}
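One way to read "field-weighted SmoothL1" is sketched below in plain Python: compute the SmoothL1 loss per coordinate, average it over each field's four coordinates, scale by the field weight, and average over fields. The transition point `beta=1.0` and the choice to average rather than sum are assumptions; the card does not specify them, and the function names are hypothetical.

```python
FIELD_WEIGHTS = {"date": 1.0, "amount": 1.0, "ifsc": 2.0,
                 "acno": 1.0, "sign": 1.5, "name": 1.0}


def smooth_l1(diff, beta=1.0):
    """Quadratic for |diff| < beta, linear beyond it."""
    d = abs(diff)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta


def field_weighted_loss(pred, target, weights=FIELD_WEIGHTS):
    """pred and target map field name -> [xmin, ymin, xmax, ymax] (normalised)."""
    total = 0.0
    for field, weight in weights.items():
        per_coord = [smooth_l1(p - t) for p, t in zip(pred[field], target[field])]
        total += weight * sum(per_coord) / len(per_coord)
    return total / len(weights)
```

Because predictions and targets both lie in (0, 1), every coordinate error is below 1, so under the `beta=1.0` assumption the loss always stays in its quadratic regime. The weights then simply make a given error on `ifsc` cost twice as much as the same error on `date`.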
Augmentation: Random horizontal flip, affine (±5°, scale 0.9–1.1), colour jitter, Gaussian blur.
Optimiser: AdamW, weight decay 1×10⁻⁴, batch size 4, 150 epochs total.
Limitations
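- It does not beat a trivial baseline on mIoU. A no-learning predictor of the per-field mean training box reaches 0.691 mIoU. The seed-averaged result (0.648 ± 0.041) falls below it, and the released seed-42 checkpoint (0.704) exceeds it by only 0.013, well inside the seed-to-seed spread. Cheque layout is highly regular, so absolute IoU is a weak signal.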
- Synthetic training gave no measurable benefit. The accompanying paper's controlled comparison (3 seeds + a compute-matched real-only control) found the synthetic data this model uses does not improve localisation. Treat the model as a characterisation tool, not evidence that synthetic data helps.
- Tiny evaluation set: 10 test images / 60 boxes, single split. Per-field accuracies move in 10-point steps; seed-to-seed variance is large (e.g. `ifsc` accuracy swings 90/40/0% across seeds).
- Dataset size and domain: Trained on ~100 real images of Indian cheque formats; generalisation to other formats is untested.
- Fixed-field assumption: Always predicts exactly 6 boxes; cannot handle cheques where fields are absent or duplicated.
Citation
If you use this model in your research, please cite:
@misc{cheque-field-regressor-2026,
title = {Cheque Field Localisation Using Regression-Based ResNet-50},
author = {Gopinadhan, Jaganadh},
year = {2026},
howpublished = {HuggingFace Model Hub},
url = {https://huggingface.co/jaganadhg/cheque-field-regressor}
}
Dataset
The model was trained on the publicly available IDRBT Cheque Image Dataset, released by the Institute for Development and Research in Banking Technology, Hyderabad, India.
License
Model weights and code: Apache 2.0
The IDRBT Cheque Image Dataset has its own terms of use; please refer to the IDRBT website before using this model commercially.