FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation
Paper: arXiv 2111.02394
Use with docTR.
Sample Input Image
Model Prediction Output (height cropped to save space)
This model is fine-tuned for Arabic text detection and is based on docTR's fast_base model. It uses the FAST architecture and achieves good performance on Arabic text detection tasks. It is designed to work with the PARSEQ Arabic recognition model for optimal OCR results.
Evaluation results on 100 synthetic documents under scene conditions (top, similar to the example picture above) and ideal PDF conditions (bottom), compared with other open-source engines and Google Cloud Vision. Post-processing was performed with the local LLM qwen3.5:4b.
>>> from doctr.io import DocumentFile
>>> from doctr.models import ocr_predictor, from_hub
>>> img = DocumentFile.from_images(['<image_path>'])
>>> # Load model from the hub
>>> det_model = from_hub("madskills/doctr-fast_base-arabic")
>>> # Use with Arabic recognition model
>>> reco_model = from_hub("madskills/doctr-parseq-arabic")
>>> # Pass the detection and recognition models to the predictor
>>> predictor = ocr_predictor(
...     det_arch=det_model,
...     reco_arch=reco_model,
...     pretrained=True,
... )
>>> # Optional: if your documents are not well oriented,
>>> # load custom crop and page orientation models
>>> from doctr.models.classification.zoo import crop_orientation_predictor, page_orientation_predictor
>>> page_orientation_model = from_hub("madskills/doctr-mobilenet_v3_small-page-orientation-arabic")
>>> crop_orientation_model = from_hub("madskills/doctr-mobilenet_v3_small-crop-orientation-arabic")
>>> # Set the custom orientation predictors
>>> predictor.crop_orientation_predictor = crop_orientation_predictor(
...     arch=crop_orientation_model, pretrained=False
... )
>>> predictor.page_orientation_predictor = page_orientation_predictor(
...     arch=page_orientation_model, pretrained=False
... )
>>> # Set custom post-processing parameters
>>> predictor.det_predictor.model.shrink_ratio = 0.43
>>> predictor.det_predictor.model.postprocessor.box_thresh = 0.95
>>> predictor.det_predictor.model.postprocessor.unclip_ratio = 1.55
>>> # Get your predictions
>>> res = predictor(img)
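
Once `res` is available, the recognized text can be read from its exported dictionary. The sketch below is a minimal illustration, not part of the model card's tested code. It assumes the nested `pages -> blocks -> lines -> words` layout that `res.export()` returns for straight pages, with each word geometry given as `((xmin, ymin), (xmax, ymax))`. The `page_lines` helper, the `rtl` flag and the `sample` dictionary are hypothetical names introduced here. Whether right-to-left reordering is needed for your output is also an assumption, so check it against your own results.

```python
from typing import Any, Dict, List


def page_lines(page: Dict[str, Any], rtl: bool = True) -> List[str]:
    """Join the words of every detected line of one exported page.

    Assumes straight-page geometry: ((xmin, ymin), (xmax, ymax)).
    With rtl=True the words are ordered from the rightmost to the
    leftmost, which is the reading order of Arabic text.
    """
    lines: List[str] = []
    for block in page["blocks"]:
        for line in block["lines"]:
            words = sorted(
                line["words"],
                key=lambda w: w["geometry"][0][0],  # xmin of the word box
                reverse=rtl,
            )
            lines.append(" ".join(w["value"] for w in words))
    return lines


# Hypothetical sample that mimics the assumed shape of res.export()
sample = {
    "pages": [
        {
            "page_idx": 0,
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {
                                    "value": "بالعالم",
                                    "confidence": 0.98,
                                    "geometry": ((0.2, 0.1), (0.5, 0.2)),
                                },
                                {
                                    "value": "مرحبا",
                                    "confidence": 0.99,
                                    "geometry": ((0.6, 0.1), (0.9, 0.2)),
                                },
                            ]
                        }
                    ]
                }
            ],
        }
    ]
}

for page in sample["pages"]:
    print("\n".join(page_lines(page)))
```

With a real prediction, replace `sample` with `res.export()`. If the predictor was built for rotated pages, the geometry may be a four-point polygon instead, and the sort key would need adjusting.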