Data-efficient Image Transformer (DeiT) for Document Classification (DocLayNet)

This model is a Data-efficient Image Transformer (DeiT) fine-tuned for document image classification on the DocLayNet dataset.

It was trained on document images from the DocLayNet dataset, which covers six document categories (shown with their class indices):

{'financial_reports': 0, 'government_tenders': 1, 'laws_and_regulations': 2, 'manuals': 3, 'patents': 4, 'scientific_articles': 5}
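For reference, the index mapping above can be written as the two dictionaries that Hugging Face classification configs use (`label2id` and `id2label`). This is a minimal sketch; the variable names are illustrative and not taken from the original training code.

```python
# Category name -> class index, exactly as listed in the model card.
label2id = {
    "financial_reports": 0,
    "government_tenders": 1,
    "laws_and_regulations": 2,
    "manuals": 3,
    "patents": 4,
    "scientific_articles": 5,
}

# Inverse mapping (class index -> category name), e.g. for decoding argmax predictions.
id2label = {idx: name for name, idx in label2id.items()}

# Size of the classification head.
num_labels = len(label2id)

print(num_labels)    # 6
print(id2label[5])   # scientific_articles
```

Dictionaries of this shape can be passed as the `id2label` / `label2id` arguments when loading a base checkpoint for fine-tuning, so that predictions carry readable category names.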

Model description

DeiT (facebook/deit-base-distilled-patch16-224) fine-tuned for document image classification.

Training data

DocLayNet-base: https://huggingface.co/datasets/pierreguillou/DocLayNet-base

Training procedure

Hyperparameters:

{'batch_size': 128, 'num_epochs': 20, 'learning_rate': 1e-4, 'weight_decay': 0.1, 'warmup_ratio': 0.1, 'gradient_clip': 0.1, 'dropout_rate': 0.1, 'label_smoothing': 0.1, 'optimizer': 'AdamW'}
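Two of these settings are easy to misread, so here is a minimal pure-Python sketch of what they commonly mean: `warmup_ratio` as a linear warm-up over the first 10% of optimizer steps, and `label_smoothing` as the smoothed cross-entropy computed by PyTorch's `CrossEntropyLoss(label_smoothing=...)`. The card does not state the scheduler or the total number of steps, so the linear decay after warm-up and the helper names (`lr_at_step`, `smoothed_targets`, `smoothed_cross_entropy`) are assumptions for illustration.

```python
import math

# Values from the hyperparameter table above.
LEARNING_RATE = 1e-4
WARMUP_RATIO = 0.1
LABEL_SMOOTHING = 0.1
NUM_CLASSES = 6


def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warm-up to LEARNING_RATE, then linear decay to 0.

    ASSUMPTION: the decay shape after warm-up is not given in the card.
    """
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(1, total_steps - warmup_steps)
    return LEARNING_RATE * max(0.0, (total_steps - step) / remaining)


def smoothed_targets(label: int) -> list[float]:
    """Mix the one-hot target with a uniform distribution: (1 - eps) * one_hot + eps / K."""
    off = LABEL_SMOOTHING / NUM_CLASSES
    return [1.0 - LABEL_SMOOTHING + off if k == label else off for k in range(NUM_CLASSES)]


def smoothed_cross_entropy(logits: list[float], label: int) -> float:
    """Cross-entropy between softmax(logits) and the label-smoothed targets."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    return -sum(t * lp for t, lp in zip(smoothed_targets(label), log_probs))


# Hypothetical run of 1000 optimizer steps: warm-up covers steps 0-99.
print(lr_at_step(50, 1000))   # halfway through warm-up
print(lr_at_step(100, 1000))  # peak learning rate
```

With smoothing 0.1 and six classes, the true class receives a target of about 0.917 and every other class about 0.017, which discourages over-confident predictions.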

Evaluation results

Test loss: 0.8134, test accuracy: 81.56%
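The two reported metrics can be computed from per-image logits roughly as follows. This is a sketch with a hypothetical `evaluate` helper; it uses plain (unsmoothed) cross-entropy, which is an assumption, since the card does not say whether the reported test loss includes label smoothing.

```python
import math


def evaluate(batch_logits: list[list[float]], labels: list[int]) -> tuple[float, float]:
    """Return (mean cross-entropy loss, accuracy in percent) over a set of predictions.

    ASSUMPTION: plain cross-entropy, without label smoothing.
    """
    total_loss = 0.0
    correct = 0
    for logits, label in zip(batch_logits, labels):
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total_loss += log_z - logits[label]  # equals -log softmax(logits)[label]
        pred = max(range(len(logits)), key=logits.__getitem__)  # argmax
        correct += int(pred == label)
    n = len(labels)
    return total_loss / n, 100.0 * correct / n


# Toy example with made-up logits for three images (two classes for brevity).
loss, acc = evaluate([[2.0, 0.0], [0.0, 2.0], [1.0, 0.0]], [0, 1, 1])
print(f"Test loss: {loss:.4f}, test accuracy: {acc:.2f}%")
```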

Usage

```python
from transformers import pipeline

# Load the model using the image-classification pipeline
pipe = pipeline("image-classification", model="kaixkhazaki/deit_doclaynet_base")

# Classify a document image (replace with the path to your own image)
result = pipe("path_to_image.jpg")
print(result)
```
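For a single image, the pipeline returns a list of dictionaries with `label` and `score` keys, ordered from highest to lowest score and limited to the top 5 by default (pass `top_k=6` to get scores for all six categories). The sketch below picks the best prediction and maps it back to the class index listed above. The `top_category` helper is hypothetical, and because this card does not show the checkpoint's config, it assumes the labels are either the category names or generic `LABEL_<k>` names and handles both.

```python
# Category names in class-index order, as listed in the model card.
CATEGORIES = [
    "financial_reports",
    "government_tenders",
    "laws_and_regulations",
    "manuals",
    "patents",
    "scientific_articles",
]


def top_category(result: list[dict]) -> tuple[str, int, float]:
    """Return (category name, class index, score) for the highest-scoring prediction."""
    best = max(result, key=lambda r: r["score"])
    label = best["label"]
    if label in CATEGORIES:
        # The config stores the category names directly.
        return label, CATEGORIES.index(label), best["score"]
    # ASSUMED fallback for generic names such as "LABEL_3".
    idx = int(label.rsplit("_", 1)[-1])
    return CATEGORIES[idx], idx, best["score"]


# Hypothetical pipeline output, for illustration only.
example = [{"label": "patents", "score": 0.7}, {"label": "manuals", "score": 0.2}]
print(top_category(example))  # ('patents', 4, 0.7)
```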
Model size: 85.8M parameters (F32 tensors, Safetensors format)

Model ID: kaixkhazaki/deit_doclaynet_base, fine-tuned from facebook/deit-base-distilled-patch16-224 on DocLayNet-base