---
tags:
  - image-classification
  - document-classification
  - vision
library_name: transformers
pipeline_tag: image-classification
license: mit
---

# Document Classification Model

## Overview

This model is fine-tuned for document image classification, using the Document Image Transformer (DiT) as its vision backbone.

## Model Details

- Architecture: Document Image Transformer (DiT), a vision transformer pretrained on document images
- Task: Document classification
- Training Framework: 🤗 Transformers
- Base Model: microsoft/dit-large
- Training Dataset Size: 32,786 images
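
The sketch below shows, roughly, how the base checkpoint could be loaded for fine-tuning in this setup. It is illustrative only: the number of document classes is a placeholder (the card does not state it), and it assumes the `microsoft/dit-large` repository ships an image-processor config.

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification

NUM_CLASSES = 16  # placeholder: the actual number of document classes is not stated in this card

# Load the DiT-large backbone; the classification head is freshly initialized
processor = AutoImageProcessor.from_pretrained("microsoft/dit-large")
model = AutoModelForImageClassification.from_pretrained(
    "microsoft/dit-large",
    num_labels=NUM_CLASSES,
)
```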

## Training Parameters

- Batch Size: 256
- Learning Rate: 0.001
- Number of Epochs: 90
- Mixed Precision: BF16
- Gradient Accumulation Steps: 2
- Weight Decay: 0.01
- Learning Rate Schedule: cosine_with_restarts
- Warmup Ratio: 0.1
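
These hyperparameters map roughly onto the 🤗 `TrainingArguments` below. This is an illustrative sketch, not the exact training script: the output directory is a placeholder, and the batch size of 256 is assumed to be the per-device value (it may instead be the effective batch size after gradient accumulation).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dit-large-document-classification",  # placeholder output path
    per_device_train_batch_size=256,   # assumption: "Batch Size" above is per device
    gradient_accumulation_steps=2,
    learning_rate=0.001,
    num_train_epochs=90,
    bf16=True,                         # BF16 mixed precision
    weight_decay=0.01,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
)
```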

## Training and Evaluation Metrics

### Training Metrics

- Loss: 0.1915
- Grad Norm: 1.3002
- Learning Rate: 0.0009
- Epoch: 26.4186
- Step: 1704

### Evaluation Metrics

- Loss: 0.9457
- Accuracy: 0.7757
- Weighted F1: 0.7689
- Micro F1: 0.7757
- Macro F1: 0.7518
- Weighted Recall: 0.7757
- Micro Recall: 0.7757
- Macro Recall: 0.7603
- Weighted Precision: 0.8023
- Micro Precision: 0.7757
- Macro Precision: 0.7941
- Runtime: 8.4106 s
- Samples Per Second: 433.145
- Steps Per Second: 3.448
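
The weighted, micro, and macro scores above follow the standard scikit-learn definitions. As a minimal sketch (assuming `y_true` and `y_pred` hold the integer class ids of the evaluation labels and model predictions), they could be recomputed like this:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def classification_metrics(y_true, y_pred):
    """Accuracy plus weighted/micro/macro F1, recall, and precision."""
    metrics = {"accuracy": accuracy_score(y_true, y_pred)}
    for average in ("weighted", "micro", "macro"):
        metrics[f"{average} f1"] = f1_score(y_true, y_pred, average=average)
        metrics[f"{average} recall"] = recall_score(y_true, y_pred, average=average)
        metrics[f"{average} precision"] = precision_score(y_true, y_pred, average=average)
    return metrics
```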

## Usage

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the fine-tuned model and its image processor
processor = AutoImageProcessor.from_pretrained("jnmrr/ds3-img-classification")
model = AutoModelForImageClassification.from_pretrained("jnmrr/ds3-img-classification")

# Preprocess a document image
image = Image.open("document.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Run inference and map the top logit to its class id
with torch.no_grad():
    outputs = model(**inputs)
predicted_label = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
```
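
For quick experiments, the same checkpoint should also work through the high-level `pipeline` API, since the card declares `pipeline_tag: image-classification`; a minimal sketch:

```python
from transformers import pipeline

classifier = pipeline("image-classification", model="jnmrr/ds3-img-classification")
predictions = classifier("document.png")  # list of {"label": ..., "score": ...} dicts, best first
print(predictions[0])
```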