
Model Card for ResNet-152 Text Detector

This model was trained to quickly classify whether an image contains legible text. It was trained as a binary classifier on the COCO-Text dataset, supplemented with images from LLaVAR, for a total of about 140k images: 50% with legible text and 50% without.

Model Details

How to Get Started with the Model

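from PIL import Image
import requests
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the fine-tuned ResNet-152 binary text classifier
model = AutoModelForImageClassification.from_pretrained(
    "miguelcarv/resnet-152-text-detector",
)

# Reuse the ResNet-50 preprocessing (rescaling + ImageNet normalization).
# Its built-in resizing is disabled because the image is resized manually
# to the 300x300 training resolution below.
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50", do_resize=False)

url = "http://images.cocodataset.org/train2017/000000044520.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB").resize((300, 300))

inputs = processor(image, return_tensors="pt").pixel_values

with torch.no_grad():
    outputs = model(pixel_values=inputs)

# Turn the two class logits into probabilities
logits = outputs.logits
probs = logits.softmax(dim=1)
print(probs)
# tensor([[0.1085, 0.8915]])

# Index-to-class mapping for the columns of `probs`
print(model.config.id2label)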
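The softmax output above holds one probability per class. This card does not state which index means "contains text". The minimal sketch below therefore looks the label up in an `id2label` mapping, which in practice would be `model.config.id2label`. It uses plain Python only. The helper names `softmax` and `predict_label` and the example mapping `{0: "no_text", 1: "text"}` are hypothetical, not taken from the model's config.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class.

    `id2label` maps class indices to names, e.g. model.config.id2label.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical label mapping; check model.config.id2label for the real one.
example_id2label = {0: "no_text", 1: "text"}
label, prob = predict_label([-1.0, 1.1], example_id2label)
print(label, round(prob, 4))
```

With the model loaded as in the snippet above, this would be called as `predict_label(outputs.logits[0].tolist(), model.config.id2label)`. Taking the argmax is the simplest decision rule. A stricter cutoff on the text-class probability may suit applications where false positives are costly.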

Training Details

  • Trained for three epochs
  • Resolution: 300x300
  • Learning rate: 5e-5
  • Optimizer: AdamW
  • Batch size: 64
  • Trained with FP32
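As a rough sanity check, the listed batch size and epoch count imply the number of optimizer steps computed below. This assumes all ~140k images were used for training, since the card does not mention a held-out split. It also assumes the final partial batch of each epoch is kept. Both are assumptions, and 140k is itself approximate.

```python
import math

NUM_IMAGES = 140_000  # approximate dataset size stated in this card
BATCH_SIZE = 64
EPOCHS = 3

# Each epoch covers every image once; a final partial batch still counts as a step.
steps_per_epoch = math.ceil(NUM_IMAGES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

print(steps_per_epoch, total_steps)  # 2188 6564
```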
Model size: 58.3M parameters, stored as F32 Safetensors.